At the Forge - MongoDB

A look at one of the best-known contenders in the non-relational database space.

Lately I've been teaching programming courses in both Python and Ruby, often to seasoned developers used to C++ and Java. Inevitably, the fact that Python and Ruby are dynamically typed languages, allowing any variable to contain any type of value, catches these students by surprise. They often are shocked to find that a given variable can, at any point in the program, be assigned to contain an integer, a string or an instance of an object, without any constraints. They wonder how it is that anyone could (or would) use such a language, given the possibility for runtime type errors. One of my jobs, as the instructor of this course, is to convince them that it is possible to work in such a language, but that doing so might require more adherence to conventions than they are used to.

So, it's ironic that during the last few months, as I have begun to experiment with non-relational databases, I have found myself experiencing something akin to my students' shock. My long-standing beliefs about data integrity and what constitutes a reliable database have gone through a bit of a shake-up. I'm still a bit wary of these non-relational (or NoSQL) databases, and I'm far from convinced that the time has come to throw out SQL and the relational model in favor of something that is often easier to work with.

I do think, as I outlined in last month's column, that these databases offer a type of storage and retrieval that often is a more natural fit for many data-storage requirements. And, just as memcached offered an alternative storage system that complemented relational databases rather than replacing them, so too can these non-relational databases perform many useful functions that would be difficult with a relational database.

One of the best-known contenders in the non-relational database space is MongoDB. MongoDB is an open-source project, sponsored by New York-based 10gen (which intends to make money from licensing and support fees). It is written in C++, and there are drivers for all popular modern languages. The software is licensed under the GNU Affero General Public License, which means that if you modify the MongoDB source and make the modified version available to users over a network (for example, behind a publicly accessible Web site), you must distribute the source to your modifications. This is different from the standard GPL, which does not require that you divulge the source code to server-side applications with which people interact via a browser or other Internet client.

MongoDB has gained a large number of adherents because of its combination of features. It is easy to work with from a variety of languages, is extremely fast (written in C++), is actively supported by both a company and a large community and has proven itself to be stable in many situations and under high-stress conditions. It also includes a number of features for indexing and scaling that make it attractive.

MongoDB, like several of its competitors, describes itself as a document database. This does not mean it is a filesystem meant to store documents, but rather that it replaces the model of tables, rows and columns with that of “documents” consisting of one or more name-value pairs. I find it easier to think of documents as hash tables (or Python dictionaries), in which the keys are strings and the values can be just about anything. Each of these documents exists in a collection, and you can have one or more collections.
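To make the document model concrete, here is a minimal sketch in plain Python, with no MongoDB involved: each document is a dictionary, a collection is simply a list of such dictionaries, and nothing forces two documents in the same collection to share keys. The `find_matching` helper is hypothetical; it only imitates the idea of selecting documents whose fields equal given values, not MongoDB's actual query language.

```python
from typing import Any

# A "collection" modeled as a plain list of documents (dictionaries).
# Illustration only: this is not how MongoDB stores data internally.
collection: list[dict[str, Any]] = [
    {"name": "alice", "language": "Ruby"},
    {"name": "bob", "language": "Python", "admin": True},  # an extra key is fine
    {"title": "At the Forge", "tags": ["mongodb", "nosql"]},  # entirely different keys
]


def find_matching(docs: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Hypothetical helper: return the documents whose fields equal every criterion."""
    return [
        doc
        for doc in docs
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


print(find_matching(collection, language="Python"))
# [{'name': 'bob', 'language': 'Python', 'admin': True}]
```

Documents lacking a requested key simply fail to match, which is roughly how a schema-less store can hold heterogeneous records side by side.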

In many ways, you can think of MongoDB as an object database, because it allows you to store and retrieve items as objects, rather than force them into two-dimensional tables. However, this object database stores only basic object types—numbers, strings, lists and hashes, for example. Fortunately, these types can store a wide variety of data, flexibly and reliably, so this is not much of a concern.
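The restriction to basic types can be expressed as a small recursive check. The sketch below assumes that "basic" means numbers, strings, booleans, None, lists and dictionaries with string keys; MongoDB's storage format supports a few more types (dates and binary data, for instance), so treat `is_storable` as a hypothetical illustration rather than the driver's own validation.

```python
from typing import Any


def is_storable(value: Any) -> bool:
    """Hypothetical check: is `value` built only from basic, document-friendly types?"""
    # bool is a subclass of int in Python, so booleans pass this first test too.
    if value is None or isinstance(value, (int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_storable(item) for item in value)
    if isinstance(value, dict):
        # Keys must be strings; values are checked recursively.
        return all(isinstance(k, str) and is_storable(v) for k, v in value.items())
    # Anything else (sets, instances of custom classes, ...) is rejected.
    return False


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y


print(is_storable({"a": 1, "tags": ["x", "y"], "nested": {"ok": 2.5}}))  # True
print(is_storable({"where": Point(1, 2)}))        # False: arbitrary object
print(is_storable({"where": {"x": 1, "y": 2}}))   # True: same data as a plain dict
```

The last two calls show the usual workaround for richer objects: convert them into a hash of their attributes before storing them.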

Downloading and Installing

To download MongoDB, go to the MongoDB Web site and retrieve the version appropriate for your system. For my server running Ubuntu 8.10, I retrieved the 32-bit version of MongoDB 1.2.2. There is an option to retrieve a statically linked version, but the site itself indicates that this is a fallback, in case the dynamically linked version fails.

After unpacking the MongoDB server, create a directory in which it can store its data. By default, this is /data/db, which you can create with:

mkdir -p /data/db

Start the MongoDB server process with:


Now that you have a server running, you might expect to create a database. However, this step is unnecessary. If you try to connect to a database that has not yet been defined, MongoDB creates it for you. I tend to do most of my MongoDB work in Ruby, so I downloaded and installed the driver for Ruby from GitHub and started up the interactive Ruby interpreter, irb. Then, I typed:
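The "create it on first use" behavior is easy to picture with a dictionary that builds missing entries on demand. The following sketch is hypothetical and does not talk to MongoDB at all; `FakeServer` and its methods are invented names that only mimic the idea that neither databases nor collections need to be declared before use.

```python
from collections import defaultdict


class FakeServer:
    """Hypothetical stand-in for a server that creates databases lazily."""

    def __init__(self) -> None:
        # database name -> collection name -> list of documents;
        # missing databases and collections spring into existence on access.
        self._databases = defaultdict(lambda: defaultdict(list))

    def db(self, name: str):
        # Looking up an unknown name silently creates an empty database.
        return self._databases[name]

    def database_names(self) -> list[str]:
        return sorted(self._databases)


server = FakeServer()
print(server.database_names())          # []
atf = server.db("atf")                  # no explicit "create database" step
atf["stuff"].append({"a": 1, "b": 2})   # collection "stuff" appears on first use
print(server.database_names())          # ['atf']
print(sorted(atf))                      # ['stuff']
```

In this toy version, merely looking up a name is enough to create it; the point is only that no schema or CREATE DATABASE step is required beforehand.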

irb(main):001:0> require 'rubygems'
irb(main):002:0> require 'mongo'

With the MongoDB driver loaded, I was able to connect to the already-running server, creating an “atf” database:

irb(main):005:0> db = Mongo::Connection.new.db("atf")

After this, db is an instance of the Mongo::DB class, representing a MongoDB database. Each database may contain any number of collections, analogous to tables in a relational database. By default, this example database contains no collections, as you can see with this small snippet of code:

irb(main):008:0> db.collection_names.each { |name| puts name }
=> [ ]

The return value of an empty list shows that the database is currently empty.

You can create a new collection by invoking the collection method on your database object:

irb(main):012:0> c = db.collection("stuff")

Once you have created your collection, you also can see that MongoDB has silently created a second collection, named system.indexes, used for indexing the contents:

irb(main):032:0> db.collection_names
=> ["stuff", "system.indexes"]

Because MongoDB is a schema-less database, you can begin to store items to your collection immediately, without defining its columns or data types. In practice, this means you can store hashes with any keys and values that you choose. For example, you can use the insert method to add a new item to your collection:

irb(main):017:0> c.insert({:a => 1, :b => 2})
=> 4b6fe8983c1c7d6a6a000001

The return value is the unique ID for this document (or object) that has just been stored. You can ask the collection to show what you have stored by invoking its find_one method:

irb(main):021:0> c.find_one
=> {"_id"=>4b6fe8983c1c7d6a6a000001, "a"=>1, "b"=>2}

Notice that two things have happened here. First, the keys have been turned from Ruby symbols into strings. Indeed, MongoDB requires that all keys be strings; because symbols are used so pervasively in the Ruby world for hash keys, they are translated into strings silently if you use them.

Second, you can see that another key, named _id, has been added to the document, and its value matches the return value that you received with your first insert.
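The _id value is more structured than it looks. Going by the ObjectId layout documented for MongoDB releases of this period (an assumption worth checking against your version, since newer releases replace the machine and process fields with a random value), its 12 bytes hold a 4-byte big-endian timestamp in seconds, a 3-byte machine identifier, a 2-byte process ID and a 3-byte counter. The hypothetical `decode_object_id` helper below splits the hex ID from the examples above along those boundaries, using only the standard library.

```python
from datetime import datetime, timezone


def decode_object_id(hex_id: str) -> dict:
    """Hypothetical helper: split a 24-digit ObjectId hex string into its parts.

    Assumes the legacy layout: 4-byte timestamp, 3-byte machine ID,
    2-byte process ID and 3-byte counter, all big-endian.
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex digits)")
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "generated_at": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),
        "pid": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


parts = decode_object_id("4b6fe8983c1c7d6a6a000001")
print(parts["generated_at"].isoformat())  # 2010-02-08T10:34:00+00:00
print(parts["counter"])                    # 1
```

Because the timestamp occupies the leading bytes, IDs generated in a later second sort after earlier ones when compared as hex strings.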

You can ask the collection to tell you how many documents it contains with the count method:

irb(main):026:0> c.count
=> 1

As you might expect, you can store and retrieve data using any number of different languages. Although you are likely to work in a single language, MongoDB (like relational databases) doesn't care what language you use and lets you mix and match them freely.

In the above examples, I used Ruby to store data. I should be able to retrieve this data using Python, as follows:

>>> import pymongo
>>> from pymongo import Connection
>>> connection = Connection()
>>> db = connection.atf
>>> db.collection_names()
   [u'stuff', u'system.indexes']
>>> c = db.stuff

>>> c
   Collection(Database(Connection('localhost', 27017), u'atf'), 

>>> c.find_one()
   {u'a': 1, u'_id': ObjectId('4b6fe8983c1c7d6a6a000001'), u'b': 2}

The only surprises here are probably that the strings all come back as Unicode strings, represented with the u'' syntax in Python 2.6 (which I am using here). Also, the document ID, with the key of _id, still is there, but is an ObjectId object, rather than a plain string.

You also can see that the MongoDB developers have gone to great efforts to keep the APIs similar across different languages. This means if you work in more than one language, you likely will be able to depend on similar (or identical) method names to perform the same task.


White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState