At the Forge - MongoDB
Lately I've been teaching programming courses in both Python and Ruby, often to seasoned developers used to C++ and Java. Inevitably, the fact that Python and Ruby are dynamically typed languages, allowing any variable to contain any type of value, catches these students by surprise. They often are shocked to find that a given variable can, at any point in the program, be assigned to contain an integer, a string or an instance of an object, without any constraints. They wonder how it is that anyone could (or would) use such a language, given the possibility for runtime type errors. One of my jobs, as the instructor of this course, is to convince them that it is possible to work in such a language, but that doing so might require more adherence to conventions than they are used to.
So, it's ironic that during the last few months, as I have begun to experiment with non-relational databases, that I have found myself experiencing something akin to my students' shock. My long-standing beliefs about data integrity and what constitutes a reliable database have gone through a bit of a shake-up. I'm still a bit wary of these non-relational (or NoSQL) databases, and I'm far from convinced that the time has come to throw out SQL and the relational model in favor of something that is often easier to work with.
I do think, as I outlined in last month's column, that these databases offer a type of storage and retrieval that often is a more natural fit for many data-storage requirements. And, just as memcached offered an alternative storage system that complemented relational databases rather than replacing them, so too can these non-relational databases perform many useful functions that would be difficult with a relational database.
One of the best-known contenders in the non-relational database space is MongoDB. MongoDB is an open-source project, sponsored by New York-based 10gen (which intends to make money from licensing and support fees). It is written in C++, and there are drivers for all popular modern libraries. The software is licensed under the Affero GNU General Public License, which means if you modify the MongoDB source, and if those modifications are available on a publicly accessible Web site, you must distribute the source to your modifications. This is different from the standard GPL, which does not require that you divulge the source code to server-side applications with which people interact via a browser or other Internet client.
MongoDB has gained a large number of adherents because of its combination of features. It is easy to work with from a variety of languages, is extremely fast (written in C++), is actively supported by both a company and a large community and has proven itself to be stable in many situations and under high-stress conditions. It also includes a number of features for indexing and scaling that make it attractive.
MongoDB, like several of its competitors, describes itself as a document database. This does not mean it is a filesystem meant to store documents, but rather that it replaces the model of tables, rows and columns with that of “documents” consisting of one or more name-value pairs. I find it easier to think of documents as hash tables (or Python dictionaries), in which the keys are strings and the values can be just about anything. Each of these documents exists in a collection, and you can have one or more collections.
In many ways, you can think of MongoDB as an object database, because it allows you to store and retrieve items as objects, rather than force them into two-dimensional tables. However, this object database stores only basic object types—numbers, strings, lists and hashes, for example. Fortunately, these types can store a wide variety of data, flexibly and reliably, so this is not much of a concern.
To download MongoDB, go to mongodb.org, and retrieve the version appropriate for your system. For my server running Ubuntu 8.10, I retrieved the 32-bit version of MongoDB 1.2.2. There is an option to retrieve a statically linked version, but the site itself indicates that this is a fallback, in case the dynamically linked version fails.
After unpacking the MongoDB server, create a directory in which it can store its data. By default, this is /data/db, which you can create with:
mkdir -p /data/db
Start the MongoDB server process with:
./bin/mongod
Now that you have a server running, you need to create a database. However, this step is unnecessary. If you try to connect to a database that has not yet been defined, MongoDB creates it for you. I tend to do most of my MongoDB work in Ruby, so I downloaded and installed the driver for Ruby from GitHub and started up the interactive Ruby interpreter, irb. Then, I typed:
irb(main):001:0> require 'rubygems' irb(main):002:0> require 'mongo'
With the MongDB driver loaded, I was able to connect to the already-running server, creating an “atf” database:
irb(main):005:0> db = Mongo::Connection.new.db("atf")
After this, db is an instance of the Mongo::DB class, representing a MongoDB database. Each database may contain any number of collections, analogous to tables in a relational database. By default, this example database contains no collections, as you can see with this small snippet of code:
irb(main):008:0> db.collection_names.each { |name| puts name }
=> [ ]
The return value of an empty list shows that the database is currently empty.
You can create a new collection by invoking the collection method on your database connection:
irb(main):012:0> c = db.collection("stuff")
Once you have created your collection, you also can see that MongoDB has silently created a second collection, named system.indexes, used for indexing the contents:
irb(main):032:0> db.collection_names => ["stuff", "system.indexes"]
Because MongoDB is a schema-less database, you can begin to store items to your collection immediately, without defining its columns or data types. In practice, this means you can store hashes with any keys and values that you choose. For example, you can use the insert method to add a new item to your collection:
irb(main):017:0> c.insert({:a => 1, :b => 2})
=> 4b6fe8983c1c7d6a6a000001
The return value is the unique ID for this document (or object) that has just been stored. You can ask the collection to show what you have stored by invoking its find_one method:
irb(main):021:0> c.find_one
=> {"_id"=>4b6fe8983c1c7d6a6a000001, "a"=>1, "b"=>2}
Notice that two things have happened here. First, the keys have been turned from Ruby symbols into strings. Indeed, MongoDB requires that all keys be strings; because symbols are used so pervasively in the Ruby world for hash keys, they are translated into strings silently if you use them.
Second, you can see that another key, named _id, has been added to the document, and its value matches the return value that you received with your first insert.
You can ask the collection to tell how many documents it contains with the count method:
irb(main):026:0> c.count => 1
As you might expect, you can store and retrieve data using any number of different languages. Although you are likely to work in a single language, MongoDB (like relational databases) doesn't care what language you use and lets you mix and match them freely.
In the above examples, I used Ruby to store data. I should be able to retrieve this data using Python, as follows:
>>> import pymongo
>>> from pymongo import Connection
>>> connection = Connection()
>>> db = connection.atf
>>> db.collection_names()
[u'stuff', u'system.indexes']
>>> c = db.stuff
>>> c
Collection(Database(Connection('localhost', 27017), u'atf'),
↪u'stuff')
>>> c.find_one()
{u'a': 1, u'_id': ObjectId('4b6fe8983c1c7d6a6a000001'), u'b': 2}
The only surprises here are probably that the strings are all stored as Unicode, represented with the u'' syntax in Python 2.6 (which I am using here). Also, the document ID, with the key of _id, still is there, but is an object, rather than a string.
You also can see that the MongoDB developers have gone to great efforts to keep the APIs similar across different languages. This means if you work in more than one language, you likely will be able to depend on similar (or identical) method names to perform the same task.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- RSS Feeds
- What's the tweeting protocol?
- Trying to Tame the Tablet
- Validate an E-Mail Address with PHP, the Right Way
- New Products
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.




36 min 51 sec ago
2 hours 59 min ago
19 hours 47 min ago
22 hours 20 min ago
23 hours 37 min ago
1 day 12 min ago
1 day 34 min ago
1 day 5 hours ago
1 day 6 hours ago
1 day 7 hours ago