At the Forge - Advanced MongoDB
Last month, I started discussing MongoDB, an open-source non-relational “document-based” database that has been growing in popularity during the past year. Unlike relational databases, which store all information in two-dimensional tables, MongoDB stores everything in something akin to a set of hash tables.
In a relational database, you can be sure that every record (that is, row) in a table has the same number and set of columns. By contrast, MongoDB is schema-less, meaning there is no enforcement of such rules on columns. Two records in a MongoDB collection might have identical keys, or they might have no two keys in common. Ensuring that the keys are meaningful, and that they will not be prone to abuse or error, is the programmer's responsibility.
Working with MongoDB turns out to be fairly straightforward, as I showed in several examples last month. Once you have set up a database and a collection, you can add, remove and modify records using a combination of objects in your favorite language and the MongoDB query language.
The fact that it's easy to work with MongoDB doesn't mean that it's lacking in high-powered features, however. This month, I describe some of the features you're likely to use if you incorporate MongoDB into your applications, such as indexing and object relationships. If you're like me, you'll see there is a lot to like; plus, using MongoDB prods you to think about your data in new and different ways.
As I explained last month, MongoDB has its own query language, allowing you to retrieve records whose attributes match certain conditions. For example, if you have a book database, you might want to find all books with a certain title. One way to perform such a retrieval would be to iterate over each of the records, pulling out all those that precisely match the title in question. In Ruby, you could express this as:
books.find_all {|b| b.title == search_title}
The problem with this approach is that it's quite slow. The system needs to iterate over each of the items, which means as the list of books grows, so too will the time it takes to find what you're seeking.
The solution to this problem, as database programmers have long known, is to use an index. Indexes come in various forms, but the basic idea is that they allow you to find all records with a particular value for the title immediately (or any column field), without having to scan through each of the individual records. It should come as no surprise, then, that MongoDB supports indexes. How can you use them?
Continuing with this book example, I inserted about 43,000 books into a MongoDB collection. Each inserted document was a Ruby hash, storing the book's ISBN, title, weight and publication date. Then, I could retrieve a book using MongoDB's client program, which provides an interactive JavaScript interface:
./bin/mongo atf
> db.books.count()
38202
> db.books.find({isbn:'9789810185060'})
{ "_id" : ObjectId("4b8fca3ef23f3c614600a8c2"),
"title" : "Primary Mathematics 4A Textbook",
"weight" : 40,
"publication_date" : "2003-01-01",
"isbn" : "9789810185060" }
The query certainly seems to execute quickly enough, but if there were millions of records, it would slow down quite a bit. You can give the database server a speed boost by adding an index on the isbn column:
> db.books.ensureIndex({isbn:1})
This creates an index on the isbn column in ascending order. You also could specify -1 (instead of 1) to indicate that the items should be indexed in descending order.
Just as a relational database automatically puts an index on the “primary key” column of a table, MongoDB automatically indexes the unique _id attribute on a collection. Every other index needs to be created manually. And indeed, now if you get a list of the indexes, you will see that not only is the isbn column indexed, but so is _id:
> db.books.getIndexes()
[
{
"name" : "_id_",
"ns" : "atf.books",
"key" : {
"_id" : ObjectId("000000000000000000000000")
}
},
{
"ns" : "atf.books",
"key" : {
"isbn" : 1
},
"name" : "isbn_1"
}
]
Now you can perform the same query as before, requesting all of the books with a particular ISBN. You won't see any change in your result set; however, you should get a response more quickly than before.
You also can create a compound index, which looks at more than one key:
> db.books.ensureIndex({title:1, weight:1})
Perhaps it doesn't make sense to combine the index for a book's title with that of its weight. Nevertheless, that's what I have now done in the example. If you later decide you don't want this index, you can remove it with:
> db.books.dropIndex('title_1_weight_1')
{ "nIndexesWas" : 3, "ok" : 1 }
Because I'm using the JavaScript interface, the response is a JSON object, indicating that there used to be three indexes (and now there are only two), and that the function executed successfully. If you try to drop the index a second time, you'll get an error message:
> db.books.dropIndex('title_1_weight_1')
{ "errmsg" : "index not found", "ok" : 0 }
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- Designing Electronics with Linux
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Dynamic DNS—an Object Lesson in Problem Solving
- Using Salt Stack and Vagrant for Drupal Development
- Build a Skype Server for Your Home Phone System
- Why Python?
- New Products
- A Topic for Discussion - Open Source Feature-Richness?
- Validate an E-Mail Address with PHP, the Right Way
- Tech Tip: Really Simple HTTP Server with Python
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




1 hour 6 min ago
3 hours 36 min ago
13 hours 39 min ago
18 hours 6 min ago
21 hours 41 min ago
22 hours 14 min ago
1 day 37 min ago
1 day 41 min ago
1 day 42 min ago
1 day 5 hours ago