At the Forge - CouchDB Views
Last month's column was an initial look at CouchDB, a non-relational, open-source database server, now sponsored by the Apache Software Foundation. CouchDB uses many Web-related standards: data is stored in JSON format, communication takes place using JSON and RESTful resources, and functions are written in JavaScript. CouchDB is not as speedy as some of the other non-relational (NoSQL) databases, such as MongoDB and Cassandra. However, CouchDB is designed to be dependable and easily replicated across multiple servers—a far cry from relational databases, for which replication remains slightly annoying at best.
Last month, I explained how once you have created a CouchDB database, you can use the curl utility to insert, update and remove documents. Each “document” is nothing more than a JSON object, which means it's basically a hash (what Python calls a dictionary) whose values may be arbitrarily nested arrays and hashes, as well as scalar values (such as strings and numbers). So, I can create a database with an HTTP PUT request:
curl -X PUT http://localhost:5984/atf
Then, I can add some documents to that database with HTTP POST requests:
curl -X POST http://localhost:5984/atf \
-H 'Content-Type: application/json' \
-d '{"first_name" : "Atara", "middle_name": "Margalit",
"sex":"f", "last_name" : "Lerner-Friedman",
"birthday" : "2000-dec-16"}'
curl -X POST http://localhost:5984/atf \
-H 'Content-Type: application/json' \
-d '{"first_name" : "Shikma", "middle_name": "Bruria",
"sex":"f", "last_name" : "Lerner-Friedman",
"birthday" : "2002-dec-17"}'
curl -X POST http://localhost:5984/atf \
-H 'Content-Type: application/json' \
-d '{"first_name" : "Amotz", "middle_name": "David",
"sex":"m", "last_name" : "Lerner-Friedman",
"birthday" : "2005-oct-31"}'
Then, I can check that there are three documents by sending an HTTP GET request to the database:
curl -X GET http://localhost:5984/atf
{"db_name":"atf","doc_count":3,
"doc_del_count":0,"update_seq":3,"purge_seq":0,
"compact_running":false,"disk_size":12377,
"instance_start_time":"1273430793169153","disk_format_version":4}
As you can see, the "doc_count" attribute shows that there are, indeed, three documents in this database.
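If you'd rather script the inserts than type them at the shell, the same three POST requests can be sent from JavaScript. What follows is a minimal sketch, assuming Node.js 18 or later (for the built-in fetch) and a CouchDB server at the address used above; the helper names postOptions and insertAll are hypothetical, not part of any CouchDB library.

```javascript
// Sketch, assuming Node.js 18+ (global fetch) and CouchDB at localhost:5984.
// The names DB_URL, kids, postOptions and insertAll are hypothetical.
const DB_URL = "http://localhost:5984/atf";

const kids = [
  { first_name: "Atara", middle_name: "Margalit", sex: "f",
    last_name: "Lerner-Friedman", birthday: "2000-dec-16" },
  { first_name: "Shikma", middle_name: "Bruria", sex: "f",
    last_name: "Lerner-Friedman", birthday: "2002-dec-17" },
  { first_name: "Amotz", middle_name: "David", sex: "m",
    last_name: "Lerner-Friedman", birthday: "2005-oct-31" },
];

// Build the same request the curl commands send: a POST whose body is the
// document serialized as JSON, labeled with a JSON content type.
function postOptions(doc) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

// POST each document in turn and collect CouchDB's JSON replies
// (each reply carries the new document's id and rev).
async function insertAll(dbUrl, docs) {
  const replies = [];
  for (const doc of docs) {
    const res = await fetch(dbUrl, postOptions(doc));
    if (!res.ok) throw new Error(`insert failed: HTTP ${res.status}`);
    replies.push(await res.json());
  }
  return replies;
}

// Usage (needs a running server): insertAll(DB_URL, kids).then(console.log);
```

Sending the documents one at a time keeps the sketch close to the curl commands; the doc_count check above works the same way afterward.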
Now, if you have only three documents, querying them doesn't make much sense. But if you have 300, or even 300,000 documents, you certainly are not going to want to iterate over all of them just to find the ones that match your criteria.
If you were using a relational database server, you would use SQL to retrieve the rows that match a particular set of criteria. Even MongoDB, which I covered earlier this year, offers a query language that is vaguely SQL-like. CouchDB, however, offers a completely different query system, based on JavaScript functions and the map-reduce paradigm. CouchDB's syntax takes some getting used to, especially if you are relatively new at writing JavaScript functions. However, a few small functions can give you a great deal of power, which is (perhaps) the secret behind CouchDB's success.
I've already explained that CouchDB refers to each stored data item as a “document”. CouchDB defines a special kind of document, known as a “design document”, which can contain one or more “views”: JavaScript code that is executed when you want to perform a query. (Design documents also may contain “show” functions, which format or otherwise transform the way in which a document is displayed, but I won't discuss “show” functions in this column.) If you are still developing a database or application, you might want to avoid the overhead of a permanent view by creating a temporary view instead. Temporary views take less time to set up and are a bit more flexible, but they execute much more slowly, because CouchDB runs them against every document each time you query them, rather than keeping an index.
So, let's begin by creating some basic temporary views in JavaScript. For the data, I'm using the information I entered above about my three children. (Feel free to substitute information about your own children, if you prefer.) I find it easiest to create temporary views using Futon, the Web-based administrative and maintenance tool that comes with CouchDB. Simply point your Web browser at the /_utils/ path on the server on which CouchDB is running, on port 5984 (for example, http://localhost:5984/_utils/), and go to your database of choice. Then, select “Temporary view” from the pull-down menu in the top right-hand corner.
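Your screen now should consist of two parts: on the right side is a space for an optional “reduce” function, which I won't use in this column, and on the left side, under the header “map”, you have a simple JavaScript function: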
function(doc) {
emit(null, doc);
}
If you ever have used “map” in a language such as Ruby, Python, JavaScript or Lisp, this function already might make sense to you: CouchDB invokes your function once for each document in the database. Each time the function calls emit() with a key and a value, that key-value pair becomes a row in the view's output, which collects the results of running the function across all documents.
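To make that model concrete, here is a toy, in-memory imitation of what happens when a view runs. It is a sketch under stated assumptions, not CouchDB's actual view server: runMapView and compareKeys are hypothetical helpers, the _id values are made-up placeholders for CouchDB's generated IDs, and the key comparison handles only null and strings. It calls the map function once per document, records every emit() call as a row, and orders rows by key, falling back to document ID for equal keys.

```javascript
// Hypothetical sample data: the three documents from the curl examples,
// with made-up _id values standing in for CouchDB's generated IDs.
const children = [
  { _id: "doc1", first_name: "Atara", sex: "f", last_name: "Lerner-Friedman", birthday: "2000-dec-16" },
  { _id: "doc2", first_name: "Shikma", sex: "f", last_name: "Lerner-Friedman", birthday: "2002-dec-17" },
  { _id: "doc3", first_name: "Amotz", sex: "m", last_name: "Lerner-Friedman", birthday: "2005-oct-31" },
];

// Simplified key comparison: handles only null and strings (null sorts first).
function compareKeys(a, b) {
  if (a === b) return 0;
  if (a === null) return -1;
  if (b === null) return 1;
  return a < b ? -1 : 1;
}

// Toy imitation of a view: call mapFn once per document, collect every
// emit(key, value) call as a row, then order rows by key (ties by id).
function runMapView(docs, mapFn) {
  const rows = [];
  let current = null;
  // Map functions call a free function named emit(), so expose one globally.
  globalThis.emit = (key, value) => rows.push({ id: current._id, key, value });
  try {
    for (const doc of docs) {
      current = doc;
      mapFn(doc);
    }
  } finally {
    delete globalThis.emit; // don't leak the helper after the run
  }
  return rows.sort((r1, r2) => compareKeys(r1.key, r2.key) || compareKeys(r1.id, r2.id));
}

// Futon's default temporary view, run against the sample documents.
const rows = runMapView(children, function (doc) { emit(null, doc); });
console.log(rows.map((r) => [r.key, r.value.first_name]));
```

Because emit() is a call rather than a return value, a map function can emit zero rows for a document (filtering it out) or several rows for one document.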
For instance, the default function (which is anonymous) takes a document as an argument and emits a null key, with the document itself as the value. If you click “Run” under the code, you get a set of results: three “null” keys on the left side and the original documents (including their _id and _rev fields) on the right.
You can, of course, modify the function such that it outputs only information about girls. To do that, write:
function(doc)
{
if (doc['sex'] == 'f')
{
emit(null, doc);
}
}
Notice how a simple if statement lets you eliminate unwanted rows. Now, what if you're interested in getting all the documents, but sorted? CouchDB orders the results by their keys, which means the key you emit is useful not only for identifying the resulting documents, but also for ordering them. You could, for example, sort the results by first name:
function(doc) {
emit(doc.first_name, doc);
}
In my case, this means I first get the record for my son (Amotz), followed by Atara, followed by Shikma. Sorting by last name in this particular case doesn't help very much, because they all have identical last names. But, keys can be any JSON value, which means you even can use an array to sort items by last name and then by first name:
function(doc) {
emit([doc.last_name, doc.first_name], doc);
}
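Array keys work because CouchDB compares them element by element. The following is a simplified sketch of CouchDB's view collation for the key types used here; the function names are hypothetical, and it makes a loudly stated simplification: strings are compared by UTF-16 code units, whereas CouchDB uses ICU's Unicode Collation Algorithm (under which, for example, "a" sorts before "B"). Objects, which CouchDB sorts after arrays, are not handled.

```javascript
// Simplified sketch of CouchDB view-key collation (not CouchDB's code).
// Assumption: plain code-unit string comparison instead of ICU collation;
// objects are not supported.

// Rank of each JSON type in CouchDB's documented collation order.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  throw new TypeError("key type not handled by this sketch");
}

// Compare two view keys: first by type rank, then by value.
function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    // Arrays compare element by element...
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    // ...and an array that is a prefix of the other sorts first.
    return a.length - b.length;
  }
  return 0; // null, false and true are each equal only to themselves
}

// Keys as emitted by the [last_name, first_name] view.
const keys = [
  ["Lerner-Friedman", "Shikma"],
  ["Lerner-Friedman", "Amotz"],
  ["Lerner-Friedman", "Atara"],
];
console.log(keys.slice().sort(collate).map((k) => k[1]));
```

Since the first elements are identical, the second element (first name) decides the order, exactly as sorting "by last and then first name" requires.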
You also can sort them by birthday:
function(doc) {
emit(doc.birthday, doc)
}
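However, this will not necessarily have the effect you want. The “birthday” field is a text string, which means the sorting will be done as a string, rather than as a date. Two birthdays in the same year would be ordered by the month's name, putting “dec” before “oct”. (In the case of my children's birthdays, the sorting happens to work out fine, because each child was born in a different year and the four-digit year comes first in the string. But that is a happy accident, not something CouchDB guarantees.)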
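One way around the problem is to have the map function emit a key whose string order matches date order. Here is a sketch, assuming every birthday uses the year-month-day layout seen in this column's documents ("2005-oct-31", with a three-letter English month); mapByBirthday is a hypothetical name, and the second birthday in the local check is a made-up example. To use it in Futon, paste only the function (doc) { ... } expression; the emit() stand-in below exists purely to try it outside CouchDB.

```javascript
// Map function: emit an ISO-style "YYYY-MM-DD" key so that string order
// matches date order. Assumes birthdays look like "2005-oct-31".
var mapByBirthday = function (doc) {
  var months = {
    jan: "01", feb: "02", mar: "03", apr: "04", may: "05", jun: "06",
    jul: "07", aug: "08", sep: "09", oct: "10", nov: "11", dec: "12"
  };
  if (typeof doc.birthday !== "string") return; // no birthday: emit nothing
  var parts = doc.birthday.split("-");          // e.g. ["2005", "oct", "31"]
  if (parts.length !== 3) return;               // unexpected layout: skip
  var name = parts[1].toLowerCase();
  if (!months.hasOwnProperty(name)) return;     // unknown month name: skip
  var day = parts[2].length === 1 ? "0" + parts[2] : parts[2];
  emit(parts[0] + "-" + months[name] + "-" + day, doc);
};

// Local check only: a stand-in for CouchDB's emit() that records keys.
var keys = [];
function emit(key, value) { keys.push(key); }
mapByBirthday({ birthday: "2005-oct-31" });
mapByBirthday({ birthday: "2005-dec-01" }); // hypothetical same-year birthday
console.log(["2005-oct-31", "2005-dec-01"].sort()); // raw strings: December first
console.log(keys.slice().sort());                   // normalized: October first
```

Documents that lack a birthday or use another format simply produce no row, which is the usual way a map function filters.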
If you want to create a permanent view, there are a few ways to do so. You can use the Futon (Web-based) interface, in which any temporary view can be turned into a permanent one by clicking “Save As” on the temporary view's screen. But another way, and one that's a bit more flexible if you're writing complex code, is to use curl to PUT a new design document on the server. This document contains JSON, like all other documents in CouchDB, but it has a number of fields that CouchDB treats specially. Here is my file, which I called simpleview.json:
{
"_id" : "_design/simpleview",
"views": {
"show_by_birthday": {
"map" : "function(doc){ emit(doc.birthday, doc) }"
}
}
}
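Writing JavaScript inside a JSON string quickly gets awkward for longer map functions. One option is to write the view as an ordinary function and let a script produce the JSON. This is a sketch, not a CouchDB tool: makeDesignDoc is a hypothetical helper, and it relies on Function.prototype.toString() returning the function's source text. The design document name simpleview matches the URL in the upload command.

```javascript
// Hypothetical helper: build a design document from plain JS functions.
// CouchDB stores each map function as a string of JavaScript source.
function makeDesignDoc(name, views) {
  const doc = { _id: "_design/" + name, views: {} };
  for (const [viewName, mapFn] of Object.entries(views)) {
    doc.views[viewName] = { map: mapFn.toString() };
  }
  return doc;
}

// emit() is never called here; the function is only serialized.
const designDoc = makeDesignDoc("simpleview", {
  show_by_birthday: function (doc) { emit(doc.birthday, doc); },
});

// Print JSON suitable for saving as simpleview.json.
console.log(JSON.stringify(designDoc, null, 2));
```

Write each view with the function keyword as shown: method shorthand would serialize to something like show_by_birthday(doc) { ... }, which is not a standalone function expression.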
Then, I uploaded the contents of this file using curl, as follows:
curl -X PUT http://localhost:5984/atf/_design/simpleview \
-H 'Content-Type: application/json' -d @simpleview.json
By using the -d flag and the @ sign, I was able to tell curl to upload the JSON from a file, rather than from the command line. I uploaded it to a design document (as you can see from the "_design/" at the beginning of its name) called simpleview, which contains a single view named show_by_birthday. Once it was uploaded, I could run it from Futon (by choosing the "show_by_birthday" view from the menu), or by again using curl:
curl -X GET http://localhost:5984/atf/_design/simpleview/_view/show_by_birthday
The results of the query are the same either way. Futon displays them in a nicer format, but it obviously would be easier for a program to work with the JSON output via HTTP. If you want to edit the view, you can either re-upload it via curl or use Futon to go to the view and edit the JavaScript function. (When re-uploading, the JSON must include the design document's current _rev value; otherwise, CouchDB rejects the update as a conflict.)
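For a program, working with that output means parsing the JSON that CouchDB returns for a view query: an object with total_rows, offset and a rows array whose entries have id, key and value. In the sketch below, the sample body has that shape, but its ids and trimmed-down values are made up; summarize and fetchBirthdays are hypothetical names, and fetchBirthdays assumes Node.js 18 or later (for fetch) plus the design document uploaded above.

```javascript
// Made-up sample body in the shape CouchDB returns for a view query.
const sampleBody = JSON.stringify({
  total_rows: 3,
  offset: 0,
  rows: [
    { id: "doc1", key: "2000-dec-16", value: { first_name: "Atara" } },
    { id: "doc2", key: "2002-dec-17", value: { first_name: "Shikma" } },
    { id: "doc3", key: "2005-oct-31", value: { first_name: "Amotz" } },
  ],
});

// Turn view rows into "key: first_name" lines, keeping the view's key order.
function summarize(viewJson) {
  const result = JSON.parse(viewJson);
  return result.rows.map((row) => row.key + ": " + row.value.first_name);
}

// Query the live view; not called here because it needs a running server.
async function fetchBirthdays() {
  const url = "http://localhost:5984/atf/_design/simpleview/_view/show_by_birthday";
  const res = await fetch(url);
  if (!res.ok) throw new Error(`view query failed: HTTP ${res.status}`);
  return summarize(await res.text());
}

console.log(summarize(sampleBody));
```

The program never re-sorts the rows: CouchDB has already ordered them by key, so reading rows in sequence preserves the view's ordering.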