At the Forge - CouchDB Views

 in
Retrieve your CouchDB data using views and map-reduce functions.

Last month's column was an initial look at CouchDB, a non-relational, open-source database server, now sponsored by the Apache Software Foundation. CouchDB uses many Web-related standards: data is stored in JSON format, communication takes place using JSON and RESTful resources, and functions are written in JavaScript. CouchDB is not as speedy as some of the other non-relational (NoSQL) databases, such as MongoDB and Cassandra. However, CouchDB is designed to be dependable and easily replicated across multiple servers—a far cry from relational databases, for which replication remains slightly annoying at best.

Last month, I explained how once you have created a CouchDB database, you can use the curl utility to insert, update and remove documents. Each “document” is nothing more than a JSON object, which means it's basically a hash (a Python dictionary), which then may contain arbitrarily nested levels of arrays, hashes and scalar values (that is, strings and numbers). So, I can create a database with an HTTP PUT request:

curl -X PUT http://localhost:5984/atf

Then, I can add some documents to that database with HTTP POST requests:

curl -X POST http://localhost:5984/atf
     -d '{"first_name" : "Atara", "middle_name": "Margalit", 
          "sex":"f", "last_name" : "Lerner-Friedman", 
          "birthday" : "2000-dec-16"}'

curl -X POST http://localhost:5984/atf
     -d '{"first_name" : "Shikma", "middle_name": "Bruria", 
          "sex":"f", "last_name" : "Lerner-Friedman", 
          "birthday" : "2002-dec-17"}'

curl -X POST http://localhost:5984/atf
     -d '{"first_name" : "Amotz", "middle_name": "David", 
          "sex":"m", "last_name" : "Lerner-Friedman", 
          "birthday" : "2005-oct-31"}'

Then, I can check to see that there are three documents, by using an HTTP GET request on the database:

bash-3.2# curl -X GET http://localhost:5984/atf

   {"db_name":"rmltest","doc_count":3,
    "doc_del_count":0,"update_seq":3,"purge_seq":0,
    "compact_running":false,"disk_size":12377,
    "instance_start_time":"1273430793169153","disk_format_version":4}

As you can see, the "doc_count" attribute shows that there are, indeed, three documents in this database.

Now, if you have only three documents, querying them doesn't make much sense. But, if you have 300, or even 300,000 documents, you certainly are not going to want to iterate over them just to determine which is the best match and/or most appropriate.

If you were using a relational database server, you would use SQL to retrieve the rows that match a particular set of criteria. Even MongoDB, which I covered earlier this year, offers a query language that is vaguely SQL-like. CouchDB, however, offers a completely different query system, based on JavaScript functions and the map-reduce paradigm. CouchDB's syntax takes some getting used to, especially if you are relatively new at writing JavaScript functions. However, a few small functions can give you a great deal of power, which is (perhaps) the secret behind CouchDB's success.

CouchDB Views

I've already explained that CouchDB refers to each stored data item as a “document”. CouchDB defined a special kind of document, known as a “design document”, which contains a “view”—JavaScript code that is executed when you want to perform the query. (Design documents also may contain “show” functions, which can sort or otherwise modify the way in which data is displayed, but I won't discuss “show” functions in this column.) If you are developing only a database or applications, you might want to avoid the overhead of a permanent view by creating a temporary view instead. Temporary views take less time to set up and are a bit more flexible, but they execute much more slowly.

So, let's begin by creating a temporary view and some basic JavaScript views. For the data, I'm using the information I entered above, about my three children. (Feel free to substitute information about your own children, if you prefer.) I find it easiest to create temporary views using Futon, the Web-based administrative and maintenance tool that comes with CouchDB. Simply point your Web browser at the server on which CouchDB is running, on port 5984, and go to your database of choice. Then, select temporary view from the pull-down menu in the top right-hand corner.

Your screen now should consist of two parts: on the left side, you have a simple JavaScript function, under the header “map”:

function(doc) {
  emit(null, doc);
}

If you ever have used “map” in a language such as Ruby, Python, JavaScript or Lisp, this function already might make sense to you: your function is invoked repeatedly for a list of documents. If it produces a key-value pair, that pair is added to the output from the function running across all documents.

For instance, the example function (which is anonymous) takes a document as an argument, and returns a null key and the document itself as the value. If you click “Run” under the code, you get a set of results: three “null” keys on the left side and the original documents (with their mandatory _id field) on the right.

You can, of course, modify the function such that it outputs only information about girls. To do that, write:

function(doc)
{
    if (doc['sex'] == 'f')
    {
        emit(null, doc);
    }
}

Notice how by using a simple if statement, you can eliminate unwanted rows. Now, what if you're interested in getting all the documents, but sorted. CouchDB orders the results by their keys, which means the key you use is useful not only for identifying the resulting documents, but also for ordering them. You could, for example, sort the results by first name:

function(doc) {
        emit(doc.first_name, doc);
}

In my case, this means I first get the record for my son (Amotz), followed by Atara, followed by Shikma. Sorting by last name in this particular case doesn't help very much, because they all have identical last names. But, keys can be any data type, which means you even can use an array to arrange items, by last and then first name:

function(doc) {
        emit([doc.last_name, doc.first_name], doc);
}

You also can sort them by birthday:

function(doc) {
    emit(doc.birthday, doc)
}

However, this will not necessarily have the effect you want. The “birthday” field is a text string, which means the sorting will be done as a string, rather than as a date. (In the case of my children's birthdays, the sorting happens to work out fine, but this is a happy accident, not inherent to CouchDB.)

If you want to create a permanent view, there are a few ways to do so. You can use the Futon (Web-based) interface, and any temporary view can be turned into a permanent one by clicking on save as from the temporary view's screen. But another way, and one that's a bit more flexible if you're writing complex code, is to use curl to PUT a new design document on the server. This document contains JSON, like all other documents in CouchDB, but it has a number of fields that are treated specially by CouchDB. Here is my file, which I called simpleview.json:

{
    "_id" : "_design/example",
    "views": {
        "show_by_birthday": {
            "map" : "function(doc){ emit(doc.birthday, doc) }"
        }
    }
}

Then, I uploaded the contents of this file using curl, as follows:

curl -X PUT http://localhost:5984/atf/_design/simpleview 
 ↪-d @simpleview.json

By using the -d flag and the @ sign, I was able to tell curl to upload the JSON from a file, rather than the command line. I uploaded it to a design document (as you can see from the "_design/" at the beginning of its name), with the view called simpleview. Once uploaded, I then could run it from Futon (by going to the menu item "show_by_birthday"), or by again using curl:

curl -X GET http://localhost:5984/atf/_design/simpleview/
↪_view/show_by_id

The results of the query are the same no matter what. Futon displays them in a nicer format, but it obviously would be easier for a program to work with the JSON output via HTTP. If you want to edit the view, you can either re-upload it via curl or use Futon to go to the view and edit the JavaScript function.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix