At the Forge - CouchDB Views

Retrieve your CouchDB data using views and map-reduce functions.

Last month's column was an initial look at CouchDB, a non-relational, open-source database server, now sponsored by the Apache Software Foundation. CouchDB uses many Web-related standards: data is stored in JSON format, communication takes place using JSON and RESTful resources, and functions are written in JavaScript. CouchDB is not as speedy as some of the other non-relational (NoSQL) databases, such as MongoDB and Cassandra. However, CouchDB is designed to be dependable and easily replicated across multiple servers—a far cry from relational databases, for which replication remains slightly annoying at best.

Last month, I explained that once you have created a CouchDB database, you can use the curl utility to insert, update and remove documents. Each “document” is nothing more than a JSON object, which means it's basically a hash (or, in Python terms, a dictionary), which then may contain arbitrarily nested levels of arrays, hashes and scalar values (that is, strings and numbers). So, I can create a database with an HTTP PUT request:

curl -X PUT http://localhost:5984/atf

Then, I can add some documents to that database with HTTP POST requests:

curl -X POST http://localhost:5984/atf \
     -d '{"first_name" : "Atara", "middle_name": "Margalit",
          "sex":"f", "last_name" : "Lerner-Friedman",
          "birthday" : "2000-dec-16"}'

curl -X POST http://localhost:5984/atf \
     -d '{"first_name" : "Shikma", "middle_name": "Bruria",
          "sex":"f", "last_name" : "Lerner-Friedman",
          "birthday" : "2002-dec-17"}'

curl -X POST http://localhost:5984/atf \
     -d '{"first_name" : "Amotz", "middle_name": "David",
          "sex":"m", "last_name" : "Lerner-Friedman",
          "birthday" : "2005-oct-31"}'

Then, I can check to see that there are three documents, by using an HTTP GET request on the database:

bash-3.2# curl -X GET http://localhost:5984/atf

   {"db_name":"atf","doc_count":3,
    "doc_del_count":0,"update_seq":3,"purge_seq":0,
    "compact_running":false,"disk_size":12377,
    "instance_start_time":"1273430793169153","disk_format_version":4}

As you can see, the "doc_count" attribute shows that there are, indeed, three documents in this database.

Now, if you have only three documents, querying them doesn't make much sense. But, if you have 300, or even 300,000 documents, you certainly are not going to want to iterate over all of them just to find the ones that match your criteria.

If you were using a relational database server, you would use SQL to retrieve the rows that match a particular set of criteria. Even MongoDB, which I covered earlier this year, offers a query language that is vaguely SQL-like. CouchDB, however, offers a completely different query system, based on JavaScript functions and the map-reduce paradigm. CouchDB's syntax takes some getting used to, especially if you are relatively new at writing JavaScript functions. However, a few small functions can give you a great deal of power, which is (perhaps) the secret behind CouchDB's success.

CouchDB Views

I've already explained that CouchDB refers to each stored data item as a “document”. CouchDB defines a special kind of document, known as a “design document”, which contains a “view”: JavaScript code that is executed when you want to perform the query. (Design documents also may contain “show” functions, which can sort or otherwise modify the way in which data is displayed, but I won't discuss “show” functions in this column.) If you are still developing a database or application, you might want to avoid the overhead of a permanent view by creating a temporary view instead. Temporary views take less time to set up and are a bit more flexible, but they execute much more slowly.

So, let's begin by creating a temporary view and some basic JavaScript functions. For the data, I'm using the information I entered above, about my three children. (Feel free to substitute information about your own children, if you prefer.) I find it easiest to create temporary views using Futon, the Web-based administrative and maintenance tool that comes with CouchDB. Simply point your Web browser at the server on which CouchDB is running, on port 5984, and go to your database of choice. Then, select temporary view from the pull-down menu in the top right-hand corner.

Your screen now should consist of two parts: on the left side, you have a simple JavaScript function, under the header “map”:

function(doc) {
  emit(null, doc);
}

If you ever have used “map” in a language such as Ruby, Python, JavaScript or Lisp, this function already might make sense to you: your function is invoked repeatedly for a list of documents. If it produces a key-value pair, that pair is added to the output from the function running across all documents.

For instance, the example function (which is anonymous) takes a document as an argument, and returns a null key and the document itself as the value. If you click “Run” under the code, you get a set of results: three “null” keys on the left side and the original documents (with their mandatory _id field) on the right.
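To make the mechanics concrete, here is a minimal sketch of what happens when a map function runs. It is only a toy stand-in for CouchDB's view engine: the names runMap and rows are my own inventions, the _id values are made up, and the real emit() is supplied by CouchDB rather than defined by you.

```javascript
// Toy stand-in for CouchDB's view engine; runMap and rows are illustrative names.
let rows = [];

// In CouchDB, emit() is supplied by the view server; here we define our own.
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Call mapFn once per document and return everything it emitted.
function runMap(docs, mapFn) {
  rows = [];
  docs.forEach(function (doc) {
    mapFn(doc);
  });
  return rows;
}

// Hypothetical _id values; CouchDB would assign long unique identifiers.
const docs = [
  { _id: "a1", first_name: "Atara", sex: "f" },
  { _id: "b2", first_name: "Shikma", sex: "f" },
  { _id: "c3", first_name: "Amotz", sex: "m" }
];

// The default map function from the temporary-view screen.
const result = runMap(docs, function (doc) {
  emit(null, doc);
});

console.log(result.length); // one row per document, each with a null key
```

A map function that never calls emit() for a given document simply contributes no row for it, which is what makes filtering possible.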

You can, of course, modify the function such that it outputs only information about girls. To do that, write:

function(doc)
{
    if (doc['sex'] == 'f')
    {
        emit(null, doc);
    }
}

Notice how, by using a simple if statement, you can eliminate unwanted rows. Now, what if you're interested in getting all the documents, but sorted? CouchDB orders the results by their keys, which means the key you use is useful not only for identifying the resulting documents, but also for ordering them. You could, for example, sort the results by first name:

function(doc) {
        emit(doc.first_name, doc);
}

In my case, this means I first get the record for my son (Amotz), followed by Atara, followed by Shikma. Sorting by last name in this particular case doesn't help very much, because they all have identical last names. But, keys can be any data type, which means you even can use an array to arrange items, by last and then first name:

function(doc) {
        emit([doc.last_name, doc.first_name], doc);
}
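The way an array key orders results can be sketched in plain JavaScript. This is a simplification rather than CouchDB's actual collation code: the compareKeys function is my own, it handles only strings and arrays of strings, and it uses ordinary JavaScript string comparison where CouchDB applies Unicode collation rules.

```javascript
// Compare two keys that are either strings or arrays of strings.
// Simplification: real CouchDB collation also orders null, numbers and
// objects, and compares strings with Unicode (ICU) rules.
function compareKeys(a, b) {
  if (Array.isArray(a) && Array.isArray(b)) {
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = compareKeys(a[i], b[i]);
      if (c !== 0) return c; // first differing element decides
    }
    return a.length - b.length; // otherwise the shorter array sorts first
  }
  return a < b ? -1 : a > b ? 1 : 0;
}

const keys = [
  ["Lerner-Friedman", "Shikma"],
  ["Lerner-Friedman", "Amotz"],
  ["Lerner-Friedman", "Atara"]
];
keys.sort(compareKeys);

const firstNamesInOrder = keys.map(function (k) {
  return k[1];
});
console.log(firstNamesInOrder); // [ 'Amotz', 'Atara', 'Shikma' ]
```

Because the last names are identical, the comparison falls through to the second element of each array, so the first names decide the order.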

You also can sort them by birthday:

function(doc) {
    emit(doc.birthday, doc)
}

However, this will not necessarily have the effect you want. The “birthday” field is a text string, which means the sorting will be done as a string, rather than as a date. (In the case of my children's birthdays, the sorting happens to work out fine, but this is a happy accident, not inherent to CouchDB.)
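One way around this, sketched below, is to emit a key that sorts correctly, such as an array of numbers. The birthdayKey helper and the MONTHS table are my own names, and the sketch assumes every birthday follows the year-mon-day format used in my documents.

```javascript
const MONTHS = { jan: 1, feb: 2, mar: 3, apr: 4, may: 5, jun: 6,
                 jul: 7, aug: 8, sep: 9, oct: 10, nov: 11, dec: 12 };

// Turn "2000-dec-16" into [2000, 12, 16]; return null for an unexpected format.
function birthdayKey(birthday) {
  const parts = String(birthday).toLowerCase().split("-");
  if (parts.length !== 3 ||
      !Object.prototype.hasOwnProperty.call(MONTHS, parts[1])) {
    return null;
  }
  return [parseInt(parts[0], 10), MONTHS[parts[1]], parseInt(parts[2], 10)];
}

// Inside CouchDB, the map function could then read:
//   function(doc) { emit(birthdayKey(doc.birthday), doc); }
// with birthdayKey and MONTHS defined inside that function body.

// Why the plain string fails: "apr" sorts before "feb" alphabetically.
const asStrings = ["2000-feb-01", "2000-apr-01"].sort();
console.log(asStrings); // [ '2000-apr-01', '2000-feb-01' ]

console.log(birthdayKey("2000-dec-16")); // [ 2000, 12, 16 ]
```

With numeric array keys, February (2) sorts ahead of April (4), so the results come back in true chronological order.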

If you want to create a permanent view, there are a few ways to do so. You can use the Futon (Web-based) interface, and any temporary view can be turned into a permanent one by clicking on save as from the temporary view's screen. But another way, and one that's a bit more flexible if you're writing complex code, is to use curl to PUT a new design document on the server. This document contains JSON, like all other documents in CouchDB, but it has a number of fields that are treated specially by CouchDB. Here is my file, which I called simpleview.json:

{
    "_id" : "_design/simpleview",
    "views": {
        "show_by_birthday": {
            "map" : "function(doc){ emit(doc.birthday, doc) }"
        }
    }
}
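Because the map function must be stored as a string inside the JSON, longer functions become awkward to quote by hand. A possible approach, sketched here as an assumption rather than anything CouchDB requires, is to write the function as ordinary JavaScript and let a small script produce the file. The variable names are my own, and the design document is assumed to be named simpleview.

```javascript
// The map function, written as normal JavaScript so an editor can check it.
// emit() is provided by CouchDB when the view runs; it is never called here.
const showByBirthday = function (doc) {
  emit(doc.birthday, doc);
};

const designDoc = {
  _id: "_design/simpleview", // assumed design document name
  views: {
    show_by_birthday: {
      map: showByBirthday.toString() // CouchDB expects the source as a string
    }
  }
};

// Save this output as simpleview.json, for example by redirecting stdout.
const json = JSON.stringify(designDoc, null, 4);
console.log(json);
```

JSON.stringify takes care of escaping any quotes and newlines in the function source, which is the part most likely to go wrong when editing the file manually.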

Then, I uploaded the contents of this file using curl, as follows:

curl -X PUT http://localhost:5984/atf/_design/simpleview \
     -d @simpleview.json

By using the -d flag and the @ sign, I was able to tell curl to upload the JSON from a file, rather than the command line. I uploaded it as a design document (as you can see from the "_design/" at the beginning of its name) called simpleview, which contains a single view named show_by_birthday. Once uploaded, I then could run the view from Futon (by going to the menu item "show_by_birthday"), or by again using curl:

curl -X GET http://localhost:5984/atf/_design/simpleview/_view/show_by_birthday

The results of the query are the same either way. Futon displays them in a nicer format, but it obviously would be easier for a program to work with the JSON output via HTTP. If you want to edit the view, you can either re-upload it via curl or use Futon to go to the view and edit the JavaScript function.
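As a rough illustration of working with that JSON from a program, the sketch below parses a response and pulls out one field per row. The sample text is hypothetical: it follows the general shape of CouchDB view output (total_rows, offset and a rows array whose entries hold id, key and value), but the ids are made up and the values are trimmed to a single field. The firstNames function is my own name.

```javascript
// Hypothetical response text, shaped like CouchDB view output
// (total_rows, offset, and rows holding id, key and value); ids are made up.
const sampleResponse = JSON.stringify({
  total_rows: 3,
  offset: 0,
  rows: [
    { id: "a1", key: "2000-dec-16", value: { first_name: "Atara" } },
    { id: "b2", key: "2002-dec-17", value: { first_name: "Shikma" } },
    { id: "c3", key: "2005-oct-31", value: { first_name: "Amotz" } }
  ]
});

// Pull one field out of every row, preserving the key order CouchDB returned.
function firstNames(responseText) {
  const parsed = JSON.parse(responseText);
  return parsed.rows.map(function (row) {
    return row.value.first_name;
  });
}

console.log(firstNames(sampleResponse)); // [ 'Atara', 'Shikma', 'Amotz' ]
```

Against a live server, the text passed to firstNames would be the body returned by the HTTP GET request on the view's URL.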
