At the Forge - CouchDB

Getting started with CouchDB, an increasingly popular non-relational database.
Creating and Populating a Database

Going back to the overview screen, you should see a “Create Database” link at the top. Just as with most relational database systems, a single server may contain more than one database. Each database, in turn, may contain any number of documents, each of which has a unique ID and any number of name-value pairs.

So to get started, you need to create a new database. Click on the link, and an AJAX dialog box opens up, asking for the name of the database. I'm going to assume a database name of “atf” for this column, although you might want to choose something closer to your own name or interests. Database names may contain only lowercase letters, digits and a handful of symbols (such as _, $, +, - and /), and must begin with a letter. Names with a leading underscore are reserved for CouchDB's own internal use, so avoid them for your own work.

After you create a database, you'll be brought to the browse database page. Click on the new document button to create a new document. CouchDB automatically gives the new document a unique ID value (key name “_id”). You may change the ID to one of your liking, if you have a unique numbering or naming scheme that you prefer.

Then, you may add as many name-value pairs as you like, by clicking on the add field button. The name is assumed to be a string, but the value may be any legitimate JSON value—a number, string, array or object. If you enter an array (within square brackets) into the interactive Futon interface, upon completion, it will be represented visually as an array. The same is true with a JSON object. After you enter it, the name-value pairs are displayed in an easy-to-read format.
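To make the typing rule concrete, here is a minimal sketch in Python (my choice for illustration; Futon itself doesn't involve Python) that builds a document with a string, a number, an array and a nested object, then serializes it to JSON. The field names and values are hypothetical.

```python
import json

# A hypothetical document. Every name is a string; each value may be any
# JSON type: string, number, array, object, true/false or null.
doc = {
    "title": "example",                      # string
    "count": 3,                              # number
    "tags": ["couchdb", "json"],             # array
    "meta": {"draft": True, "note": None},   # object holding true and null
}

# Serialize to JSON text, the form in which CouchDB stores and returns it.
text = json.dumps(doc)
print(text)

# Parsing the text back yields an equal Python structure.
assert json.loads(text) == doc
```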

Once you have added some fields to your document, click the save button.

I added a number of fields to a document describing me. The fields tab in Futon shows me these values in a nice, easy-to-edit format. If I want to see the document in its native JSON, I can click on the source tab and see it there:

{
   "_id": "0534ca63b70beb02d24b62ec4fe72566",
   "_rev": "4-bea8364f4536833c1fd7de5781ea8a08",
   "first_name": "Reuven",
   "last_name": "Lerner",
   "children": [
       "Atara",
       "Shikma",
       "Amotz"
   ]
}

Notice that in addition to the fields I already have mentioned, there is a “_rev” field. That's because when you save a document, the old version does not disappear right away. Rather, CouchDB keeps old revisions around until the database is compacted, somewhat as a garbage-collected language such as Ruby or Python keeps unreferenced objects in memory until the collector runs. This means several revisions can be stored under the same “_id”, although only one is considered current: the one with the latest “_rev” value. The revision string consists of an integer, a hyphen and an MD5 hash value in hexadecimal. You normally can look at only the integer to identify the revision, ignoring the hex portion of the string.
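Because the revision string has that simple integer-hyphen-hash shape, splitting it apart is easy. The following Python sketch relies only on the format visible in the example above; the helper name parse_rev is hypothetical and not part of any CouchDB client library.

```python
from typing import Tuple


def parse_rev(rev: str) -> Tuple[int, str]:
    """Split a revision string such as "4-bea83..." into (4, "bea83...")."""
    number, sep, digest = rev.partition("-")
    if not sep or not number.isdigit() or not digest:
        raise ValueError(f"not a revision string: {rev!r}")
    return int(number), digest


print(parse_rev("4-bea8364f4536833c1fd7de5781ea8a08"))
# (4, 'bea8364f4536833c1fd7de5781ea8a08')
```

For a human reading the output, the integer alone is usually enough, which is why the function returns it separately from the hash.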

Do not mistake the revision tag for a means of keeping backups or for version control. The moment someone compacts a database, all of the old revisions are removed.

As with other non-relational databases, CouchDB allows you to add, remove and rename fields whenever you like. Each document in a database might have its own unique field names, although in practice, this is fairly rare. It is far more common for each document to have a common set of fields, perhaps with some variation in special cases. It is common to say that CouchDB is “schemaless”, but I think it's safer to say that CouchDB (and other NoSQL storage facilities) allows the programmer to decide on the schema at runtime, rather than in advance—much as a dynamic programming language allows you to determine the type of a variable at runtime, rather than at compile time.

One thing that obviously is missing from a JSON-based database is the notion of a foreign key—a pointer from one document, or record, to another. There is no built-in facility for linking one document to another, although there certainly are ways to use information in one document to view another document.
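One common way to link documents is simply to store another document's “_id” in a field and look it up yourself. In the Python sketch below, a plain dict stands in for the database, and both the IDs and the “parent_id” field name are hypothetical; CouchDB would treat such a field as an ordinary string and enforce nothing about it. Against a real server, the lookup would be a second GET request.

```python
from typing import Optional

# A plain dict stands in for a database, keyed by "_id". The IDs and the
# "parent_id" field are hypothetical conventions, not CouchDB features.
db = {
    "reuven": {"_id": "reuven", "first_name": "Reuven"},
    "atara": {"_id": "atara", "first_name": "Atara", "parent_id": "reuven"},
}


def follow(doc: dict, field: str, store: dict) -> Optional[dict]:
    """Return the document whose ID is stored in doc[field], or None if the
    field is missing or the ID doesn't match any document."""
    target = doc.get(field)
    if target is None:
        return None
    return store.get(target)  # a dangling reference simply yields None


print(follow(db["atara"], "parent_id", db)["first_name"])  # Reuven
```

Because nothing checks these references, deleting the referenced document leaves a dangling ID behind, which is exactly the kind of integrity a foreign key would have provided.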

Outside Futon

It's very nice that CouchDB comes with an easy-to-use, browser-based interface. However, this interface is clearly not what you want to be using from your applications. As I wrote above, CouchDB communicates with the outside world using JSON over HTTP. Any action that you just performed via the browser also should be possible via an HTTP client. You could use a library for a programming language; every major language has at least one CouchDB client. But a popular and easy-to-use option is the curl command-line program.

To send a simple GET request to my CouchDB server, I can write:

curl http://atf.lerner.co.il:5984/

And sure enough, I receive the same response as before:

{"couchdb":"Welcome","version":"0.10.0"}
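The same request can come from a program rather than from curl. Here is a minimal sketch using Python's standard urllib module, which is my choice here; the function names are hypothetical, and the server address is the author's, so substitute your own.

```python
import json
import urllib.request

# The author's server; substitute the address of your own CouchDB.
SERVER = "http://atf.lerner.co.il:5984/"


def make_request(method: str, url: str) -> urllib.request.Request:
    """Build a request with an explicit HTTP verb, much like curl -X."""
    return urllib.request.Request(url, method=method)


def fetch_json(req: urllib.request.Request) -> dict:
    """Send the request and decode the body as JSON (needs a live server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_json(make_request("GET", SERVER)) against a running server should return the welcome object as a Python dict.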

Unfortunately, if something goes wrong, curl won't say much. For that reason, I generally prefer to use the -v option to curl (and most other programs, for that matter), which shows me the HTTP request and response as they take place. It also comes in handy to specify the HTTP verb you want to use (GET, in this case), so I'll do that with the -X option. Thus, I can write:

~$ curl -vX GET http://atf.lerner.co.il:5984/

And I see:


* About to connect() to atf.lerner.co.il port 5984 (#0)
*   Trying 69.55.225.93... connected
* Connected to atf.lerner.co.il (69.55.225.93) port 5984 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.4 (universal-apple-darwin10.0) libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
> Host: atf.lerner.co.il:5984
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: CouchDB/0.10.0 (Erlang OTP/R13B)
< Date: Mon, 12 Apr 2010 12:03:05 GMT
< Content-Type: text/plain;charset=utf-8
< Content-Length: 41
< Cache-Control: must-revalidate
<
{"couchdb":"Welcome","version":"0.10.0"}
* Connection #0 to host atf.lerner.co.il left intact
* Closing connection #0

You might notice that the “Content-Type” response header indicates that what the server sends back is in text/plain format. So, although the body is JSON, CouchDB labels it as plain text. This usually doesn't matter, but if your program checks the Content-Type before parsing a response as JSON, you might need to relax that check.
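A program can tolerate this by accepting either label and parsing the body as JSON regardless. The Python sketch below is one way to do that, under the assumption that only application/json and text/plain should be trusted; the function name is hypothetical.

```python
import json


def decode_couch_body(body: bytes, content_type: str) -> dict:
    """Parse a CouchDB response body as JSON, even when the server labels
    it text/plain, as in the transcript above."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ("application/json", "text/plain"):
        raise ValueError(f"unexpected Content-Type: {content_type}")
    # Honor a declared charset if present, defaulting to UTF-8.
    charset = "utf-8"
    for param in content_type.split(";")[1:]:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset" and value:
            charset = value.strip()
    return json.loads(body.decode(charset))


print(decode_couch_body(b'{"couchdb":"Welcome","version":"0.10.0"}',
                        "text/plain;charset=utf-8"))
# {'couchdb': 'Welcome', 'version': '0.10.0'}
```

Rejecting other media types, such as Futon's text/html, keeps the function from trying to parse an HTML page as JSON.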

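You can request your Futon URL as well, using HEAD to avoid the long response. (curl's -I option is the more conventional way to send a HEAD request; with -X HEAD, curl may wait for a response body that never arrives.)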


~$ curl -vX HEAD http://atf.lerner.co.il:5984/_utils/

* About to connect() to atf.lerner.co.il port 5984 (#0)
*   Trying 69.55.225.93... connected
* Connected to atf.lerner.co.il (69.55.225.93) port 5984 (#0)
> HEAD /_utils/ HTTP/1.1
> User-Agent: curl/7.19.4 (universal-apple-darwin10.0) libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
> Host: atf.lerner.co.il:5984
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: CouchDB/0.10.0 (Erlang OTP/R13B)
< last-modified: Fri, 23 Oct 2009 12:40:09 GMT
< Date: Mon, 12 Apr 2010 12:04:43 GMT
< Content-Type: text/html
< Content-Length: 3158

In this case, you get a text/html response. And, of course, you know that Futon sends HTML for its response, because you already have been using it from a Web browser.

Now, let's try to look at the atf database, which I created earlier, that contains a single document (that is, record). How can I retrieve that information?

Well, I can start by asking for the database (leaving off the -v option now for space reasons):

~$ curl -X GET http://atf.lerner.co.il:5984/atf

{"db_name":"atf","doc_count":1,"doc_del_count":0,"update_seq":4,
 "purge_seq":0,"compact_running":false,"disk_size":16473,
 "instance_start_time":"1271067859057749","disk_format_version":4}

In other words, asking for a database gives basic information about that database, from the number of documents to the amount of space it consumes on the disk.
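Since the database information is ordinary JSON, a program can pick out the fields it cares about. This Python sketch parses the exact response shown above; the summarize function is a hypothetical helper.

```python
import json

# The response body from GET /atf, copied from the output above.
info_text = ('{"db_name":"atf","doc_count":1,"doc_del_count":0,"update_seq":4,'
             '"purge_seq":0,"compact_running":false,"disk_size":16473,'
             '"instance_start_time":"1271067859057749","disk_format_version":4}')

info = json.loads(info_text)


def summarize(info: dict) -> str:
    """Produce a one-line summary of a database-information response."""
    return (f'{info["db_name"]}: {info["doc_count"]} document(s), '
            f'{info["disk_size"]} bytes on disk, '
            f'compaction {"running" if info["compact_running"] else "idle"}')


print(summarize(info))
# atf: 1 document(s), 16473 bytes on disk, compaction idle
```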

You can retrieve an individual document by using its ID:

~$ curl -X GET http://atf.lerner.co.il:5984/atf/0534ca63b70beb02d24b62ec4fe72566

{"_id":"0534ca63b70beb02d24b62ec4fe72566",
 "_rev":"4-bea8364f4536833c1fd7de5781ea8a08",
 "first_name":"Reuven",
 "last_name":"Lerner",
 "children":["Atara","Shikma","Amotz"]}

If I want to modify one or more fields in this document, or even add another field, I can do so with a PUT command. curl's -d option lets me specify a document on the command line:

~$ curl -X PUT http://atf.lerner.co.il:5984/atf/0534ca63b70beb02d24b62ec4fe72566 \
   -d '{"first_name": "Superman", "middle_initial": "M."}'

{"error":"conflict","reason":"Document update conflict."}

Well, this is surprising. CouchDB is complaining that it cannot perform the update I requested, because there is a conflict. Notice that the body of the response is a JSON object containing an “error” key. CouchDB also sets an HTTP status code (409 Conflict, in this case), but curl doesn't display it unless you use the -v option.

The reason CouchDB gives an error message here is that I haven't indicated which revision I am attempting to update. Without such a revision indicator, CouchDB assumes I have stale data and, thus, will not allow me to update the document. Only if I send my update with the current “_rev” value will the update succeed. For example:

~$ curl -X PUT http://atf.lerner.co.il:5984/atf/0534ca63b70beb02d24b62ec4fe72566 \
   -d '{"_rev": "4-bea8364f4536833c1fd7de5781ea8a08",
        "first_name": "Superman", "middle_initial": "M."}'

CouchDB responds with:

{"ok":true,"id":"0534ca63b70beb02d24b62ec4fe72566",
 "rev":"5-fe6fccb89b9512d26120fbd63dbb15c4"}

In other words, the update succeeded, incrementing the revision. If you try the same update again, you will get the same “update conflict” error message as before, because there can be only one update to a given revision.
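The rule at work here is a form of optimistic concurrency control: every update must name the revision it was based on, and a stale or missing revision is refused. The toy Python class below imitates only that rule. It is a hypothetical model, not how CouchDB is implemented, and its hash is not computed the way CouchDB computes revision hashes.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when an update does not name the current revision."""


class MiniStore:
    """Toy in-memory model of CouchDB's update rule: replacing an existing
    document requires its current _rev."""

    def __init__(self) -> None:
        self.docs: dict = {}  # doc_id -> (rev, body without _rev)

    def put(self, doc_id: str, body: dict) -> str:
        current = self.docs.get(doc_id)
        if current is not None and body.get("_rev") != current[0]:
            # Missing or stale _rev: refuse, like CouchDB's "conflict" error.
            raise Conflict("Document update conflict.")
        generation = 1 if current is None else int(current[0].split("-", 1)[0]) + 1
        content = {k: v for k, v in body.items() if k != "_rev"}
        digest = hashlib.md5(
            json.dumps(content, sort_keys=True).encode("utf-8")).hexdigest()
        rev = f"{generation}-{digest}"
        self.docs[doc_id] = (rev, content)
        return rev


store = MiniStore()
r1 = store.put("me", {"first_name": "Reuven"})
r2 = store.put("me", {"_rev": r1, "first_name": "Superman"})  # "2-..."
```

Repeating the second put with r1 would now raise Conflict, mirroring the repeated curl update described above.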

Note that when you PUT an update to a document, you must supply the entire document at once. Unlike the UPDATE command in a relational database, adding a new revision to a CouchDB document does not modify individual fields. Rather, it stores an entirely new document with the same ID and an incremented revision number. So in this example, I did successfully add the “middle_initial” field. However, I also effectively removed the “last_name” and “children” fields, because I did not include them in my PUT request.
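The usual way around this is read-modify-write: GET the current document, change the fields you care about, and PUT the whole thing back with its “_rev” intact. Here is a minimal Python sketch of that pattern using only the standard library; all three function names are hypothetical, and update_fields needs a live server, so only the pure helpers are exercised here.

```python
import copy
import json
import urllib.request
from urllib.parse import quote


def doc_url(server: str, db: str, doc_id: str) -> str:
    """URL of one document: <server>/<db>/<id>, with both path parts
    percent-encoded (a "/" in a database name must be sent as %2F)."""
    return f"{server.rstrip('/')}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


def updated_body(current: dict, changes: dict) -> dict:
    """Full body for a PUT: the current document, including its _id and
    _rev, with `changes` laid over it so untouched fields survive."""
    body = copy.deepcopy(current)
    body.update(changes)
    return body


def update_fields(server: str, db: str, doc_id: str, changes: dict) -> dict:
    """Read-modify-write over HTTP. If someone else saves in between, the
    PUT fails with 409, which urllib raises as HTTPError; retry from the GET."""
    url = doc_url(server, db, doc_id)
    with urllib.request.urlopen(url) as resp:  # GET the current revision
        current = json.loads(resp.read().decode("utf-8"))
    data = json.dumps(updated_body(current, changes)).encode("utf-8")
    req = urllib.request.Request(url, data=data, method="PUT",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # PUT the whole document back
        return json.loads(resp.read().decode("utf-8"))


current = {"_id": "0534ca63b70beb02d24b62ec4fe72566",
           "_rev": "4-bea8364f4536833c1fd7de5781ea8a08",
           "first_name": "Reuven", "last_name": "Lerner",
           "children": ["Atara", "Shikma", "Amotz"]}
body = updated_body(current, {"first_name": "Superman", "middle_initial": "M."})
print(sorted(body))
# ['_id', '_rev', 'children', 'first_name', 'last_name', 'middle_initial']
```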

You can add an entirely new document to your database using the POST verb in HTTP. For example:

~$ curl -X POST http://atf.lerner.co.il:5984/atf \
  -d '{"first_name" : "Atara", "last_name" : "Lerner-Friedman"}'

Sure enough, I get the following response, indicating that a new document was created:

{"ok":true,"id":"aeb6925eb23278f1b8e530ba67b0172d",
 "rev":"1-f0e336978a368f679ee7b280107bc2fb"}

I should add that I had a terrible time trying to use curl to create a document, all because of the quotes. JSON requires double quotes around keys and string values; single quotes are not valid JSON. Using single quotes resulted in a strange error message indicating that the UTF-8 encoding for JSON is invalid, which did not quite point me in the right direction. Wrapping the entire -d argument in single quotes, as in the examples above, keeps the shell from interfering with the double quotes inside it.
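Two separate rules are involved: JSON itself demands double quotes, and the shell must pass the whole payload through untouched. Letting a library generate the JSON sidesteps both. The Python sketch below shows two hypothetical approaches: building the POST request directly (constructed here, not sent), or keeping curl and letting shlex.quote protect the argument. The Content-Type header is my addition; version 0.10 evidently accepted the curl POST without it, but later CouchDB releases may refuse a POST that lacks it.

```python
import json
import shlex
import urllib.request

doc = {"first_name": "Atara", "last_name": "Lerner-Friedman"}

# json.dumps always emits double quotes around keys and strings.
payload = json.dumps(doc)

# A single-quoted imitation of JSON is rejected by a strict parser.
try:
    json.loads("{'first_name': 'Atara'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

# Option 1: skip the shell and POST from Python (built here, not sent).
req = urllib.request.Request(
    "http://atf.lerner.co.il:5984/atf",
    data=payload.encode("utf-8"),
    method="POST",
    headers={"Content-Type": "application/json"},
)

# Option 2: keep curl, but let shlex.quote protect the payload; this also
# copes with values that contain a single quote.
command = "curl -X POST http://atf.lerner.co.il:5984/atf -d " + shlex.quote(payload)
print(command)
```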

______________________

Comments


quotes

Jacques

The quotes issue is more shell related than anything to do with curl or json.
