At the Forge - CouchDB Views

 in
Retrieve your CouchDB data using views and map-reduce functions.
Reduce

So far, the examples here have focused on writing “map” functions, which take the list of documents and outputs a series of key-value pairs. But, as you might imagine, the map-reduce paradigm has two parts, the second of which is called reduce. The idea is that you use map to filter and transform the data into a list, and then use reduce to turn that list into something even more useful, typically a single value. For example, you can define the reduce function as follows:

function(keys, values, rereduce) {
    return 1;
}

From within Futon, this returns a list of three documents, the keys of which are the birthdays, and the values of which is the number 1. This isn't very interesting, to be honest, because you could have accomplished the same thing from within the “map” request. But, if you invoke the same query from curl, you get something else entirely:

{"rows":[
    {"key":null,"value":1}
]}

Why the difference? And, why is there only a single row now?

The answer is that reduce is designed to, well, reduce things. That means instead of returning a result for each row, it returns a single result from all rows. The reduce function actually can be invoked in two ways:

  • The usual way, in which the “keys” and “values” represent document keys and values. In such cases, the “rereduce” parameter is set to false.

  • For rereducing, the “rereduce” parameter is set to true, the keys are null, and the values represent values from a previous, partial run of the “reduce” function.

As the CouchDB Wiki states, this means the reduce function must be both commutative (that is, the order in which arguments are processed doesn't matter) and associative (that is, the order in which operations are executed doesn't matter). This often (but not always) means the reduce function ends up performing addition or multiplication, returning the result of executing the function on all documents. One example of a reduce function that I found calculates the standard deviation from mapped results, so you can find out how similar the documents are along at least one dimension.

Learning to use map-reduce in CouchDB can take some time. However, this paradigm has proven itself for a decade or so. For example, as the backbone for Google's search system, map-reduce has demonstrated excellent performance and flexibility. Granted, Google is using something (or some things) that are not CouchDB, but with a similar interface and paradigm.

Conclusion

If you like the idea of using JavaScript, MVCC, easy replication and JSON documents, CouchDB might be a good choice. It apparently is not as fast as some of its non-SQL competition, such as MongoDB and Cassandra. However, CouchDB's built-in (and sophisticated) Web interface, RESTful communication and the flexibility of map-reduce are all good reasons to use it. CouchDB is so easy to set up and use, why not try it out? Even if you end up not using it, CouchDB is a great way to learn about map-reduce and to try to create some small functions using it.

Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.

______________________

Geek Guide
The DevOps Toolbox

Tools and Technologies for Scale and Reliability
by Linux Journal Editor Bill Childers

Get your free copy today

Sponsored by IBM

Webcast
8 Signs You're Beyond Cron

Scheduling Crontabs With an Enterprise Scheduler
On Demand
Moderated by Linux Journal Contributor Mike Diehl

Sign up now

Sponsored by Skybot