At the Forge - CouchDB Views

Retrieve your CouchDB data using views and map-reduce functions.
Reduce

So far, the examples here have focused on writing “map” functions, which take the list of documents and output a series of key-value pairs. But, as you might imagine, the map-reduce paradigm has two parts, the second of which is called reduce. The idea is that you use map to filter and transform the data into a list, and then use reduce to turn that list into something even more useful, typically a single value. For example, you can define the reduce function as follows:

function(keys, values, rereduce) {
    // Ignores its arguments: every call, including a rereduce, yields 1.
    return 1;
}

From within Futon, this returns a list of three documents, the keys of which are the birthdays and the values of which are the number 1. To be honest, this isn't very interesting, because you could have accomplished the same thing from within the “map” function. But, if you invoke the same query from curl, you get something else entirely:

{"rows":[
    {"key":null,"value":1}
]}

Why the difference? And, why is there only a single row now?

The answer is that reduce is designed to, well, reduce things. That means instead of returning a result for each row, it returns a single result from all rows. The reduce function actually can be invoked in two ways:

  • The usual way, in which “keys” and “values” are the keys and values emitted by the map function (each element of “keys” is a [key, document ID] pair). In such cases, the “rereduce” parameter is set to false.

  • When rereducing, the “rereduce” parameter is set to true, “keys” is null, and “values” holds the results of previous, partial runs of the reduce function.
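The two calling modes can be simulated outside CouchDB to see how partial results flow into a rereduce. The following is a minimal sketch, assuming a simple counting reduce function; the names `countReduce` and `simulateReduce` are hypothetical, and the fixed-size chunking is purely illustrative, not how CouchDB actually partitions its index.

```javascript
// Hypothetical counting reduce function, written in the same
// (keys, values, rereduce) form that CouchDB uses.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // keys is null; values are partial counts from earlier calls.
    return values.reduce((total, n) => total + n, 0);
  }
  // First pass: each value corresponds to one mapped row.
  return values.length;
}

// Illustrative driver (an assumption, not CouchDB's real algorithm):
// reduce the mapped rows in fixed-size chunks, then rereduce the partials.
function simulateReduce(rows, reduceFn, chunkSize) {
  const partials = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    const chunk = rows.slice(i, i + chunkSize);
    // Each key handed to reduce is a [key, docid] pair.
    const keys = chunk.map(row => [row.key, row.id]);
    const vals = chunk.map(row => row.value);
    partials.push(reduceFn(keys, vals, false));
  }
  const value = reduceFn(null, partials, true);
  // Without grouping, the whole view collapses to one row with a null key.
  return { rows: [{ key: null, value: value }] };
}

// Three hypothetical mapped rows: birthday keys, each with a value of 1.
const mapped = [
  { id: "a", key: "1970-01-01", value: 1 },
  { id: "b", key: "1972-05-17", value: 1 },
  { id: "c", key: "1975-11-30", value: 1 },
];

console.log(JSON.stringify(simulateReduce(mapped, countReduce, 2)));
// prints {"rows":[{"key":null,"value":3}]}
```

Passing the `return 1;` function from earlier in place of `countReduce` produces a single row whose value is 1, because every partial call and the final rereduce all return 1, which is consistent with the curl output shown earlier.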

As the CouchDB Wiki states, this means the reduce function must be both commutative (that is, the order in which arguments are processed doesn't matter) and associative (that is, the way in which operations are grouped doesn't matter). This often (but not always) means the reduce function ends up performing addition or multiplication, returning the result of applying the function across all documents. One reduce function I found calculates the standard deviation of the mapped results, so you can find out how similar the documents are along at least one dimension.
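A standard-deviation reduce can meet these requirements if, rather than returning the deviation directly, it returns a running count, sum and sum of squares, which can be merged in any order or grouping. The following is a minimal sketch of that idea, assuming the map function emits a plain number as each row's value; it is not the function mentioned above, and the names `statsReduce` and `stddevFromStats` are hypothetical.

```javascript
// Hypothetical reduce function: values are numbers on the first pass
// and {count, sum, sumsq} partial results when rereduce is true.
// The keys argument is not used.
function statsReduce(keys, values, rereduce) {
  const result = { count: 0, sum: 0, sumsq: 0 };
  for (const v of values) {
    if (rereduce) {
      // Merge a partial result; addition is commutative and associative.
      result.count += v.count;
      result.sum += v.sum;
      result.sumsq += v.sumsq;
    } else {
      result.count += 1;
      result.sum += v;
      result.sumsq += v * v;
    }
  }
  return result;
}

// Population standard deviation, computed by the client from the
// final reduced value.
function stddevFromStats(stats) {
  const mean = stats.sum / stats.count;
  return Math.sqrt(stats.sumsq / stats.count - mean * mean);
}

// One pass over all values vs. two partial passes merged by rereduce,
// deliberately combined in reverse order.
const values = [2, 4, 4, 4, 5, 5, 7, 9];
const whole = statsReduce(null, values, false);
const merged = statsReduce(null, [
  statsReduce(null, values.slice(5), false),
  statsReduce(null, values.slice(0, 5), false),
], true);

console.log(stddevFromStats(whole), stddevFromStats(merged)); // prints 2 2
```

The sum-of-squares formula keeps the example short, but it can lose floating-point precision when the values are large and their spread is small; a more careful version would merge partial means and variances instead.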

Learning to use map-reduce in CouchDB can take some time. However, this paradigm has proven itself for a decade or so. For example, as the backbone for Google's search system, map-reduce has demonstrated excellent performance and flexibility. Granted, Google uses its own systems rather than CouchDB, but they share a similar interface and paradigm.

Conclusion

If you like the idea of using JavaScript, MVCC, easy replication and JSON documents, CouchDB might be a good choice. It apparently is not as fast as some of its NoSQL competition, such as MongoDB and Cassandra. However, CouchDB's built-in (and sophisticated) Web interface, RESTful communication and the flexibility of map-reduce are all good reasons to use it. Given how easy CouchDB is to set up and use, why not try it out? Even if you end up not using it, CouchDB is a great way to learn about map-reduce and to try writing some small functions with it.

Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.
