At the Forge - Cassandra

Meet the non-relational database that scales to handle even Amazon- and Facebook-size loads.

Cassandra provides a non-relational storage and retrieval mechanism (NoSQL database) that features tremendous scalability, speed and flexibility. The inclusion of super columns (and super-column families, which I didn't discuss here) gives you just enough flexibility to store a great deal of information about many users. So long as you never have to search on anything other than the primary key or join information from different users at the database level, Cassandra is a good choice.
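The trade-off described above can be illustrated with a small Python sketch. It models a column family as nested dictionaries (row key, then super column, then column); it is not a Cassandra client, and the names `users`, `get_by_key` and `find_by_column` are hypothetical, chosen only for illustration. Fetching by row key is a single lookup, whereas searching on any other value means examining every row.

```python
# Toy model only: row key -> super column -> {column name: value}.
# All data and names here are made up for illustration.
users = {
    "alice": {
        "address": {"city": "Chicago", "zip": "60601"},
        "contact": {"email": "alice@example.com"},
    },
    "bob": {
        "address": {"city": "Boston", "zip": "02108"},
        "contact": {"email": "bob@example.com"},
    },
}


def get_by_key(store, row_key):
    """Direct lookup by row key: the access pattern this model is built for."""
    return store[row_key]


def find_by_column(store, super_column, column, value):
    """Search on a non-key value: nothing to consult but every row in turn."""
    return sorted(
        key
        for key, row in store.items()
        if row.get(super_column, {}).get(column) == value
    )


if __name__ == "__main__":
    print(get_by_key(users, "alice")["address"]["city"])
    print(find_by_column(users, "address", "city", "Boston"))
```

In this toy model, the second function's cost grows with the number of rows, which is the reason to design the data so that the row key answers your queries.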

That said, Cassandra is significantly harder to understand and administer than other non-relational databases. I think the investment of time and effort is worth it, but you shouldn't expect to be able to work with Cassandra as quickly and easily as with, say, CouchDB or MongoDB. The flip side of this issue is that administration allows you to fine-tune a number of aspects of Cassandra's networking and consistency until you reach a level with which you're comfortable.

Next month, I'll continue exploring and discussing Cassandra, looking at ways to connect multiple Cassandra boxes into a cluster, and at what happens when you do so.

Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.





Tyler Hobbs


That was a great informational piece! You make a lot of useful observations that are hard for an insider to notice any more.

I do have a couple of corrections. You say that "All nodes eventually contain all data." This is generally not the case. You set a replication factor (RF) per keyspace, which determines how many nodes store a copy of each row (a set of data associated with a key). If RF is less than the number of nodes in your cluster, each node will contain a different (but overlapping) set of data.
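To make the replication-factor point concrete, here is a minimal Python sketch of a toy ring. It is a simplification under stated assumptions, not Cassandra's actual partitioner or replication strategy: each key is hashed to a starting node, and copies go to that node and the next RF - 1 nodes in ring order. The node names, keys and function names (`replicas_for_key`, `placement`) are hypothetical.

```python
import hashlib


def replicas_for_key(key, nodes, rf):
    """Return the rf nodes that hold a copy of `key` in this toy model."""
    if not 1 <= rf <= len(nodes):
        raise ValueError("rf must be between 1 and the number of nodes")
    # Hash the key to choose a starting position on the ring.
    # (A toy stand-in for a partitioner, not Cassandra's real one.)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Take rf consecutive nodes, wrapping around the end of the list.
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def placement(keys, nodes, rf):
    """Map each node to the set of keys it stores."""
    stored = {node: set() for node in nodes}
    for key in keys:
        for node in replicas_for_key(key, nodes, rf):
            stored[node].add(key)
    return stored


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    keys = ["user%d" % i for i in range(20)]
    for node, held in sorted(placement(keys, nodes, rf=3).items()):
        print(node, "holds", len(held), "of", len(keys), "rows")
```

With five nodes and RF set to 3, every row exists on exactly three nodes, so the nodes share some rows with their neighbors without each holding the full data set. Only when RF equals the number of nodes does every node end up with everything.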

Second, although it is true that in Cassandra 0.6 you must restart a node to create a new Column Family or Keyspace, this is no longer true in 0.7 (released yesterday). Keyspaces and Column Families may be created, altered or dropped on a live cluster.