At the Forge - Cassandra
Cassandra provides a non-relational storage and retrieval mechanism (NoSQL database) that features tremendous scalability, speed and flexibility. The inclusion of super columns (and super-column families, which I didn't discuss here) gives you just enough flexibility to store a great deal of information about many users. So long as you never have to search on anything other than the primary key or join information from different users at the database level, Cassandra is a good choice.
That said, Cassandra is significantly harder to understand and administer than other non-relational databases. I think the investment of time and effort are worth it, but you shouldn't expect to be able to work with Cassandra as quickly and easily as with, say, CouchDB or MongoDB. The flip side of this issue is that administration allows you to fine-tune a number of aspects of Cassandra's networking and consistency until you reach a level with which you're comfortable.
Next month, I'll continue exploring and discussing Cassandra, looking at ways to connect multiple Cassandra boxes to a cluster—and what happens when you do so.
The Cassandra home page is at cassandra.apache.org. You might find references to another Cassandra page; it only recently “graduated” to become a full-fledged Apache project, rather than an “incubator” project; thus, some references will be out of date. This page contains download links, documentation, an actively maintained wiki and links to papers, tutorials and drivers in a number of languages.
Cassandra is based on Amazon's Dynamo, the original paper for which is useful in understanding some of the design decisions. You can read this paper at www.allthingsdistributed.com/2007/10/amazons_dynamo.html.
Two complementary video talks describing Cassandra, but aimed more at the network storage aspects (rather than the practical day-to-day usage) are at www.parleys.com/#sl=1&st=5&id=1866 and vimeo.com/5185526.
Finally, although I still find the Cassandra documentation to be a bit lacking, a growing number of blogs, tutorials and testimonials have made their way onto the Web. Three that I particularly enjoyed were Arin Sarkissian's “WTF is a SuperColumn? An Intro to the Cassandra Data Model” (arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model), Evan Weaver's “Up and Running with Cassandra” (blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra) and Dominic Williams' “HBase vs Cassandra: why we moved” (ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved).
Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.
Webinar: 8 Signs You’re Beyond Cron
On Demand NOW
Join Linux Journal and Pat Cameron, Director of Automation Technology at HelpSystems, as they discuss the eight primary advantages of moving beyond cron job scheduling. In this webinar, you’ll learn about integrating cron with an enterprise scheduler.View Now!
|Dr Hjkl on the Command Line||May 21, 2015|
|Initializing and Managing Services in Linux: Past, Present and Future||May 20, 2015|
|Goodbye, Pi. Hello, C.H.I.P.||May 18, 2015|
|Enter to Win Archive DVD + Free Backup Solution||May 18, 2015|
|Using Hiera with Puppet||May 14, 2015|
|Urgent Kernel Patch for Ubuntu||May 12, 2015|
- Dr Hjkl on the Command Line
- Initializing and Managing Services in Linux: Past, Present and Future
- Goodbye, Pi. Hello, C.H.I.P.
- Using Hiera with Puppet
- Enter to Win Archive DVD + Free Backup Solution
- Infinite BusyBox with systemd
- Gartner Dubs DivvyCloud Cool Cloud Management Vendor
- Mumblehard--Let's End Its Five-Year Reign
- It's Easier to Ask Forgiveness...
- A More Stable Future for Ubuntu