At the Forge - Cassandra
Cassandra provides a non-relational storage and retrieval mechanism (NoSQL database) that features tremendous scalability, speed and flexibility. The inclusion of super columns (and super-column families, which I didn't discuss here) gives you just enough flexibility to store a great deal of information about many users. So long as you never have to search on anything other than the primary key or join information from different users at the database level, Cassandra is a good choice.
That said, Cassandra is significantly harder to understand and administer than other non-relational databases. I think the investment of time and effort are worth it, but you shouldn't expect to be able to work with Cassandra as quickly and easily as with, say, CouchDB or MongoDB. The flip side of this issue is that administration allows you to fine-tune a number of aspects of Cassandra's networking and consistency until you reach a level with which you're comfortable.
Next month, I'll continue exploring and discussing Cassandra, looking at ways to connect multiple Cassandra boxes to a cluster—and what happens when you do so.
The Cassandra home page is at cassandra.apache.org. You might find references to another Cassandra page; it only recently “graduated” to become a full-fledged Apache project, rather than an “incubator” project; thus, some references will be out of date. This page contains download links, documentation, an actively maintained wiki and links to papers, tutorials and drivers in a number of languages.
Cassandra is based on Amazon's Dynamo, the original paper for which is useful in understanding some of the design decisions. You can read this paper at www.allthingsdistributed.com/2007/10/amazons_dynamo.html.
Two complementary video talks describing Cassandra, but aimed more at the network storage aspects (rather than the practical day-to-day usage) are at www.parleys.com/#sl=1&st=5&id=1866 and vimeo.com/5185526.
Finally, although I still find the Cassandra documentation to be a bit lacking, a growing number of blogs, tutorials and testimonials have made their way onto the Web. Three that I particularly enjoyed were Arin Sarkissian's “WTF is a SuperColumn? An Intro to the Cassandra Data Model” (arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model), Evan Weaver's “Up and Running with Cassandra” (blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra) and Dominic Williams' “HBase vs Cassandra: why we moved” (ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved).
Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.
Webinar: 8 Signs You’re Beyond Cron
11am CDT, April 29th
Join Linux Journal and Pat Cameron, Director of Automation Technology at HelpSystems, as they discuss the eight primary advantages of moving beyond cron job scheduling. In this webinar, you’ll learn about integrating cron with an enterprise scheduler.Join us!
|Android Candy: Intercoms||Apr 23, 2015|
|"No Reboot" Kernel Patching - And Why You Should Care||Apr 22, 2015|
|Return of the Mac||Apr 20, 2015|
|DevOps: Better Than the Sum of Its Parts||Apr 20, 2015|
|Play for Me, Jarvis||Apr 16, 2015|
|Drupageddon: SQL Injection, Database Abstraction and Hundreds of Thousands of Web Sites||Apr 15, 2015|
- "No Reboot" Kernel Patching - And Why You Should Care
- Tips for Optimizing Linux Memory Usage
- Return of the Mac
- DevOps: Better Than the Sum of Its Parts
- Drupageddon: SQL Injection, Database Abstraction and Hundreds of Thousands of Web Sites
- Consent That Goes Both Ways
- Non-Linux FOSS: .NET?
- New GeekGuide: Beyond Cron
- diff -u: What's New in Kernel Development