At the Forge - Cassandra
Cassandra provides a non-relational storage and retrieval mechanism (NoSQL database) that features tremendous scalability, speed and flexibility. The inclusion of super columns (and super-column families, which I didn't discuss here) gives you just enough flexibility to store a great deal of information about many users. So long as you never have to search on anything other than the primary key or join information from different users at the database level, Cassandra is a good choice.
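To make that trade-off concrete, here is a minimal sketch in Python that models a column family with super columns as nested dictionaries. It is a toy, in-memory illustration rather than a Cassandra client; the `users` data and the function names `get_row` and `find_rows_by_column` are invented for this example. It shows why fetching by key is cheap, while searching on any other value means examining every row.

```python
# Toy in-memory model of a column family with super columns.
# This is NOT a Cassandra client; it only mirrors the shape of the data.
# Layout: row key -> super column name -> column name -> value.
# All keys and values below are made up for illustration.
users = {
    "alice": {
        "address": {"city": "Chicago", "country": "US"},
        "contact": {"email": "alice@example.com"},
    },
    "bob": {
        "address": {"city": "Boston", "country": "US"},
        "contact": {"email": "bob@example.com"},
    },
}


def get_row(store, row_key):
    """Fetch one user's data by key: a single dictionary lookup."""
    return store.get(row_key, {})


def find_rows_by_column(store, super_column, column, value):
    """Search on a non-key value: every row has to be examined."""
    return sorted(
        row_key
        for row_key, super_columns in store.items()
        if super_columns.get(super_column, {}).get(column) == value
    )


print(get_row(users, "alice")["address"]["city"])
print(find_rows_by_column(users, "address", "country", "US"))
```

The second function is the kind of query that a key-oriented store does not help with: its cost grows with the number of rows, which is why the model suits applications that always know the key they want.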
That said, Cassandra is significantly harder to understand and administer than other non-relational databases. I think the investment of time and effort is worth it, but you shouldn't expect to be able to work with Cassandra as quickly and easily as with, say, CouchDB or MongoDB. The flip side of this issue is that administration allows you to fine-tune a number of aspects of Cassandra's networking and consistency until you reach a level with which you're comfortable.
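One way to think about tuning consistency is the replica-overlap rule used by Dynamo-style systems: if data is stored on N replicas, a read that consults R of them is guaranteed to touch at least one replica holding the latest write acknowledged by W replicas whenever R + W > N. The sketch below only does that arithmetic. The function names are my own, and it does not talk to Cassandra or reflect any particular configuration setting.

```python
def quorum(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def reads_see_latest_write(replicas, read_nodes, write_nodes):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_nodes + write_nodes > replicas


# Hypothetical cluster with three replicas of each row.
n = 3
q = quorum(n)

print(q)
print(reads_see_latest_write(n, q, q))  # quorum reads and quorum writes
print(reads_see_latest_write(n, 1, 1))  # fastest setting, weakest guarantee
```

Reading and writing with a single replica is the fastest choice but gives up the overlap guarantee, whereas quorum reads and writes keep it at the cost of waiting for more nodes.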
Next month, I'll continue exploring and discussing Cassandra, looking at ways to connect multiple Cassandra boxes into a cluster, and at what happens when you do so.
The Cassandra home page is at cassandra.apache.org. Cassandra only recently “graduated” from an “incubator” project to a full-fledged Apache project, so you might find out-of-date references to an older Cassandra page. The current home page contains download links, documentation, an actively maintained wiki and links to papers, tutorials and drivers in a number of languages.
Cassandra is based on Amazon's Dynamo, the original paper for which is useful in understanding some of the design decisions. You can read this paper at www.allthingsdistributed.com/2007/10/amazons_dynamo.html.
Two complementary video talks describe Cassandra, although they are aimed more at the network storage aspects than at practical day-to-day usage. They are at www.parleys.com/#sl=1&st=5&id=1866 and vimeo.com/5185526.
Finally, although I still find the Cassandra documentation to be a bit lacking, a growing number of blogs, tutorials and testimonials have made their way onto the Web. Three that I particularly enjoyed were Arin Sarkissian's “WTF is a SuperColumn? An Intro to the Cassandra Data Model” (arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model), Evan Weaver's “Up and Running with Cassandra” (blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra) and Dominic Williams' “HBase vs Cassandra: why we moved” (ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved).
Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.