At the Forge - Cassandra
The home page for Cassandra is cassandra.apache.org. From there, you can download Cassandra and install it on your computer. Because Cassandra is written in Java, there is only one distribution binary, which should work on any computer with a current JVM.
On my computer running Ubuntu, I first installed the latest Java JDK with:
apt-get install openjdk-6-jdk
Following this, I could have downloaded the latest Cassandra version and installed it. But instead, I decided to use apt-get to retrieve the latest version and to ensure that I would receive updates in the future. In order to do this, I first needed to add the appropriate GPG keys to my keychain, as per the instructions on the Cassandra Wiki:
gpg --keyserver wwwkeys.eu.pgp.net --recv-keys F758CE318D77295D
gpg --export --armor F758CE318D77295D | sudo apt-key add -
Following that, I added these two lines to /etc/apt/sources.list:
deb http://www.apache.org/dist/cassandra/debian unstable main
deb-src http://www.apache.org/dist/cassandra/debian unstable main
Next, I ran apt-get update to retrieve the latest version information for all packages, and then I ran apt-get install cassandra to install it on the server. About a minute later, Cassandra was installed and ready to run on my machine.
I started it up with:
Sure enough, a quick peek at ps showed me that Cassandra indeed was running.
There are numerous interfaces to Cassandra from a variety of programming languages. However, the easiest way to connect to Cassandra often is via its built-in command-line interface (CLI), which comes with the program. Simply enter cassandra-cli in your shell, and you'll see a prompt that looks like this:
Welcome to cassandra CLI.
Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
cassandra>
Your first task should be to connect to your local Cassandra server:
cassandra> connect localhost/9160
Connected to: "Test Cluster" on localhost/9160
In case you forgot what was just printed, you can get the current cluster name with:
cassandra> show cluster name
Test Cluster
You also can get a list of keyspaces in this cluster:
cassandra> show keyspaces
Keyspace1
system
The system keyspace, as you can imagine, is used for Cassandra system tasks. It can be fun and interesting to explore, but you don't want to mess with it unless you really know what you're doing.
What if you want to create a new keyspace? Well, that's where you'll need to go in and change the system's configuration and restart Cassandra. The configuration file you need to modify is called storage-conf.xml. After I installed Cassandra on my Ubuntu system, it was placed in /etc/cassandra/storage-conf.xml. (The filename always will be storage-conf.xml, but the location might differ on your machine, depending on how you installed it.) You can see the contents of this configuration file from the Cassandra CLI, with the command:
cassandra> show config file
However, this command shows only the contents of the file, not its location, so you might have to poke around a bit to find it.
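If you would rather not hunt for the file by hand, a few lines of Python can do the poking around for you. This is only a minimal sketch: the function name find_config is my own invention, and the directories you pass in are guesses about where a package might have put the file, not documented install locations.

```python
import os

def find_config(roots, filename="storage-conf.xml"):
    """Return every path under the given roots whose basename is filename."""
    matches = []
    for root in roots:
        # By default, os.walk skips directories it cannot read instead of
        # raising, so this is safe to run as an unprivileged user.
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                matches.append(os.path.join(dirpath, filename))
    return matches
```

Calling find_config(["/etc", "/opt", "/usr/local"]) returns a list of matching paths, which is empty if the file lives somewhere else.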
To add a new keyspace to your Cassandra cluster, first you must think about what you want to store and then how you can represent that in Cassandra. As an example, let's store a list of users. You don't need to think beyond that right now; all you need to define is the name of your column family. Individual columns and values can and will be defined on the fly.
To do this, define a new keyspace and one new column family. Each column family is analogous to a table in a relational database; it contains zero or more columns. Each column, in turn, is a name-value pair. Thus, by defining your keyspace as follows, you're basically saying you want to store information about users:
<Keyspace Name="People">
  <ColumnFamily Name="Users" CompareWith="BytesType"/>
</Keyspace>
</Keyspaces>
Like a relational database, you'll be able to store many fields of information about these users. Unlike a relational database, you don't need to define them from the start. Also unlike a relational database, you'll be able to retrieve information about users only via the key you use for this column family. So, if you use e-mail addresses as keys into the “Users” column family, you'll need an address to look anyone up; having the person's first and last name will not do you much good.
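One way to picture this model is as a dictionary of dictionaries. The following toy sketch is not Cassandra's API or its storage engine; the ColumnFamily class, the e-mail addresses and the column names are all made up for illustration. It shows only the two properties just described: columns appear on the fly, and the row key is the only way in.

```python
from collections import defaultdict

class ColumnFamily:
    """Toy stand-in for a column family: row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        self.rows = defaultdict(dict)

    def insert(self, key, column, value):
        # Columns are created on the fly; nothing declares them in advance.
        self.rows[key][column] = value

    def get(self, key):
        # Lookup works by row key only; there is no query on column values.
        return dict(self.rows.get(key, {}))

users = ColumnFamily("Users")
users.insert("alice@example.com", "first_name", "Alice")
users.insert("alice@example.com", "last_name", "Smith")
users.insert("bob@example.com", "first_name", "Bob")
users.insert("bob@example.com", "nickname", "Bobby")  # only this row has it

print(users.get("alice@example.com"))  # {'first_name': 'Alice', 'last_name': 'Smith'}
print(users.get("Smith"))              # {} because a last name is not a key
```

Finding everyone named Smith in this model would mean walking every row, which is exactly the sort of query you need to plan around when choosing your keys.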
Cassandra stores information as a set of bytes; there are no internal types. However, you can (and should) indicate to Cassandra how the data should be sorted. Specifying a “comparator” allows you to simulate the storage of different types. More important, it determines the order in which you will receive results. That's because there is no ORDER BY equivalent in Cassandra when you retrieve data; you need to decide on an order and specify it in the configuration file. Somewhat surprisingly, the ordering is done when the data is written, not when it is read. In the case of the example “Users” column family, you'll just retrieve them in byte order.
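To see what sorting at write time means in practice, here is a small Python sketch. It assumes that byte order means comparing the raw bytes of each column name from left to right, which is how Python compares bytes objects. The SortedColumns class is hypothetical and says nothing about how Cassandra implements its comparators.

```python
import bisect

class SortedColumns:
    """Toy row that puts each column name in sorted position as it is written."""

    def __init__(self):
        self._names = []   # column names (bytes), always kept in sorted order
        self._values = {}

    def insert(self, name, value):
        if name not in self._values:
            # The ordering work happens here, on write.
            bisect.insort(self._names, name)
        self._values[name] = value

    def columns(self):
        # Nothing is sorted on read; the order was fixed when data was written.
        return [(name, self._values[name]) for name in self._names]

row = SortedColumns()
for name in [b"last_name", b"Zip", b"first_name", b"10", b"9"]:
    row.insert(name, b"x")

print([name for name, _ in row.columns()])
# [b'10', b'9', b'Zip', b'first_name', b'last_name']
```

Notice that "10" comes before "9" and "Zip" comes before "first_name". Byte order knows nothing about numbers or case, which is why the choice of comparator matters.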
If you put the above <Keyspace> section inside the <Keyspaces> tag in your storage-conf.xml file and restart Cassandra, you'll find that it fails to start up. (The error logs are in /var/log/cassandra, at least in my Ubuntu installation.) That's because there are three other definitions you need to include: ReplicaPlacementStrategy, ReplicationFactor and EndPointSnitch. None of these definitions will concern you when you have a single Cassandra node, so I suggest simply copying them from the included Keyspace1 keyspace. In the end, this part of your keyspace definition will look like this:
<Keyspace Name="People">
  <ColumnFamily Name="Users" CompareWith="BytesType"/>
  <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
  <ReplicationFactor>1</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
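Because a missing element keeps Cassandra from starting, it can be handy to check a keyspace definition before you restart the server. The sketch below uses Python's standard xml.etree.ElementTree module to look for the three elements named above. The helper missing_elements is my own, and it checks only that the elements are present, not that their values make sense to Cassandra.

```python
import xml.etree.ElementTree as ET

# The three definitions the article says every keyspace must include.
REQUIRED = ("ReplicaPlacementStrategy", "ReplicationFactor", "EndPointSnitch")

def missing_elements(keyspace_xml):
    """Return the required child elements absent from a <Keyspace> fragment."""
    keyspace = ET.fromstring(keyspace_xml)
    return [tag for tag in REQUIRED if keyspace.find(tag) is None]

incomplete = """
<Keyspace Name="People">
  <ColumnFamily Name="Users" CompareWith="BytesType"/>
</Keyspace>
"""

complete = """
<Keyspace Name="People">
  <ColumnFamily Name="Users" CompareWith="BytesType"/>
  <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
  <ReplicationFactor>1</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
"""

print(missing_elements(incomplete))
# ['ReplicaPlacementStrategy', 'ReplicationFactor', 'EndPointSnitch']
print(missing_elements(complete))
# []
```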