At the Forge - Cassandra
The home page for Cassandra is cassandra.apache.org. From there, you can download Cassandra and install it on your computer. Because Cassandra is written in Java, there is only one distribution binary, which should work on any computer with a current JVM.
On my computer running Ubuntu, I first installed the latest Java JDK with:
apt-get install openjdk-6-jdk
Following this, I could have downloaded the latest Cassandra version and installed it. But instead, I decided to use apt-get to retrieve the latest version and to ensure that I will receive updates in the future. In order to do this, I first needed to add the appropriate GPG keys to my keychain, as per the instructions on the Cassandra Wiki:
gpg --keyserver wwwkeys.eu.pgp.net --recv-keys F758CE318D77295D gpg --export --armor F758CE318D77295D | sudo apt-key add -
Following that, I added these two lines to /etc/apt/sources.list:
deb http://www.apache.org/dist/cassandra/debian unstable main deb-src http://www.apache.org/dist/cassandra/debian unstable main
Next, I ran apt-get update to retrieve the latest version information for all packages, and then I ran apt-get install cassandra to install it on the server. About a minute later, Cassandra was installed and ready to run on my machine.
I started it up with:
Sure enough, a quick peek at ps showed me that Cassandra indeed was running.
There are numerous interfaces to Cassandra from a variety of programming languages. However, the easiest way to connect to Cassandra often is via its built-in command-line interface (CLI), which comes with the program. Simply enter cassandra-cli in your shell, and you'll see a prompt that looks like this:
Welcome to cassandra CLI. Type 'help' or '?' for help. Type 'quit' or 'exit' to quit. cassandra>
Your first task should be to connect to your local Cassandra server:
cassandra> connect localhost/9160 Connected to: "Test Cluster" on localhost/9160
In case you forgot what was just printed, you can get the current cluster name with:
cassandra> show cluster name Test Cluster
You also can get a list of keyspaces in this cluster:
cassandra> show keyspaces Keyspace1 system
The system keyspace, as you can imagine, is used for Cassandra system tasks. It can be fun and interesting to explore, but you don't want to mess with it unless you really know what you're doing.
What if you want to create a new keyspace? Well, that's where you'll need to go in and change the system's configuration and restart Cassandra. The configuration file you need to modify is called storage-conf.xml. After I installed Cassandra on my Ubuntu system, it was placed in /etc/cassandra/storage-conf.xml. (The filename always will be storage-conf.xml, but the location might differ on your machine, depending on how you installed it.) You can see the contents of this configuration file from the Cassandra CLI, with the command:
cassandra> show config file
However, this command shows only the contents of the file, not its location, so you might have to poke around a bit to find it.
To add a new keyspace to your Cassandra cluster, first you must think about what you want to store and then how you can represent that in Cassandra. As an example, let's store a list of users. You don't need to think beyond that right now; all you need to define is the name of your column family. Individual columns and values can and will be defined on the fly.
To do this, define a new keyspace and one new column family. Each column family is analogous to a table in a relational database; it contains zero or more columns. Each column, in turn, is a name-value pair. Thus, by defining your keyspace as follows, you're basically saying you want to store information about users:
<Keyspace Name="People"> <ColumnFamily Name="Users" CompareWith="BytesType"/> </Keyspace> </Keyspaces>
Like a relational database, you'll be able to store many fields of information about these users. Unlike a relational database, you don't need to define them from the start. Also unlike a relational database, you'll be able to retrieve information about users only via the key you use for this column family. So, if you use e-mail addresses as keys into the “Users” column family, you'll need an address to do something; having the person's first and last name will not do you much good.
Cassandra stores information as a set of bytes; there are no internal types. However, you can (and should) indicate to Cassandra how the data should be sorted. Specifying a “comparator” allows you to simulate the storage of different types. More important, it determines the order in which you will receive results. That's because there is no ORDER BY equivalent in Cassandra when you retrieve data; you need to decide on an order and specify it in the configuration file. Somewhat surprisingly, the ordering is done when the data is written, not when it is read. In the case of the example “Users” column family, you'll just retrieve them in byte order.
If you put the above <Keyspace> section inside the <Keyspaces> tag in your storage-conf.xml file and restart Cassandra, you'll find that it fails to start up. (The error logs are in /var/log/cassandra, at least in my Ubuntu installation.) That's because there are three other definitions you need to include: ReplicaPlacementStrategy, ReplicationFactor and EndPointSnitch. None of these definitions will concern you when you have a single Cassandra node, so I suggest simply copying them from the included Keyspace1 keyspace. In the end, this part of your keyspace definition will look like this:
<Keyspace Name="People"> <ColumnFamily Name="Users" CompareWith="BytesType"/> <ReplicaPlacementStrategy>org.apache.cassandra.locator. ↪RackUnawareStrategy</ReplicaPlacementStrategy> <ReplicationFactor>1</ReplicationFactor> <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch ↪</EndPointSnitch> </Keyspace>
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
|Non-Linux FOSS: Seashore||May 10, 2013|
- RSS Feeds
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- Dynamic DNS—an Object Lesson in Problem Solving
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Download the Free Red Hat White Paper "Using an Open Source Framework to Catch the Bad Guy"
- Tech Tip: Really Simple HTTP Server with Python
- Please correct the URL for Salt Stack's web site
2 hours 48 min ago
- Android is Linux -- why no better inter-operation
5 hours 3 min ago
- Connecting Android device to desktop Linux via USB
5 hours 32 min ago
- Find new cell phone and tablet pc
6 hours 30 min ago
7 hours 59 min ago
- Automatically updating Guest Additions
9 hours 7 min ago
- I like your topic on android
9 hours 54 min ago
- This is the easiest tutorial
16 hours 29 min ago
- Ahh, the Koolaid.
22 hours 8 min ago
- git-annex assistant
1 day 4 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?