At the Forge - Cassandra
The home page for Cassandra is cassandra.apache.org. From there, you can download Cassandra and install it on your computer. Because Cassandra is written in Java, there is only one distribution binary, which should work on any computer with a current JVM.
On my computer running Ubuntu, I first installed the latest Java JDK with:
apt-get install openjdk-6-jdk
Following this, I could have downloaded the latest Cassandra version and installed it. But instead, I decided to use apt-get to retrieve the latest version and to ensure that I will receive updates in the future. In order to do this, I first needed to add the appropriate GPG keys to my keychain, as per the instructions on the Cassandra Wiki:
gpg --keyserver wwwkeys.eu.pgp.net --recv-keys F758CE318D77295D
gpg --export --armor F758CE318D77295D | sudo apt-key add -
Following that, I added these two lines to /etc/apt/sources.list:
deb http://www.apache.org/dist/cassandra/debian unstable main
deb-src http://www.apache.org/dist/cassandra/debian unstable main
Next, I ran apt-get update to retrieve the latest version information for all packages, and then I ran apt-get install cassandra to install it on the server. About a minute later, Cassandra was installed and ready to run on my machine.
I started it up with:
Sure enough, a quick peek at ps showed me that Cassandra indeed was running.
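Besides looking at ps, you can check whether anything is accepting connections on the port that the CLI session in this article uses, 9160. The following is only a rough sketch using Python's standard socket module; the port_is_open helper is a name made up here, and an open port shows that something is listening, not necessarily that it is Cassandra:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9160 is the port used with cassandra-cli in this article; adjust it if yours differs.
    print(port_is_open("localhost", 9160))
```

If this prints False right after starting the server, waiting a few seconds and trying again is reasonable, because the JVM can take a moment to begin listening.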
There are numerous interfaces to Cassandra from a variety of programming languages. However, the easiest way to connect to Cassandra often is via its built-in command-line interface (CLI), which comes with the program. Simply enter cassandra-cli in your shell, and you'll see a prompt that looks like this:
Welcome to cassandra CLI.
Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
cassandra>
Your first task should be to connect to your local Cassandra server:
cassandra> connect localhost/9160
Connected to: "Test Cluster" on localhost/9160
In case you forgot what was just printed, you can get the current cluster name with:
cassandra> show cluster name
Test Cluster
You also can get a list of keyspaces in this cluster:
cassandra> show keyspaces
Keyspace1
system
The system keyspace, as you can imagine, is used for Cassandra system tasks. It can be fun and interesting to explore, but you don't want to mess with it unless you really know what you're doing.
What if you want to create a new keyspace? Well, that's where you'll need to go in and change the system's configuration and restart Cassandra. The configuration file you need to modify is called storage-conf.xml. After I installed Cassandra on my Ubuntu system, it was placed in /etc/cassandra/storage-conf.xml. (The filename always will be storage-conf.xml, but the location might differ on your machine, depending on how you installed it.) You can see the contents of this configuration file from the Cassandra CLI, with the command:
cassandra> show config file
However, this command shows only the contents of the file, not its location, so you might have to poke around a bit to find it.
To add a new keyspace to your Cassandra cluster, first you must think about what you want to store and then how you can represent that in Cassandra. As an example, let's store a list of users. You don't need to think beyond that right now; all you need to define is the name of your column family. Individual columns and values can and will be defined on the fly.
To do this, define a new keyspace and one new column family. Each column family is analogous to a table in a relational database; it contains zero or more columns. Each column, in turn, is a name-value pair. Thus, by defining your keyspace as follows, you're basically saying you want to store information about users:
<Keyspace Name="People">
    <ColumnFamily Name="Users" CompareWith="BytesType"/>
</Keyspace>
</Keyspaces>
Like a relational database, you'll be able to store many fields of information about these users. Unlike a relational database, you don't need to define them from the start. Also unlike a relational database, you'll be able to retrieve information about users only via the key you use for this column family. So, if you use e-mail addresses as keys into the “Users” column family, you'll need a user's address to look up that user; having the person's first and last name will not do you much good.
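To make that data model concrete, here is a minimal Python sketch that mimics it with nested dictionaries. It is not a Cassandra client and talks to no server; the insert and get helpers, the column names and the example.com addresses are all invented for illustration:

```python
# Toy in-memory model of the layout described above. This is NOT a Cassandra
# client; insert() and get() are hypothetical helpers for illustration only.

# keyspace "People" -> column family "Users" -> row key -> {column name: value}
people = {"Users": {}}


def insert(column_family, key, column, value):
    """Set one column on the row stored under key, creating the row if needed."""
    column_family.setdefault(key, {})[column] = value


def get(column_family, key):
    """Return the columns stored under key, or an empty dict if there is no such row."""
    return column_family.get(key, {})


users = people["Users"]

# Columns are invented on the fly; the two rows need not share the same ones.
insert(users, "alice@example.com", "first_name", "Alice")
insert(users, "alice@example.com", "last_name", "Smith")
insert(users, "bob@example.com", "first_name", "Bob")
insert(users, "bob@example.com", "city", "Chicago")

# Lookup by key is direct.
print(get(users, "alice@example.com"))

# Lookup by anything else means scanning every row yourself.
bobs = [key for key, columns in users.items() if columns.get("first_name") == "Bob"]
print(bobs)
```

The last two lines show the trade-off: with only the key as an access path, finding a user by first name means walking every row, which is why the choice of key matters so much.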
Cassandra stores information as a set of bytes; there are no internal types. However, you can (and should) indicate to Cassandra how the data should be sorted. Specifying a “comparator” allows you to simulate the storage of different types. More important, it determines the order in which you will receive results. That's because there is no ORDER BY equivalent in Cassandra when you retrieve data; you need to decide on an order and specify it in the configuration file. Somewhat surprisingly, the ordering is done when the data is written, not when it is read. In the case of the example “Users” column family, you'll just retrieve them in byte order.
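The idea of sorting at write time can be sketched in a few lines of Python. This is a toy model rather than Cassandra's actual implementation: the Row class and its comparator argument are made up here, and the numeric comparator only illustrates how a sort rule can make raw bytes behave like another type:

```python
import bisect


class Row:
    """Columns of a single row, kept in comparator order on every write."""

    def __init__(self, comparator=lambda name: name):
        self.comparator = comparator  # maps a column name to its sort key
        self.names = []               # column names, always in sorted order
        self.values = {}              # column name -> value

    def insert(self, name, value):
        if name not in self.values:
            # Find the slot for the new name now, so reads never have to sort.
            keys = [self.comparator(n) for n in self.names]
            position = bisect.bisect_right(keys, self.comparator(name))
            self.names.insert(position, name)
        self.values[name] = value

    def columns(self):
        """Return (name, value) pairs in the order fixed at write time."""
        return [(name, self.values[name]) for name in self.names]


# Plain byte order, in the spirit of CompareWith="BytesType".
byte_row = Row()
# Hypothetical numeric comparator: treat each column name as a decimal integer.
numeric_row = Row(comparator=lambda name: int(name.decode("ascii")))

for row in (byte_row, numeric_row):
    row.insert(b"9", b"nine")
    row.insert(b"10", b"ten")

print(byte_row.names)     # byte order puts b"10" before b"9"
print(numeric_row.names)  # numeric order puts b"9" before b"10"
```

The same two column names come back in different orders depending only on the comparator chosen up front, which is the decision the configuration file asks you to make.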
If you put the above <Keyspace> section inside the <Keyspaces> tag in your storage-conf.xml file and restart Cassandra, you'll find that it fails to start up. (The error logs are in /var/log/cassandra, at least in my Ubuntu installation.) That's because there are three other definitions you need to include: ReplicaPlacementStrategy, ReplicationFactor and EndPointSnitch. None of these definitions will concern you when you have a single Cassandra node, so I suggest simply copying them from the included Keyspace1 keyspace. In the end, this part of your keyspace definition will look like this:
<Keyspace Name="People">
    <ColumnFamily Name="Users" CompareWith="BytesType"/>
    <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
    <ReplicationFactor>1</ReplicationFactor>
    <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
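Because a missing element keeps Cassandra from starting, it can be handy to check a keyspace definition before restarting. The sketch below uses Python's standard xml.etree.ElementTree module on a fragment pasted in as a string; the missing_elements helper is hypothetical, it checks only for the three elements named above, and it is no substitute for Cassandra's own validation:

```python
import xml.etree.ElementTree as ET

# The three elements the article says every keyspace definition needs.
REQUIRED = ("ReplicaPlacementStrategy", "ReplicationFactor", "EndPointSnitch")


def missing_elements(keyspace_xml):
    """Return the required child elements absent from one <Keyspace> fragment."""
    keyspace = ET.fromstring(keyspace_xml)
    return [tag for tag in REQUIRED if keyspace.find(tag) is None]


incomplete = """
<Keyspace Name="People">
    <ColumnFamily Name="Users" CompareWith="BytesType"/>
</Keyspace>
"""

complete = """
<Keyspace Name="People">
    <ColumnFamily Name="Users" CompareWith="BytesType"/>
    <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
    <ReplicationFactor>1</ReplicationFactor>
    <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
"""

print(missing_elements(incomplete))  # lists all three required element names
print(missing_elements(complete))    # empty list
```

To check a real file instead, you could parse it with ET.parse and run the same test on each Keyspace element it contains.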