At the Forge - Redis
The past few months, I've been covering non-relational databases, sometimes known as NoSQL databases. To hard-core NoSQL proponents, relational databases are no longer the be-all and end-all of data storage. Rather, NoSQL systems, which offer flexibility, easy replication and storage using modern data structures, are the way of the future—and perhaps even of the present.
Most NoSQL adherents aren't quite this extreme, but instead point to NoSQL as a useful solution to relatively new problems, such as those faced by Web sites with massive loads. To them (and me), NoSQL databases offer the storage equivalent of a new data structure. You could build programs with nothing more than integers, strings and arrays, but with the addition of hash tables to your arsenal, your code becomes easier to write and maintain. In the same way, having an additional storage mechanism can improve the quality, efficiency and maintainability of your software.
NoSQL is a catchphrase that has caught on like wildfire in the past year or two, but it's a problematic phrase in that it describes what these databases are not, rather than what they are. And indeed, many different types of NoSQL databases exist. Two that I have explored in this column during the past few months are MongoDB and CouchDB. Both of these are “document” databases—they store collections of name-value pairs, much like a Ruby hash or a Python dictionary.
A different type of NoSQL database is the key-value store. Whereas you can think of a document database as containing multiple hash tables, a key-value store is the equivalent of a single hash table. As you can tell by its name, a key-value store allows for the storage of a single value (which might be an aggregate data structure, such as an array or hash table), identified by a single key.
Whether a document database or a key-value store is more appropriate for your application depends greatly on your needs. I recently rewrote part of my PhD dissertation software, which previously had used PostgreSQL for all back-end storage, to use a combination of PostgreSQL and MongoDB. I chose MongoDB because I will need to retrieve documents using a variety of fields and combinations of fields. A single key for each document would have been insufficient.
In another case, a financial application on which I have been working, I needed fast access to the latest exchange rates for a number of currency pairs. Because I was going to be retrieving data based only on a single, unique key (that is, the six-letter representation of a currency pair), using a document database would result in unnecessary overhead. All I was interested in doing was storing the current exchange rate for a currency pair or retrieving the current rate for that pair, a perfect match for a key-value store.
So, I spent some time investigating key-value stores and decided to use Redis, an open-source key-value store originally developed by Salvatore Sanfilippo, an Italian programmer who was hired by VMware to work on Redis full-time. Redis was released in February 2009, but it quickly has attracted a large following, in no small part because of its amazing speed.
In many ways, Redis resembles memcached, another key-value store that is popular for scaling Web applications. Like memcached, Redis stores keys and values in RAM. Like memcached, Redis is extremely fast. Like memcached, Redis has bindings and clients written in a large number of languages.
However, there are significant differences. Redis can store and manipulate a large number of data structures (such as lists, sets and hashes). Redis stores values in RAM but writes them out to disk, asynchronously, on a regular basis. This means if someone pulls the plug on your computer, you will lose only the items you added since the last time Redis saved. Everything else will be read into RAM and made available in the usual way when you next bring up Redis.
And, have I mentioned that Redis is fast? It's not uncommon to hear people talk about getting tens of thousands of reads and writes per second with Redis.
Now that I have described Redis, let's try to install it. On most modern Linux distributions, you should be able to install Redis (often as the package redis-server) via apt-get or yum. However, pay attention to the version number. My Linux server running Ubuntu 9.10 happily installed a very old version of Redis for me. I uninstalled it and downloaded it from the Redis home page (see Resources).
If you download the source code, you might be surprised to discover that there is no configure script. Rather, you just run make to compile Redis. Once it's done, you can install the programs (especially redis-server) manually into an appropriate directory, such as /usr/local/bin. Don't forget to install redis.conf, the Redis configuration file, in an appropriate place, such as /etc. To get things started, say:
This tells Redis to start up and read its configuration from /etc/redis.conf. The configuration file is easy to read and modify, and you should take a look at it when you have a chance. If you're interested in just starting to work with Redis and don't care about fiddling with the controls, you can do that. The default configuration works just fine for most basic purposes.
The configuration setting that probably is of greatest interest is “daemonize”, which indicates whether Redis should put itself into the background. I kept Redis in the foreground (and with debug-level logging active) when I first started to use it, but when I finally put it into production, I turned on daemonize, so I wouldn't receive a large number of log and update messages while the system was in use.
The other configuration setting indicates how often Redis should save its state to disk. The default configuration parameters that came with my installation look like this:
save 900 1 save 300 10 save 60 10000
This means Redis should save its state every 900 seconds if there has been one change, every 300 seconds if there have been ten changes, and every 60 seconds if there have been 10,000 changes. Redis saves to disk asynchronously, so there's no danger of it slowing down substantially when it performs the save operation.
You can change these settings according to your particular application's needs, striking an appropriate balance between how much data you're willing to lose if the server goes down and the need for high performance. A separate program, redis-benchmark, comes with Redis, and it allows you to get a sense of how many reads and writes you can expect to execute per second on your specific hardware, with the configuration options you have put in place.
By default, Redis listens on port 6379. You can connect to it locally via telnet or by using the redis-cli program that comes along with it, which lets you interact with the Redis server.
- DNSMasq, the Pint-Sized Super Dæmon!
- Localhost DNS Cache
- Real-Time Rogue Wireless Access Point Detection with the Raspberry Pi
- Days Between Dates: the Counting
- You're the Boss with UBOS
- The Usability of GNOME
- Linux for Astronomers
- Multitenant Sites
- High-Availability Storage with HA-LVM
- Many Drives, One Folder