Linux Clustering with Ruby Queue: Small Is Beautiful
Using posixlock and SQLite made coding a persistent NFS-safe priority queue class relatively straightforward. Of course, there were performance issues to address. A lease-based locking system was added to detect the possible lockd starvation issues I'd heard rumors about on the SQLite mailing list. I posted many questions to the NFS mailing lists during this development stage, and developers such as Trond Myklebust were invaluable resources to me.
I'm not too smart when it comes to guessing the state of programs I myself wrote. Wise programmers know that there is no substitute for good logging. Ruby ships with a built-in Logger class that offers features such as automatic log rolling. Using this class as a foundation, I was able to abstract a small module that's used by all the classes in Ruby Queue to provide consistent, configurable and pervasive logging to all its objects in only a few lines of code. Being able to leverage built-in libraries to abstract important building blocks such as logging is a time- and mind-saver.
If you still are using XML as a data serialization format and yearn for something easier and more readable, I urge you to check out YAML. Ruby Queue uses YAML extensively both as input and output format. For instance, the rq command-line tool shows jobs marked "important" as:
- jid: 1 priority: 0 state: pending submitted: 2004-11-12 15:06:49.514387 started: finished: elapsed: submitter: redfish runner: pid: exit_status: tag: important command: my_job.sh - jid: 2 priority: 42 state: finished submitted: 2004-11-12 17:37:10.312094 started: 2004-11-12 17:37:13.132700 finished: 2004-11-12 17:37:13.739824 elapsed: 0.015724 submitter: redfish runner: bluefish pid: 5477 exit_status: 0 tag: important command: my_high_priority_job.sh
This format is easy for humans to read and friendly to Linux commands such as egrep(1). But best of all, the document above, when used as the input to a command, can be loaded into Ruby as an array of hashes with a single command:
require 'yaml' jobs = YAML::load STDIN
It then can be used as a native Ruby object with no complex API required:
jobs.each do |job| priority = job['priority'] ... end
Perhaps the best summary of YAML for Ruby is offered by it's author, "_why". He writes, "Really, it's quite fantastic. Spreads right on your Rubyware like butter on bread!"
I actually had a prototype of Ruby Queue (rq) in production, a step we do a lot in the DMSP group, when a subtle bug cropped up. NFS has a feature known as silly renaming. This happens when two clients have an NFS file open and one of them removes it, causing the the NFS server to rename the file something like ".nfs123456789" until the second client is done with it and the file truly can be removed.
The general mode of operation for rq, when feeding on a queue (running jobs from it), is to start a transaction on the SQLite database, find a job to run, fork a child process to run the job, update the database with information such as the pid of the job and end the transaction. As it turns out, transactions in SQLite involve some temporary files that are removed at the end of the transaction. The problem was that I was forking in the middle of a transaction, causing the file handle of the temporary file to be open in both the child and the parent. When the parent then removed the temporary file at the end of the transaction, a silly rename occurred so that the child's file handle still was valid. I started seeing dozens of these silly files cluttering my queue directories; they eventually would disappear, but they were ugly and unnerving to users.
I initially looked into the possibility of closing the file handle somehow after forking, but I received some bad news from Dr. Richard Hipp, the creator of SQLite, on the mailing list. He said forking in the middle of a transaction results in "undefined" behavior and was not recommended.
This was bad news, as my design depended heavily on forking in a transaction in order to preserve the atomicity of starting a job and updating its state. What I needed to be able to do was fork without forking. More specifically, I needed another process to fork, run the job and wait for it on my behalf. Now, the idea of setting up a co-process and using IPC to achieve this fork with forking made me break out in hives. Fortunately, Ruby offered a hiveless solution.
DRb, or Distributed Ruby, is a built-in library for working with remote objects. It's similar to Java RMI or SOAP, only DRb is about a million times easier to get going. But, what do remote objects have to do with forking in another process? What I did was code a tiny class that does the forking, job running and waiting for me. An instance of this class then can set up as a local DRb server in a child process. Communication is done transparently by way of UNIX domain sockets. In other words, the DRb server is the co-process that does all the forking and waiting for me. Interacting with this object is similar to interacting with any other Ruby object. The entire JobRunnerDaemon class contains 101 lines of code, including the child process setup. The following are some excerpts from the Feeder class, which shows the key points of its usage.
An instance of a JobRunnerDaemon is started in a child process and a handle on that remote (but on localhost) object is returned:
jrd = JobRunnerDaemon::daemon
A JobRunner object is created for a job, and the JobRunner is created by pre-forking a child in the JobRunnerDaemon's process used later to run the Job. The actual fork takes place in the child process, so it does not affect the parent's transaction:
runner = jrd.runner job pid = runner.pid runner.run
Later, the DRb handle on the JobRunnerDaemon can be used to wait on the child. This blocks exactly as a normal wait would, even though we are waiting on the child of a totally different process.
cid, status = jrd.waitpid2 -1, Process::WUNTRACED
We go through "Run it. Break it. Fix it." cycles like this one often in my group, the philosophy being that there is no test like production. The scientists I work with most closely, Kim Baugh and Jeff Safran, are more than happy to have programs explode in their faces if the end result is better, more reliable code. Programs written in a dynamic language such as Ruby enable me to fix bugs fast, which keeps their enthusiasm for testing high. The combined effect is a rapid evolutionary development cycle.
- Tech Tip: Really Simple HTTP Server with Python
- New Products
- Developing for the Atmel AVR Microcontroller on Linux
- Dialog: An Introductory Tutorial
- Raspberry Pi: the Perfect Home Server
- Building a Linux-Based High-Performance Compute Cluster
- Bash Arrays
- RTLinux Application Development Tutorial
- Recovery of RAID and LVM2 Volumes