Linux Clustering with Ruby Queue: Small Is Beautiful
Here, I walk though the actual sequence of rq commands used to set up an instant Linux cluster comprised of four nodes. The nodes we are going to use are called onefish, twofish, redfish and bluefish. Each host is identified in its prompt, below. In my home directory on each of the hosts I have the symbolic link ~/nfs pointing at a common NFS directory.
The first thing we have to do is initialize the queue:
redfish:~/nfs > rq queue create created <~/nfs/queue>
Next, we start feeder daemons on all four hosts:
onefish:~/nfs > rq queue feed --daemon -l=~/rq.log twofish:~/nfs > rq queue feed --daemon -l=~/rq.log redfish:~/nfs > rq queue feed --daemon -l=~/rq.log bluefish:~/nfs > rq queue feed --daemon -l=~/rq.log
In practice, you would not want to start feeders by hand on each node, so rq supports being kept alive by way of a crontab entry. When rq runs in daemon mode, it acquires a lockfile that effectively limits it to one feeding process per host, per queue. Starting a feeder daemon simply fails if another daemon already is feeding on the same queue. Thus, a crontab entry like this:
15/* * * * * rq queue feed --daemon --log=log
checks every 15 minutes to see if a daemon is running, and it starts a daemon if and only if one is not running already. In this way, an ordinary user can set up a process that is running at all times, even after a machine reboot.
Jobs can be submitted from the command line, from an input file or, in Linux tradition, from standard input as part of a process pipeline. When using an input file or stdin, the format is either YAML (such as that produced as the output of other can rq commands) or a simple list of jobs, one job per line. The format is auto-detected. Any host that sees the queue can run commands on it:
onefish:~/nfs > cat joblist echo 'job 0' && sleep 0 echo 'job 1' && sleep 1 echo 'job 2' && sleep 2 echo 'job 3' && sleep 3 onefish:~/nfs > cat joblist | rq queue submit - jid: 1 priority: 0 state: pending submitted: 2004-11-12 20:14:13.360397 started: finished: elapsed: submitter: onefish runner: pid: exit_status: tag: command: echo 'job 0' && sleep 0 - jid: 2 priority: 0 state: pending submitted: 2004-11-12 20:14:13.360397 started: finished: elapsed: submitter: onefish runner: pid: exit_status: tag: command: echo 'job 1' && sleep 1 - jid: 3 priority: 0 state: pending submitted: 2004-11-12 20:14:13.360397 started: finished: elapsed: submitter: onefish runner: pid: exit_status: tag: command: echo 'job 2' && sleep 2 - jid: 4 priority: 0 state: pending submitted: 2004-11-12 20:14:13.360397 started: finished: elapsed: submitter: onefish runner: pid: exit_status: tag: command: echo 'job 3' && sleep 3
We see in YAML format, in the output of submitting to the queue, all of the information about each of the jobs. When jobs are complete, all of the fields are filled in. At this point, we check the status of the queue:
redfish:~/nfs > rq queue status --- pending : 2 running : 2 finished : 0 dead : 0
From this, we see that two of the jobs have been picked up by a node and are being run. We can find out which nodes are running our jobs using this input:
onefish:~/nfs > rq queue list running | egrep 'jid|runner' jid: 1 runner: redfish jid: 2 runner: bluefish
The record for a finished jobs remains in the queue until it's deleted, because a user generally would want to collect this information. At this point, we expect all jobs to be complete so we check each one's exit status:
bluefish:~/nfs > rq queue list finished | egrep 'jid|command|exit_status' jid: 1 exit_status: 0 command: echo 'job 0' && sleep 0 jid: 2 exit_status: 0 command: echo 'job 1' && sleep 1 jid: 3 exit_status: 0 command: echo 'job 2' && sleep 2 jid: 4 exit_status: 0 command: echo 'job 3' && sleep 3
All of the commands have finished successfully. We now can delete any successfully completed job from the queue:
twofish:~/nfs > rq queue query exit_status=0 | rq queue delete --- - 1 - 2 - 3 - 4
Ruby Queue can perform quite a few other useful operations. For a complete description, type rq help.
Making the choice to roll your own always is a tough one, because it breaks Programmer's Rule Number 42, which clearly states, "Every problem has been solved. It is Open Source. And it is the first link on Google."
Having a tool such as Ruby is critical when you decide to break Rule Number 42, and the fact that a project such as Ruby Queue can be written in 3,292 lines of code is testament to this fact. With only a few major enhancements planned, it is likely that this code line total will not increase much as the code base is refined and improved. The goals of rq remain simplicity and ease of use.
Ruby Queue set out to lower the barrier scientists had to overcome in order to realize the power of Linux clusters. Providing a simple and easy-to-understand tool that harnesses the power of many CPUs allows them to shift their focus away from the mundane details of complicated distributed computing systems and back to the task of actually doing science. Sometimes small is beautiful.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.
Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.
Sponsored by ActiveState
| Non-Linux FOSS: libnotify, OS X Style | Jun 18, 2013 |
| Containers—Not Virtual Machines—Are the Future Cloud | Jun 17, 2013 |
| Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer | Jun 12, 2013 |
| Weechat, Irssi's Little Brother | Jun 11, 2013 |
| One Tail Just Isn't Enough | Jun 07, 2013 |
| Introduction to MapReduce with Hadoop on Linux | Jun 05, 2013 |
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Validate an E-Mail Address with PHP, the Right Way
- Technical Support Rep
- Senior Perl Developer
- UX Designer
- Web & UI Developer (JavaScript & j Query)
- Introduction to MapReduce with Hadoop on Linux
- Cari Uang
31 min 3 sec ago - user namespaces
3 hours 24 min ago - yea
3 hours 50 min ago - One advantage with VMs
6 hours 18 min ago - about info
6 hours 52 min ago - info
6 hours 53 min ago - info
6 hours 53 min ago - info
6 hours 56 min ago - info
6 hours 57 min ago - abut info
6 hours 58 min ago
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?



Comments
many questions
I have many questions about "Ruby Queue", can I email you directly?
sorry for late reply...
sure!
NFS share a single point of failure
'rq' has no central brain, no communication between nodes, and no scheduler
This sounded like a distributed approach (like P2P), however, there is still a central server that export the NFS share and hence a single point of failure, right? (Just try to understand the idea better.)
[RE] NFS share a single point of failure
yes - exactly right. however, at least in many cases, this is not a drawback per se. the reason is that we already have a strong dependancy on NFS; our scripts and binaries reside there, our config files live there, many static data files live there, and even input/output to programs lives there (though we always work on local copies for performance). we are totally dead in the water without NFS. one of the goals of rq was not to ADD a point of failure. we considered using a RDBMS, for example, in which to store the queue but this adds a point of failure unless you do the (huge) task of setting up a HA db. in essence rq leverages our existing single point of failure. also, as far as single points of failure go NFS is a good one: if mounts are 'hard' processing simply hangs as the server reboots. this applies, of course, to ALL files access including that of the db for rq. because of this we can reboot our NFS server even if 30 nodes are currently using the queue - this behaviour, while it can be coded, is much harder to acheive with a TCP connection to a database. we have tested this many times including a run where we simply pressed the power button on the NFS server and all it's nodes. although i'm sure this could potentially cause problems we've experienced zero through our tests and several real power strip failures. sqlite is not perfect but does a VERY good job at maintaining ACID properties within the confines of the filesystems abilities.
kind regards.
-a
A great tool...
This tools is really great ! I have downloaded all the binaries and I have tested it. All works correctly except when I try to start a second "feeder" computer... I obtain the following message :
process <18182> is already feeding from this queue
What's wrong ? Do you have any idea ?
a great tool
hmmm. this should not happen UNLESS you are trying to start more than one feeding process from a single host. are you attempting to do this on separate hosts and seeing this? i've never seen that but bugs are always possible. contact me offline and we can work out the problem and post the answer back here.
kind regards.
-a
a great tool
so - turns out this a little bugette resulting from two hosts using the same pidfile when (and only when) the home dir itself is NFS mounted. i actually have support to work around this in the code base but the command line switch was taken out for other reasons. i'll add a small fix and make a release later today. the latest rq also has support for automatic job restart if a node reboots and the ability to sumbit jobs to a specfic host (quite useful for debugging). look for release 2.0.0 on the download site this afternoon (MDT).
kind regards.
-a
a great tool
the buggette is fixed and new version (2.0.0) available for download.
cheers.
-a
why not the maildir solution?
I read the article quickly, it's quite interesting.
To my eyes this looks like a replay of the mbox vs maildir debate, with the current article's solution being, "add more complication to the mbox."
Could you add a little blurb as to why one file containing all the jobs data and requiring complex locking is better than one job per file?
one-job-per-file AFAICT would require much, much simpler locking (with a good filehandling protocol/sequence/scheme perhaps no locking).
I hope I've not badly misunderstood the requirements.
mbox vs. maildir approach
i actually considered that approach. the vsdb project uses that idea for nfs safe transactions. the problem with that idea was in implementing ideas like
deleting: will give ESTALE on remote client nfs box if it's using the job when it's deleted.
searching: requires managing a read lock on each file while iterating
updating: requires managing a write lock on each file while updating
having something as powerful as sqlite under the hood made writing this code at LEAST 50 times easier than it would have been without. it's true you could code a basic job running scheme this way, but there are many problems:
who takes which jobs?
how do you coordinate atomically 'taking' a job to run?
i think you'll see that, as soon as you implement useful features on a system like this, you end up either
a) writing nfs transactions yourself (tricky)
b) having a central brain that 'decides' which jobs go where (naming conventions). realize that 'rq' has no central brain, no communication between nodes, and no scheduler. each host simply works as fast as possible to finish the list of jobs. this is possible because taking a job from the queue and starting to run it is an atomic action.
in any case i think you have understood a part of the problem well and i hope this sheds some light.
tuplespaces
>who takes which jobs?>how do you coordinate atomically 'taking' a job to run?TupleSpaces can be used as the basis for this kind of "pull-driven"
set up --- clients pull tuples (jobs) from the tuplespace and leave
behind 'pending' tuples, later they pull the pending tuple and write
back their finished tuple. An admin program hooks up to add new jobs
(tuples), or to read all tuples (or particular kinds of tuples) to
provide status, or to collect finished job-tuples.
tuplespaces
yes - a great idea. this was defintely on my initial list of design ideas. the problem, for us, is that the current security environment on government machines makes ANY sort of networked programming extremely laden with red tape. any tuplespace requires a client/server type architchture which, of course, requires networking. 'rq' is in fact essentially a tuplespace -- it's a single database table containing jobs at tuples ;-)... clients simply pull jobs from it as you suggest. the difference? the networking is handled via NFS - not on top of TCP/UDP etc. in any case, i agree with you that a tuplespace can be a good solution for this sort of problem domain but it would not fly in our shop. the red tape for a 30 node cluster would mean months of time wasted, the NFS model allows a scientist to set up a 30 node cluster SANS sysad in under 30 minutes.
one last thing - if one WERE designing a tuplespace to contain, say, 100000 jobs one would certainly layer it on top of some sort of persistent and transactionally based storage (i hope) and sqlite is a good fit for that. the hitch is, once you've layer your tuplespace server on top of sqlite you don't really need it anymore unless you don't want to go the route of NFS (a possibility). and, of course, if you layer it on top of a network available RDBMS (postgresql for example) you also then don't need a tuplespace any longer.
tuplespaces ARE very attractive for heterogeneous environments and i think a product using that technology (perhaps with sqlite as a backend) would be successful if written. it would share one of the features of rq in that it also would 'auto load-balance' as each client simply took jobs from the queue as fast as possible.
kind regards.
-a
continuing...
sorry to follow up my own post, but i sent prematurely...
in summary:
maildir solves a 'multiple writer single reader' problem - rq solves a (very different) 'multiple writer multiple reader problem.'
cheers.
-a
Great article
Great article, Ara. I only understood 50% of it, but the picture sure is perty.
Easy, but powerful 8-)
Hi A.
This looks easy, like all great ideas. I mean - a computer cannot be faster, than it is built for. So just pull out the tasks - and when the working machine is ready - get the next one.
So when you are running out of proc-time - you just buy another bunch of machines 8-)))))
Marco from: Travel Discount Hotels
Yes, it's true - there are no more lovers left alive,
no one has survived... That's why love has died. PSB
Starting jobs at reboot
"In this way, an ordinary user can set up a process that is running at all times, even after a machine reboot."
Most modern cron(1) also support @reboot which is run just after cron starts.
@reboot
on second thought the @reboot approach is not quite the same: the crontab/lockfile approach i use creates an 'immortal' daemon. eg. the daemon is restarted even if it died (bug) or was killed (accident). using the @reboot method does not ensure the daemon is ALWAYS running. one could argue that a GOOD thing. regardless, they are not quite the same.
cheers.
you learn something everyday
that's a great tip. i'll take it!
cheers.
ruby
Let's declare this "Ruby Queuesday"
Just a small remark - I'm
Just a small remark - I'm using rq-3.4.0 gem and had to change this command:
rq queue feed --daemon -l=~/rq.log
to:
rq queue feed --daemon -l ~/rq.loq
I.e. I had to remove the "=" sign.