From my perspective, one of the best parts of being a Web developer is the instant gratification. You write some code, and within minutes, it can be used by people around the world, all accessing your server via a Web browser. The rapidity with which you can go from an idea to development to deployment to actual users benefiting from (and reacting to) your work is, in my experience, highly motivating.
Users also enjoy the speed with which new developments are deployed. In the world of Web applications, users no longer need to consider, download or install the "latest version" of a program; when they load a page into their browser, they automatically get the latest version. Indeed, users have come to expect that new features will be rolled out on a regular basis. A Web application that fails to change and improve over time will quickly lose ground in users' eyes.
Another factor that users consider is the speed with which a Web application responds to their clicks. We are increasingly spoiled by the likes of Amazon and Google, which not only have many thousands of servers at their disposal, but which also tune their applications and servers for the fastest possible response time. We measure the speed of our Web applications in milliseconds, not in seconds, and in just the past few years, we have reached the point where taking even one second to respond to a user is increasingly unacceptable.
The drive to provide ever-greater speed to users has led to a number of techniques that reduce the delays they encounter. One of the simplest is the delayed job: instead of trying to do everything within the span of a single user request, you put some of it aside until later.
For example, let's say you are working on a Web application that implements an address book and calendar. If a user asks to see all of his or her appointments for the coming week, you almost certainly could display them right away. But if a user asks to see all appointments during the coming year, it might take some time to retrieve them from the database, format them into HTML and then send the result to the user's browser. In that case, you could divide the job, displaying the first few weeks right away and retrieving the rest afterward.
Sometimes, you can't divide jobs in this way. For example, let's say that when you add a new appointment to your calendar, you would like the system to send e-mail to each of the participants, indicating that they should add the meeting to their calendars. Sending e-mail doesn't take a long time, but it does require some effort on the part of the server. If you have to send e-mail to a large number of users, the wait might be intolerably long—or just annoyingly long, depending on your users and the task at hand.
Thus, for several years, developers have taken advantage of various "delayed jobs" mechanisms, making it possible to say, "Yes, I want to execute this functionality, but later on, in a separate thread or process from the handling of an HTTP request." Delaying the job like this may well mean that it'll take longer for the work to be completed. But no one will mind if the e-mail takes an additional 30 seconds to be sent. Users certainly will mind, by contrast, if it takes an additional 30 seconds to send an HTTP response to their browsers. And in the world of the Web, users probably won't complain; they'll simply move on to another site.
This month, I explore the use of delayed jobs, taking a particular look at Sidekiq, a Ruby gem (and accompanying server) written by Mike Perham that provides this functionality using a different approach from some of its predecessors. If you're like me, you'll find that using background jobs is so natural and easy, it quickly becomes part of your day-to-day toolbox for creating Web applications. Whether you're sending many e-mail messages, converting files from one format to another or producing large reports that may take time to process, background jobs are a great addition to your arsenal.
Before looking at Sidekiq in particular, let's consider what is necessary for background jobs to work, at least in an object-oriented language like Ruby. The basic idea is that you create a class with a single method, called "perform", that does what you want to execute. For example, you could do something like this:
class MailSender
  def perform(user)
    send_mail_to_user(user)
  end
end
Assuming that the send_mail_to_user method has been defined in your system, you can send e-mail with something like:
MailSender.new.perform(user)
But here's the thing: you won't ever actually execute that code. Indeed, you won't ever create an instance of MailSender directly. Rather, you'll invoke a class method, something like this:
MailSender.perform_async(user)
Notice the difference. Here, the class method takes the parameter that you eventually want passed to the "perform" method. But rather than calling "perform", the "perform_async" class method stores the request on a queue. At some point in the future, a separate process (or thread) will review the method calls stored in the queue, executing them one by one, independently of any HTTP request.
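To make that mechanism concrete, here is a toy, in-process sketch of the same idea in plain Ruby. It is not Sidekiq and assumes no Redis: JOB_QUEUE, ToyWorker and ToyWorker.drain are hypothetical names, and a core Queue stands in for the real store. Including the module gives MailSender a perform_async class method that records the class name and arguments; draining the queue later creates an instance and calls perform.

```ruby
# Hypothetical toy queue: a core, thread-safe Queue stands in for Redis.
JOB_QUEUE = Queue.new

module ToyWorker
  # Including this module also adds the ClassMethods methods as class methods.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Store the class name and arguments instead of running the job now.
    def perform_async(*args)
      JOB_QUEUE << [name, args]
      nil
    end
  end

  # Run every queued job; a real system does this in a separate process.
  def self.drain
    until JOB_QUEUE.empty?
      class_name, args = JOB_QUEUE.pop
      Object.const_get(class_name).new.perform(*args)
    end
  end
end

SENT = []

class MailSender
  include ToyWorker

  def perform(user)
    SENT << "mail to #{user}"   # stands in for send_mail_to_user(user)
  end
end

MailSender.perform_async("alice")   # returns immediately; nothing is sent yet
MailSender.perform_async("bob")
puts SENT.size                      # prints 0
ToyWorker.drain
puts SENT.inspect                   # prints ["mail to alice", "mail to bob"]
```

The important differences in a real system are that the queue lives outside your application's process (in Redis, for Sidekiq) and that draining happens in a separate, long-running process.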
Now, the first place you might consider queuing the method calls that you'll want to execute later is the database. Most modern Web applications use a database of some sort, so that would be a natural first thought. And indeed, in the Ruby world, gems such as "delayed job" and "background job" use the database as a queue.
The big problem with this technique, however, is that the queue doesn't need all of the functionality that a database can provide. You can get away with something smaller and lighter, without all the transactional and data-safety features. A second reason not to use the database is to split the load. If your Web application is working hard, you'll probably want to let the database be owned and used by the Web application, without distracting it with your queue.
So, it has become popular to use non-relational databases, aka NoSQL solutions, as queues for background jobs. One particularly popular choice is Redis, the super-fast, packed-with-functionality NoSQL store that works something like a souped-up memcached. The first job queue to use Redis in the Ruby world was Resque, which continues to be popular and effective.
But as applications have grown in size and scope, so too have the requirements for performance. Resque is certainly good enough for most purposes, but Sidekiq attempts to go one better. It also uses Redis as a back-end store, and it even uses the same storage format as Resque, so that you either can share a Redis instance between Resque and Sidekiq or transition from one to the other easily. But, the big difference is that Sidekiq uses threads, whereas Resque uses processes.
Threads? In Ruby?
Threading in Ruby is something of a sore subject. On the one hand, threads in Ruby are super-easy to work with. If you want to execute something in a thread, you just create a new Thread object, handing it a block containing the code you want to execute:
Thread.new do
  STDERR.puts "Hello!"   # runs in a new thread
end
The problem is that people who come from languages like Java often are surprised to hear that although Ruby threads are full-fledged system threads, Ruby also has a global interpreter lock (GIL), which prevents more than one thread from executing Ruby code at a time. This means that if you spawn 20 threads, you will indeed have 20 threads, but the GIL acts as a big mutex, ensuring that no more than one of them is executing at any given moment. The interpreter typically switches threads when the running thread waits on I/O, and because nearly every program performs I/O on a regular basis, each thread almost certainly will be given a chance to execute.
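A quick way to see this in practice is to start several threads that each block, here with sleep standing in for I/O. This is a small sketch of MRI's behavior as just described: because a waiting thread gives up the GIL, the waits overlap, and the total elapsed time should come out near a single wait rather than the sum of all five. CPU-bound work in those same threads would not overlap this way under MRI.

```ruby
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Five threads each wait 0.2 seconds; sleep stands in for blocking I/O.
threads = 5.times.map { |i| Thread.new { sleep 0.2; i * i } }

# Thread#value waits for the thread to finish and returns its block's result.
results = threads.map(&:value)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts results.inspect                   # prints [0, 1, 4, 9, 16]
puts "elapsed: #{elapsed.round(2)}s"   # typically close to 0.2s, not 1.0s
```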
I should note that Ruby isn't the only language with these issues. Python also has a GIL, and Guido van Rossum, Python's creator, has indicated that although he certainly wants Python to support threading, he personally prefers the ease and security of processes. Because processes don't share state, they are less prone to difficult-to-debug problems, without sacrificing too much in execution speed.
Sidekiq is threaded, but it uses a different model for threads than most Rubyists are used to. It uses Celluloid, an "actor-based" threading system that packages the threads inside objects, avoiding most or all of the issues associated with threads. Moreover, Celluloid expects to run in JRuby or Rubinius, two alternative Ruby implementations, which have true threading and lack the GIL. Celluloid-based applications, such as Sidekiq, will work just fine under the standard Ruby interpreter, known as the MRI, but you won't enjoy all of the speed or threading benefits.
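The core idea of an actor can be sketched by hand in plain Ruby. This is not Celluloid's API; CounterActor and its methods are hypothetical. A single thread owns the counter and handles messages from a mailbox (a core Queue) one at a time, so callers in other threads never touch its state directly, and no explicit lock around @count is needed.

```ruby
# Hypothetical hand-rolled actor, for illustration only (not Celluloid).
class CounterActor
  def initialize
    @count = 0
    @mailbox = Queue.new
    @thread = Thread.new { run }   # the only thread that touches @count
  end

  # Asynchronous message: enqueue it and return immediately.
  def increment
    @mailbox << [:increment, nil]
    nil
  end

  # Synchronous message: enqueue a request and wait for the reply.
  def count
    reply = Queue.new
    @mailbox << [:count, reply]
    reply.pop
  end

  def stop
    @mailbox << [:stop, nil]
    @thread.join
  end

  private

  # Process messages strictly one at a time, in arrival order.
  def run
    loop do
      message, reply = @mailbox.pop
      case message
      when :increment then @count += 1
      when :count     then reply << @count
      when :stop      then break
      end
    end
  end
end

counter = CounterActor.new
# Ten threads send 100 messages each; the actor applies them one by one.
senders = 10.times.map { Thread.new { 100.times { counter.increment } } }
senders.each(&:join)
total = counter.count
puts total   # prints 1000
counter.stop
```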
Now, let's see how this overview of a delayed job can be implemented in Sidekiq. First and foremost, you'll need to install the Redis NoSQL store. Redis is available from a variety of sources; I was able to install it on my Ubuntu-based machine with:
sudo apt-get install redis-server
Once Redis is installed, you'll want to install the "sidekiq" gem. Again, it'll give you the best functionality if you run it under JRuby or Rubinius, but you can run it under the standard Ruby interpreter as well. Just realize that the threads will give you non-optimal performance. You can install the gem with:
sudo gem install sidekiq -V
If you're running the Ruby Version Manager (RVM), as I do, you don't want to install the gem as root. Instead, you should just type:
gem install sidekiq -V
(I always like to use the -V flag, so I can see the details of the gem as it is installed.)
You can use Sidekiq in any Ruby application. However, most of my work is in Rails, and I imagine you're going to want to use it in Rails, Sinatra or a similar Web application. Thus, let's create a simple Rails application so you can try it:
rails new appointments
Within the new "appointments" directory, you'll then create an "appointment" resource with scaffolding—a combination of model, controller and view that can get you going quickly:
rails g scaffold appointment name:text meeting_at:timestamp notes:text
Once that is done, you have to run the migration (rake db:migrate), creating the appropriate "appointments" table in your database. In this case, because you didn't specify a database, Rails will use SQLite, which is good enough for this toy example.
Now you can fire up your application (rails s) and go to /appointments. From that URL, you can create, view, edit and delete appointments. However, the point is not to create appointments, but rather to delay the execution of something having to do with them. Let's do something extremely simple, such as sending e-mail:
rails g mailer notifications
Inside app/mailers/notifications.rb, add the following method:
class Notifications < ActionMailer::Base
  def appointment_info(person, appointment)
    @person = person
    @appointment = appointment
    mail(to: person.email, subject: "Appointment update")
  end
end
And, inside app/views/notifications/appointment_info.html.erb, write the following:
<p>Hello! You have an appointment with <%= @person %> at <%= @appointment.meeting_at %>.</p>
Finally, let's tie it all together, sending your notification from within your AppointmentWorker class. There's no rule for where the file defining such a class needs to be put, but it seems increasingly standard to have it in app/workers, in part because files under app are all loaded when Rails starts up:
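class AppointmentWorker
  include Sidekiq::Worker

  def perform(appointment_id)
    appointment = Appointment.find(appointment_id)
    # appointment.person is hypothetical: the scaffold creates no Person model,
    # so substitute whatever object holds the recipient's e-mail address.
    # Rails 3.x/4.x style; in Rails 4.2 and later, use deliver_now.
    Notifications.appointment_info(appointment.person, appointment).deliver
  end
end
Notice several things here. First, the class doesn't inherit from anything special. Sidekiq doesn't use inheritance, but rather has you include a module (in Ruby, something like a class that has no instances) whose methods then are available to instances of your class. This is also how the perform_async method gets defined on your class: through a little bit of magic, including a module can define both class and instance methods.
Now all you have to do is change your controller so that after you create an appointment, you also send a notification. In the create action of AppointmentsController, once @appointment has been saved successfully, add:
AppointmentWorker.perform_async(@appointment.id)
Notice that you're passing the appointment's ID, not the appointment object itself. Sidekiq stores each job's class name and arguments in Redis as JSON, so arguments should be simple values, such as numbers, strings, arrays and hashes; an ActiveRecord object won't come back out as an ActiveRecord object. Passing the ID and looking up the appointment inside perform also means the worker loads the record's current state when the job runs. The job stays in Redis until it is retrieved by a Sidekiq process.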
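What happens to arguments on their way through a JSON-based queue is easy to check with Ruby's standard json library. The sketch below only approximates a job queue (it assumes arguments are simply dumped to JSON and parsed back), and it uses a hypothetical Struct in place of an ActiveRecord model. The integer ID survives the round trip unchanged; the object does not come back as an Appointment. With a real ActiveRecord model, Rails' JSON support typically would turn it into a hash of attributes instead, but either way perform receives plain data, not a model instance.

```ruby
require "json"

# Hypothetical stand-in for an ActiveRecord model.
Appointment = Struct.new(:id, :name)
appointment = Appointment.new(42, "Dentist")

# Dump the argument list to JSON and parse it back, as a JSON-based queue would.
def round_trip(args)
  JSON.parse(JSON.generate(args))
end

by_id     = round_trip([appointment.id])
by_object = round_trip([appointment])

p by_id                   # prints [42]; the worker can look it up again
p by_object.first.class   # plain data (here, the struct's string form)
```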
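Indeed, that's the final part of Sidekiq that you need to get in place: the back-end process that will look through the delayed jobs, running each one in turn. Because you'll start it through Bundler, first make sure that your application's Gemfile includes the line gem 'sidekiq' (and run bundle install). After that, starting the process is as easy as: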
bundle exec sidekiq
Yup, that's all you need to do. True, there are some options you can set, but generally speaking, this starts up a Sidekiq server that watches the queue (as stored in Redis), grabs jobs off the queue and processes them. You can configure Sidekiq to run with a timeout or with a specified number of retries, and you even can say how many workers (that is, threads) should run concurrently, for example with bundle exec sidekiq -c 10.
Remember that although these are indeed threads, Sidekiq (via Celluloid) is designed so that they don't share state. Of course, you still need to make sure that your worker methods are thread-safe. They also should be idempotent: if a worker gets through 90% of its job and then is halted, running the job again should not cause any penalties or errors. In other words, your jobs should be transactional, just as you would expect from a database query.
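One common way to get that restart-safety is to have the job record what it already has done and check that record before doing anything with side effects. Here is a minimal plain-Ruby sketch, not Sidekiq code: NotifyJob, SENT_LOG and DELIVERIES are hypothetical, a Hash stands in for something like a notified_at column in the database, and the fail_after_send flag exists only to simulate a worker being halted partway through. The first attempt sends the message and then crashes; the retry notices the earlier work and skips it.

```ruby
# Hypothetical stand-ins: SENT_LOG plays the role of a "notified_at" column,
# and DELIVERIES records the e-mail messages that actually were sent.
SENT_LOG = {}
DELIVERIES = []

class NotifyJob
  # fail_after_send exists only to simulate a worker halted mid-job.
  def perform(appointment_id, fail_after_send: false)
    # Skip the side effect if an earlier attempt already completed it.
    return :skipped if SENT_LOG[appointment_id]

    DELIVERIES << appointment_id      # stands in for sending the e-mail
    SENT_LOG[appointment_id] = true   # record the work right after doing it
    raise "worker halted" if fail_after_send
    :sent
  end
end

# The first attempt sends the message, then crashes before finishing.
begin
  NotifyJob.new.perform(7, fail_after_send: true)
rescue RuntimeError => e
  puts "attempt 1 failed: #{e.message}"
end

# The retry sees the record and does not send a second message.
second = NotifyJob.new.perform(7)
puts second            # prints skipped
puts DELIVERIES.size   # prints 1
```

This guard assumes the record is written right after the side effect; a crash between those two lines still could produce a duplicate, which is why real applications often lean on database transactions or unique constraints for this bookkeeping.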
There are other ways to schedule Sidekiq jobs besides defining a worker class, as in the above example. If there's an existing method that you want to run as a background job, just insert the "delay" method before the actual method call. That is:
Notifications.delay.appointment_info(person, appointment)
If you are using Rails and the built-in ActiveSupport library for easy time descriptions, you even can do something like this:
Notifications.delay_for(1.hour).appointment_info(person, appointment)
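A delay method like this can be built on Ruby's ability to intercept method calls. The following is a toy illustration of that general idea in plain Ruby, not Sidekiq's implementation: DelayProxy, Delayable, Notifier and DELAYED are hypothetical names. Calling delay returns a proxy whose method_missing records the next call, along with when it should run, instead of executing it; a later step runs whatever is due.

```ruby
# Hypothetical in-memory list of recorded calls.
DELAYED = []

# BasicObject has almost no methods, so nearly every call reaches method_missing.
class DelayProxy < BasicObject
  def initialize(target, run_at)
    @target = target
    @run_at = run_at
  end

  # Record the call instead of executing it.
  def method_missing(name, *args)
    ::DELAYED << { target: @target, method: name, args: args, run_at: @run_at }
    nil
  end
end

module Delayable
  def delay(seconds = 0)
    DelayProxy.new(self, Time.now + seconds)
  end
end

class Notifier
  extend Delayable   # adds .delay as a class method

  def self.appointment_info(name)
    "Reminder for #{name}"
  end
end

Notifier.delay.appointment_info("Dentist")        # recorded, not run
Notifier.delay(3600).appointment_info("Review")   # recorded to run an hour later

# Later, a worker runs whatever is due.
due = DELAYED.select { |job| job[:run_at] <= Time.now }
results = due.map { |job| job[:target].public_send(job[:method], *job[:args]) }
puts results.inspect   # prints ["Reminder for Dentist"]
```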
Sidekiq has become quite popular in the Ruby community since it was released, in no small part because of its high performance, easy installation and ease of use. It also works with commercial hosting services, such as Heroku, assuming that you first provision a Redis instance.
Working with delayed jobs changes your perspective of the Web somewhat—you realize that not everything needs to take place immediately. Rather, you can delay certain jobs, putting them in the background, allowing your Web server to respond to users faster than otherwise would be the case. And, when speed is such a crucial element of Web application success, prudent use of Sidekiq likely will make a big difference.
The Sidekiq home page is at http://sidekiq.org. Although Sidekiq.org does point to a commercial version, the basic version is still free and open source, with the source code available on GitHub at http://mperham.github.com/sidekiq, including a Wiki containing a great deal of useful information.
Mike Perham, the author of Sidekiq, describes the actor-based model in a blog post: http://blog.carbonfive.com/2011/04/19/concurrency-with-actors.
Finally, given that Sidekiq uses Redis, you likely will want to read more about this high-performance NoSQL database, at http://redis.io.