Linux Job Scheduling
Today, in our ongoing series on learning to live with Linux's “inner dæmons”, we are going to look at two dæmons that schedule job execution on Linux. These dæmons are more or less exactly like those found on virtually every UNIX out there. (Linux has separate dæmons for at and cron. Old versions of Linux used a program called “atrun”, which was run in root's crontab once a minute to execute at requests. Some other Unix operating systems have atd functionality directly in crond. This qualifier brought to you by the bureau of auctorial honesty. This article will cover atd and crond as they are distributed with most currently sold distributions, including Debian 2.1, Red Hat, SuSE and Corel, among others.) My test cases were all carried out on a Red Hat 6.1 installation using version 3.1.7 of at. Debian and SuSE versions I currently have are at 3.1.8.
As for cron, most Linux distributions use “Vixie cron” which was originally written, as you might guess, by Paul Vixie. The distributions have each done their own fixes to address a security hole discovered in August 1999. Check your distribution's update page for the most recent version of cron, and make sure you have it installed.
What you think about at and cron will largely depend on what your background is. If you are familiar with only the DOS and Windows world, you should be fairly impressed with what atd and crond offer, even if you have made use of the System Agent, which has certain similarities to crond. If you are an old hand from the world of MIS where you had JCL and various batch environment control systems, you will probably find atd and crond lacking in some essential features. Even so, I hope you will come away from this introduction with a healthy appreciation for what these tools do offer, and perhaps a few ideas about how, even with their limitations, they significantly enhance Linux's capabilities.
People with a mainframe background are very familiar with the concept of job scheduling. They usually use this term interchangably with batch processing. Alas, job scheduling is not batch processing. Batch processing, to my mind at least, includes the concepts of job dependencies, batch process monitoring, checkpoint/restart and recoverability. Neither atd nor crond provides these facilities. If you come from the world of big iron, you may be feeling some disappointment. Don't. As you will see, atd and crond fit in well with the overall UNIX philosophy of simple tools that do one thing well.
If you are coming from a Windows/DOS perspective, you should be pleased by the multi-user nature of atd and crond. Unlike System Agent, you do not have to be logged in for your jobs to be carried out.
If you have a UNIX background, well, you are amongst old friends here.
For those totally unfamiliar with these concepts, what we are talking about is running programs. So what, you say? I log in and type commands and click on little icons. I run programs all day. What's the big deal?
What about having programs run at a certain time of the day, whether you are there or not? What about compiling the latest version of WINE on a busy Linux server when it won't slow down the branch office Intranet? What about that annoying log file the on-line order application spits out that is about to eat up all the free disk space on /usr/prod/orders?
This is where job scheduling comes into play.
There are two kinds of scheduled jobs. You can think of them as “one shot” and “repeating”. One-shot jobs are single executions of programs you want to have take place at some future time, whether or not you are logged in. Repeating jobs are programs you want to have run at certain times or dates, over and over again.
The command you use to schedule one-shot jobs is called “at”. The way to schedule repeating jobs is through a “crontab” (which is a portmanteau word made from CRON TABle, similar to INITtialization TABle and other *nix-y portmanteau words). Oddly enough, the command used to view, edit and store crontabs is called “crontab”.
Unlike some of the other dæmons we have covered in this series, these two have interactive user programs that control them. Because of this, we will cover the basics of using these two dæmons as a non-privileged user (I hope you aren't logging in to your Linux system as root!), then we will go over the dæmons and how they work, then we will cover some fine points of “non-user” or system-scheduled jobs, and finally some of the little “gotchas” that sometimes cause commands to behave differently than you expect when you run them through a scheduler.
The at command is used to schedule one or more programs for a single execution at some later time. There are actually four client commands:
at: Runs commands at specified time
atq: Lists pending commands
atrm: Cancels pending jobs
batch: Runs commands when system load permits
The Linux at command accepts a number of time specifications, considerably extending the POSIX.2 standard. These include:
HH:MMRun at this hour and minute. If this is already passed, the next day is assumed. A 24-hour time is assumed, unless you suffix the time with “am” or “pm”.
now noon midnight teatimeYou read that right. You can type “at teatime”, and Linux's at is civilized enough to know that this is 4 p.m. local time. The “noon” and “midnight” keywords have their normal meaning. The “now” keyword means what it says. It might seem like a dumb thing to have, since if you wanted to run something now, you would type it without the at command, but it has an application in “relative time” invocations. We'll see those after the date modifiers described below.
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
|Introduction to MapReduce with Hadoop on Linux||Jun 05, 2013|
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Validate an E-Mail Address with PHP, the Right Way
- Technical Support Rep
- Senior Perl Developer
- UX Designer
- Introduction to MapReduce with Hadoop on Linux