Introducing Real-Time Linux

While Linux seems a natural solution for many applications, when milliseconds become critical, a robust multitasking environment may be too busy. RT-Linux gets the system under control to meet real-time computing needs.
Why Linux Can't Do Real-Time and Why Simple Fixes Don't Work

Although Linux has system calls for suspending a process for a given time interval, it does not guarantee the process will be resumed as soon as this interval has passed. Depending on system load, the process might be scheduled more than a second later. Furthermore, a user process can be preempted at an unpredictable moment and forced to wait for its share of the CPU time. Assigning the highest priorities to critical tasks does not help, partly because of the Linux “fair” time-sharing scheduling algorithm. This algorithm tries to make sure every user program gets its fair share of computer time. Of course, if we have a real-time task, we want it to get the CPU whenever it needs it, no matter how unfair that may be. Linux virtual memory also contributes to unpredictability. Pages belonging to any user process can be swapped out to disk at any time. Bringing the requested page back to RAM takes an unpredictable amount of time in Linux.

Some of these problems are easily, or somewhat easily, fixed. It's possible to create a new class of special Linux processes which are more real-time. We could change the scheduling algorithm, so that real-time processes are scheduled round-robin or periodically. We could lock a real-time process into memory, so that its pages will never be swapped out. In fact, both ideas are part of the POSIX.1b-1993 specification which defines standards for “real-time” processes. And POSIX.1b-1993 is being incorporated into Linux. In newer versions of Linux, system calls are already provided for locking user pages in memory, making the scheduler policy priority-based and even for more predictable handling of signals.

POSIX.1b-1993 does not solve all our problems. It's not intended to solve the kinds of problems we discussed at the beginning of this article. The standard is aimed at so-called soft real-time programs. A program which displays video in a window is a perfect example of a soft real-time task. We want this task to run quickly and quite often in order to get a good quality display, but a few milliseconds here or there won't make much difference. For hard real-time problems, the POSIX standard has several drawbacks:

  • Linux processes are heavyweight processes associated with significant overhead from process switching. Although Linux is relatively fast in switching processes, it can take several hundred microseconds on a fast machine. This would make it impossible to schedule a task to poll a sensor every 200 microseconds.

  • Linux follows the standard Unix technique of making kernel processes non-preemptive. That is, when a process is making a system call (and running in kernel mode) it cannot be forced to give up the processor to another task, no matter how high the priority of the other task. For people who write operating systems, this is wonderful, because it makes a lot of very complicated synchronization problems disappear. For people who want to run real-time programs it is not so wonderful, since important processes cannot be scheduled while the kernel works on behalf of even the least important process. In kernel mode, it cannot be rescheduled. For example, if Netscape calls fork, the fork will complete before any other process can run.

  • Linux disables interrupts in critical sections of kernel code. This disabling of interrupts means a real-time interrupt can be delayed until the current process, no matter how low its priority, finishes its critical section. Consider this piece of code:

line1: temp = qhead;
line2: qhead = temp->next;

Suppose that before the kernel gets to line 1, qhead contains the address of a data structure that is the only data structure on the queue and that qhead->next contains 0. Now suppose the kernel routine finishes line 1 and computes the value temp->next (which is 0), and then is halted by an interrupt that causes a new element to be added the the queue. When the interrupt routine finishes, qhead->next will not be equal to 0 any more, but when the kernel routine continues it will assign the 0 value to qhead and so will lose the new element. To prevent these types of errors, the Linux kernel makes extensive use of the cli command to clear (disable) interrupts during these critical sections. The kernel routine in this example would disable interrupts before it began changing the queue and re-enable interrupts only when the operation was complete; thus, interrupts would sometimes be delayed. It's hard to calculate the worst possible delay that can be caused by a critical section. You'd have to carefully examine the code for every driver (and much of the rest of the OS as well) to even make a good estimate. We've measured delays of as long as 1/2 millisecond. Consider what such a delay would mean to our camera routine.

Changing the Linux kernel to be a preemptable real-time kernel with low interrupt processing latency would require substantial rewriting of the Linux kernel code—almost writing a new one. Real-time Linux uses a simpler and more efficient solution.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix