Real Time and Linux

What is real time? This article, the first of a three-part series, introduces the benchmarks we'll run on real-time Linux versions in the next two issues.
What to Do?

Consequently, real-time applications usually give themselves a high priority, lock themselves in memory (and don't grow their memory usage), use lock-free communication whenever possible, use cache memory wisely, avoid nondeterministic I/O (e.g., sockets) and execute within a suitably constrained system. Suitable constraints include limiting hardware interrupts, limiting the number of processes, curtailing system call use by other processes and avoiding kernel problem areas, e.g., don't run hdparm.

Some of the system calls that should be made by a real-time application require special privileges. This usually is accomplished by having root own the process (having a shell owned by root run the program, or having the executable file's SUID bit set). A newer way is to make use of the capability mechanism. There are capabilities for locking down memory, such as CAP_IPC_LOCK (that "IPC" is in the name is just something we need to accept), and CAP_SYS_NICE for being able to set real-time priorities.
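As a rough illustration of the capability route (my own sketch; the article names only the capabilities, and the use of the libcap library here is my assumption), a process whose permitted set already includes these capabilities could raise them in its effective set like this (link with -lcap):

#include <stdio.h>
#include <sys/capability.h>  /* libcap; link with -lcap */

int main(void)
{
    /* the two capabilities the article names */
    cap_value_t needed[] = { CAP_IPC_LOCK, CAP_SYS_NICE };
    cap_t caps = cap_get_proc();  /* copy of our current capability sets */

    if (caps == NULL) {
        perror("cap_get_proc");
        return 1;
    }

    /* raise both in the effective set; this succeeds only if they
       are already present in the permitted set */
    if (cap_set_flag(caps, CAP_EFFECTIVE, 2, needed, CAP_SET) == -1 ||
        cap_set_proc(caps) == -1)
        perror("could not raise CAP_IPC_LOCK/CAP_SYS_NICE");

    cap_free(caps);
    return 0;
}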

A real-time process sets its priority with sched_setscheduler(2). The current implementation provides the standard POSIX policies SCHED_FIFO and SCHED_RR, along with priorities ranging in value from 1 to 99. Bigger is better. The POSIX function to check the maximum allowable priority value for a given policy is sched_get_priority_max(2).

A real-time process should lock down its memory and not grow. Locking memory is done in Linux with the POSIX standard function mlockall(2). Usually one uses the flags value of MCL_CURRENT | MCL_FUTURE to lock down current memory and any new memory if one's process grows in the future. While growing often is not acceptable, if you get lucky and survive the delay you might as well get the newly allocated memory locked down as well. Be careful to grow your stack and allocate all dynamic memory, and then call mlockall(2) before your process begins its time-critical phase. Note that you can check to see if your process had any page faults during a section of code by using getrusage(2). I show a code fragment below to illustrate the use of several functions. Note that one should check the return value from each of these calls and read the man pages for more details:

#include <sched.h>        /* sched_get_priority_max, sched_setscheduler */
#include <sys/mman.h>     /* mlockall */
#include <sys/resource.h> /* getrusage */
#include <unistd.h>       /* getpid */

struct sched_param sp;
struct rusage ru_before, ru_after;
long minorfaults, majorfaults;

sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
sched_setscheduler(getpid(), SCHED_FIFO, &sp);
mlockall(MCL_CURRENT | MCL_FUTURE);
getrusage(RUSAGE_SELF, &ru_before);
    . . .  /* REAL-TIME SECTION */
getrusage(RUSAGE_SELF, &ru_after);
minorfaults = ru_after.ru_minflt - ru_before.ru_minflt;
majorfaults = ru_after.ru_majflt - ru_before.ru_majflt;
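To make the advice to grow your stack before locking concrete, here is one common way to prefault a stack reserve; the 512KB figure and the helper name are my own illustration, not a prescription from the article:

#include <stddef.h>

#define STACK_RESERVE (512 * 1024)  /* assumed worst-case stack need */

/* Touch one byte per page of a large local buffer so the kernel grows
   and faults in the stack now, before mlockall(MCL_CURRENT) pins it. */
void prefault_stack(void)
{
    volatile char buf[STACK_RESERVE];  /* volatile keeps the writes */
    size_t i;

    for (i = 0; i < sizeof(buf); i += 4096)
        buf[i] = 0;
}

Call it, and do all of your dynamic allocation, before the mlockall(2) call in the fragment above.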
Benchmarking for Real-Time Applications

There are a number of efforts to benchmark various aspects of Linux. Real-time application developers are most interested in interrupt latency, timer granularity, context-switch time, system call overhead and kernel preemptibility. Interrupt latency is the time from when a device asserts an interrupt until the appropriate interrupt handler begins executing. It typically is delayed by the handling of other interrupts and by periods when interrupts are disabled. Linux does not implement interrupt priorities, and most interrupts are blocked while Linux is handling an interrupt. That time typically is quite short, however, perhaps a few microseconds.

On the other hand, the kernel may block interrupts for a significantly longer time. The intlat program from Andrew Morton allows one to measure interrupt latencies (http://www.uow.edu.au/~andrewm/linux/#intlat/). Similarly, his schedlat shows scheduling latencies (http://www.uow.edu.au/~andrewm/linux/schedlat.html).

Context-switch time is measured by the well-known benchmark harness LMbench (http://www.bitmover.com/lmbench/), as well as by others (http://www.atnf.csiro.au/~rgooch/benchmarks/linux-scheduler.html, http://math.nmu.edu/~benchmark/index.php?page=context). LMbench also provides information about system calls.

Table 1 shows context-switch times measured with LMbench. The benchmark was run three times, and per the LMbench documentation, the lowest context-switch time for each configuration is reported in the table; the highest value, however, was never more than about 10% above the minimum. Process size is reported in kilobytes and context-switch time in microseconds. The data indicate that substantial use of data in the cache causes significantly larger context-switch times, because the context-switch time includes the time to restore cache state.

Table 1. Context-Switch Times
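The technique behind numbers like those in Table 1 can be sketched in a few lines. The following is a rough illustration of my own (not LMbench itself) of the pipe ping-pong idea that lat_ctx is built on; note that it lumps the pipe read/write overhead in with the context-switch time, which LMbench takes care to measure and subtract out:

#include <stdio.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define ROUNDS 10000

int main(void)
{
    int p1[2], p2[2], i;
    char token = 'x';
    struct timeval start, end;
    double usec;

    if (pipe(p1) == -1 || pipe(p2) == -1) {
        perror("pipe");
        return 1;
    }

    if (fork() == 0) {                 /* child: echo the token back */
        for (i = 0; i < ROUNDS; i++) {
            read(p1[0], &token, 1);
            write(p2[1], &token, 1);
        }
        return 0;
    }

    gettimeofday(&start, NULL);
    for (i = 0; i < ROUNDS; i++) {     /* parent: send token, await echo */
        write(p1[1], &token, 1);
        read(p2[0], &token, 1);
    }
    gettimeofday(&end, NULL);
    wait(NULL);

    usec = (end.tv_sec - start.tv_sec) * 1e6 + (end.tv_usec - start.tv_usec);
    /* each round trip costs roughly two context switches */
    printf("approx. %.2f us per context switch (incl. pipe overhead)\n",
           usec / ROUNDS / 2);
    return 0;
}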

As an example of interrupt-off times, one can see some results at http://www.uow.edu.au/~andrewm/linux/intlat/intlat-disk.html. In one experiment, the data show that interrupts can be disabled for over 2ms while hdparm runs. Developers can use the intlat mechanism to measure interrupt-off times on the system they are running. Only under rare conditions will interrupt-off times exceed 100µs; these conditions, the kernel areas Morton warns against, should be avoidable for most embedded systems.

An area of more significant concern to most real-time developers is scheduling latency: the delay between a high-priority task being awakened and its actually resuming execution. A long delay is possible when the kernel is busy executing a system call, because the Linux kernel will not preempt a lower-priority process in the midst of a system call in order to run a newly awakened higher-priority process. This is why the Linux kernel is termed non-preemptible.

The latency test from Benno Senoner shows that a delay of 100ms or more is possible (http://www.gardena.net/benno/linux/audio/). We can see that both interrupt blocking and scheduling latencies can be sufficiently long to prevent satisfactory performance for some applications.
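In the same spirit, a high-priority task can sleep for a fixed period and record how late it actually wakes; run disk or fork load in another shell and watch the worst case grow. This is a simplified sketch of my own, not Senoner's actual test:

#include <sched.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct sched_param sp;
    struct timespec req;
    struct timeval before, after;
    double elapsed_ms, worst = 0.0;
    int i;

    req.tv_sec = 0;
    req.tv_nsec = 50 * 1000 * 1000;    /* ask for 50ms sleeps */

    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sched_setscheduler(getpid(), SCHED_FIFO, &sp) == -1)
        perror("sched_setscheduler");  /* needs root or CAP_SYS_NICE;
                                          without it we just measure more */

    for (i = 0; i < 1000; i++) {
        gettimeofday(&before, NULL);
        nanosleep(&req, NULL);
        gettimeofday(&after, NULL);
        elapsed_ms = (after.tv_sec - before.tv_sec) * 1000.0 +
                     (after.tv_usec - before.tv_usec) / 1000.0;
        if (elapsed_ms - 50.0 > worst)
            worst = elapsed_ms - 50.0;
    }
    printf("worst oversleep: %.3f ms\n", worst);
    return 0;
}

Note that the measured overshoot mixes true scheduling latency with the 10ms timer granularity discussed next, so it is the growth under load, not the absolute number, that is telling.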

Timing resolution is also of importance to many embedded Linux developers. For example, the setitimer(2) function is used to set a timer. This function, like other time functions in Linux, has a resolution of 10ms. Thus, if one sets a timer to expire in 15ms, it actually will expire in about 20ms, since the expiration is rounded up to the kernel's next 10ms tick. In a simple test measuring the time interval between 1,000 successive 15ms timers, we found that the average time interval was 19.99ms, the minimum was 19.987ms and the maximum was 20.042ms on a quiescent system.
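Here is a rough sketch of such a measurement (my own reconstruction, not the exact program we used); it timestamps each SIGALRM and reports the average, minimum and maximum interval:

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define INTERVALS 1000

static struct timeval stamp[INTERVALS + 1];
static volatile sig_atomic_t ticks;

static void alarm_handler(int signo)
{
    (void)signo;
    /* gettimeofday is not formally async-signal-safe, but is commonly
       used this way on Linux for rough measurements like this one */
    if (ticks <= INTERVALS)
        gettimeofday(&stamp[ticks], NULL);
    ticks++;
}

int main(void)
{
    struct itimerval itv;
    double delta, min = 1e9, max = 0.0, sum = 0.0;
    int i;

    signal(SIGALRM, alarm_handler);

    /* request a periodic 15ms timer */
    itv.it_interval.tv_sec = 0;
    itv.it_interval.tv_usec = 15000;
    itv.it_value = itv.it_interval;
    setitimer(ITIMER_REAL, &itv, NULL);

    while (ticks <= INTERVALS)
        pause();                       /* wait for each expiration */

    for (i = 1; i <= INTERVALS; i++) {
        delta = (stamp[i].tv_sec - stamp[i-1].tv_sec) * 1000.0 +
                (stamp[i].tv_usec - stamp[i-1].tv_usec) / 1000.0;
        sum += delta;
        if (delta < min) min = delta;
        if (delta > max) max = delta;
    }
    printf("avg %.3fms  min %.3fms  max %.3fms\n",
           sum / INTERVALS, min, max);
    return 0;
}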

Kevin Dankwardt is founder and CEO of K Computing, a training and consulting firm in Silicon Valley. In particular, his organization develops and delivers embedded and real-time Linux training worldwide.


Comments


realtime linux - is it really deterministic?

umesh:

We've had terrible experiences with particular versions of Linux (I'd rather not specify which here), where a cp of a few GB running in the background causes another process doing a write() on the same partition to wait 5-20 seconds! Moving from the CFQ to the deadline I/O scheduler changed the formula a bit (to about 1.5-8 seconds). I'm talking about 8GB, 2-CPU, 3GHz Xeons (HT) running SCSI, and just two user processes (1 cp + 1 write()). Talk about real-time Linux. This is repeatable on any similar config.
