SMP and Embedded Real Time

As embedded real-time applications start to run on SMP systems, kernel issues emerge.
Myth 4: Parallel Real-Time Programming Is Impossibly Difficult

Parallel programming might not be mind crushingly hard, but it is certainly harder than single-threaded programming. Real-time programming is also hard. So, why would anyone be crazy enough to take on both at the same time?

It is true that real-time parallel programming poses special challenges, including interactions with lock-induced delays, interrupt handlers and priority inversion. However, Ingo Molnar's -rt patchset provides both kernel and application developers with tools to deal with these challenges. These tools are described in the following sections.

Locking and Real-Time Latency

Much ink has been spilled on locking and real-time latency, but we will stick to the following simple points:

  1. Reducing lock contention improves SMP scalability and reduces real-time latency.

  2. When lock contention is low, there are a finite number of tasks, critical-section execution time is bounded, and locks act in a first-come-first-served manner to the highest-priority tasks, then lock wait times for those tasks will be bounded.

  3. An SMP Linux kernel by its very nature requires very few modifications in order to support the aggressive preemption required by real time.

The first point should be obvious, because spinning on locks is bad for both scalability and latency. For the second point, consider a queue at a bank where each person spends a bounded time T with a solitary teller, there are a bounded number of other people N, and the queue is first-come-first-served. Because there can be at most N people ahead of you, and each can take at most time T, you will wait for at most time NT. Therefore, FIFO priority-based locking really can provide hard real-time latencies.

For the third point, see Figure 5. The left-hand side of the diagram shows three functions A(), B() and C() executing on a pair of CPUs. If functions A() and B() must exclude function C(), some sort of locking scheme must be used. However, that same locking provides the protection needed by the -rt patchset's preemption, as shown on the right-hand side of this diagram. If function B() is preempted, function C() blocks as soon as it tries to acquire the lock, which permits B() to run. After B() completes, C() may acquire the lock and resume running.

Figure 5. SMP Locking and Preemption

This approach requires that kernel spinlocks block, and this change is fundamental to the -rt patchset. In addition, per-CPU variables must be protected more rigorously. Interestingly enough, the -rt patchset also located a number of SMP bugs that had gone undetected.

However, in the standard Linux kernel, interrupt handlers cannot block. But interrupt handlers must acquire locks, which can block in -rt. What can be done?

Interrupt Handlers

Not only are blocking locks a problem for interrupt handlers, but they also can seriously degrade real-time latency, as shown in Figure 6.

Figure 6. Interrupts Degrade Latency

This degradation can be avoided by running the interrupt handler in process context, as shown in Figure 7, which also allows them to acquire blocking locks.

Figure 7. Move Interrupt Handlers to Process Context

Even better, these process-based interrupt handlers can actually be preempted by user-level real-time threads, as shown in Figure 8, where the blue rectangle within the interrupt handler represents a high-priority real-time user process preempting the interrupt handler.

Figure 8. Preempting Interrupt Handlers

Of course, “with great power comes great responsibility.” For example, a high-priority real-time user process could starve interrupts entirely, shutting down all I/O. One way to handle this situation is to provide a low-priority “canary” process. If the “canary” is blocked for longer than a predetermined time, one might kill the offending thread.

Running interrupts in process context permits interrupt handlers to acquire blocking locks, which in turn allows critical sections to be preempted, which permits extremely fast real-time scheduling latencies. In addition, the -rt patchset permits real-time application developers to select the real-time priority at which interrupt handlers run. By running only the most critical portions of the real-time application at higher priority than the interrupt handlers, the developers can minimize the amount of code for which “great responsibility” must be shouldered.

However, preempting critical sections can lead to priority inversion, as described in the next section.



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

a question

hpare's picture

I have a question about that "interrupt" discribed in figure 6-8.
Could you tell me if this kind of interrupt happens on one CPU, from cpu catch a INTn do tophalf instructions to deal with the blue rectangle(maybe a softirq() of bottomhalf),do all of these was executed by one CPU?
waiting for your explanation!
thank you!

Threaded interrupts

paulmck's picture

There is a small portion of code that happens in the "top half", or hard irq context. On a non-PREEMPT_RT system he actual interrupt handler code would also execute in hard irq context. However, in PREEMPT_RT, the handler instead executes at process level in a kernel thread executing at real-time priority.

If this handler uses a bottom half, or softirq, then the softirq will be scheduled as another kernel thread, also executing at real-time priority.

The softirq interface is such that the softirq handler executes on the same CPU where the raise_softirq() request ran, Normally the system would be configured so that the hard irq and irq handler ran on the same CPU as well. (I believe that it can be configured otherwise, but I don't know of a good reason to do so.)

Great article, really interesting stuff

Adeel's picture

In addition, there are real-time audio systems, SIP servers and object brokers...

Can you give an example of rt audio/sip/object broker software/projects?

Also, has the -rt patch set had any impact on networking in linux? e.g. latency, iptables traversal time, etc

Would a standard program, e.g. X11, have a performance benefit on -rt compared to a non-rt system?

Examples of RT audio, SIP, object brokers...

paulmck's picture

There are a number of open-source audio projects. Two that come to mind immediately are Jack and Pulse audio, both of which were enthusiastic about testing out the -rt patchset. The only RT SIP servers that I am aware of are proprietary, ditto with object brokers.

There has been some effect of -rt on networking, but many real-time applications use lower-level protocols (such as UDP) or special transports (such as Infiniband) in order to retain greater control over latency. That said, there are special real-time protocols, such as the DDS suite.

Usually, real-time operating systems are designed for responsiveness, and usually give up throughput performance in favor of responsiveness. For one look at this issue, see my recent OLS paper on real time vs. real fast.


szkolenia biznesowe's picture

Nice article

thanks :)

Glad you liked it!

Paul E. McKenney's picture