SMP and Embedded Real Time

As embedded real-time applications start to run on SMP systems, kernel issues emerge.
Priority Inversion

Priority inversion is illustrated by Figure 9. A low-priority process P2 holds a lock, but is preempted by medium-priority processes. When high-priority process P1 tries to acquire the lock, it must wait, because P2 holds it. But P2 cannot release it until it runs again, which will not happen while the medium-priority processes are running. So, in effect, the medium-priority processes have blocked a high-priority process: in short, priority inversion.

Figure 9. Priority Inversion

One way to prevent priority inversion is to disable preemption during critical sections, as is done in CONFIG_PREEMPT builds of the Linux kernel. However, this preemption disabling can result in excessive latencies.

The -rt patchset therefore uses priority inheritance instead, so that P1 donates its priority to P2, but only for as long as P2 continues to hold the lock, as shown in Figure 10. Because P2 is now running at high priority, it preempts the medium-priority processes, completing its critical section quickly and then handing the lock off to P1.

Figure 10. Priority Inheritance

So priority inheritance works well for exclusive locks, where only one thread can hold the lock at a given time. But there are also reader-writer locks, which can be held by one writer on the one hand or by an unlimited number of readers on the other. The fact that a reader-writer lock can be held by an unlimited number of readers can be a real problem for priority inheritance, as illustrated in Figure 11. Here, several low-priority processes are read-holding lock L1, but are preempted by medium-priority processes. Each low-priority process might also be blocked write-acquiring other locks, which might be read-held by even more low-priority processes, all of which are also preempted by the medium-priority processes.

Figure 11. Reader-Writer Lock Priority Inversion

Priority inheritance can solve this problem, but the cure is worse than the disease. For example, the arbitrarily bushy tree of preempted processes requires complex and slow bookkeeping. But even worse, before the high-priority writer can proceed, all of the low-priority processes must complete their critical sections, which will result in arbitrarily long delays.

Such delays are not what we want in a real-time system. This situation resulted in numerous “spirited” discussions on the Linux-kernel mailing list, which Ingo Molnar closed down with the following proposal:

  1. Only one task at a time may read-acquire a given reader-writer lock.

  2. If #1 results in performance or scalability problems, the problematic lock will be replaced with RCU (read-copy update).

RCU can be thought of as a reader-writer lock where readers never block; in fact, readers execute a deterministic number of instructions. Writers have much higher overhead, as they must retain old versions of the data structure that readers might still be referencing. RCU provides special primitives to allow writers to determine when all readers have completed, so that the writer can determine when it is safe to free old versions. RCU works best for read-mostly data structures, or for data structures with hard real-time readers. (More detail may be found at, and even more detail may be found at Although user-level RCU implementations have been produced for experimental purposes, for example,, production-quality RCU implementations are currently found only in kernels. Fixing this is on my to-do list.)

A key property of RCU is that writers never block readers, and, conversely, readers do not block writers from modifying a data structure. Therefore, RCU cannot cause priority inversion. This is illustrated in Figure 12. Here, the low-priority processes are in RCU read-side critical sections and are preempted by medium-priority processes, but because the locks are used only to coordinate updates, the high-priority process P1 immediately can acquire the lock and carry out the update by creating a new version. Freeing the old version does have to wait for readers to complete, but this freeing can be deferred to avoid degrading real-time latencies.

Figure 12. RCU Prevents Priority Inversion

This combination of priority inheritance and RCU permits the -rt patchset to provide real-time latencies on mid-range multiprocessors. But priority inheritance is not a panacea. For example, one could imagine applying some form of priority inheritance to real-live users who might be blocking high-priority processes, as shown in Figure 13. However, I would rather we did not.

Figure 13. Priority Boosting for Users?



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

a question

hpare's picture

I have a question about that "interrupt" discribed in figure 6-8.
Could you tell me if this kind of interrupt happens on one CPU, from cpu catch a INTn do tophalf instructions to deal with the blue rectangle(maybe a softirq() of bottomhalf),do all of these was executed by one CPU?
waiting for your explanation!
thank you!

Threaded interrupts

paulmck's picture

There is a small portion of code that happens in the "top half", or hard irq context. On a non-PREEMPT_RT system he actual interrupt handler code would also execute in hard irq context. However, in PREEMPT_RT, the handler instead executes at process level in a kernel thread executing at real-time priority.

If this handler uses a bottom half, or softirq, then the softirq will be scheduled as another kernel thread, also executing at real-time priority.

The softirq interface is such that the softirq handler executes on the same CPU where the raise_softirq() request ran, Normally the system would be configured so that the hard irq and irq handler ran on the same CPU as well. (I believe that it can be configured otherwise, but I don't know of a good reason to do so.)

Great article, really interesting stuff

Adeel's picture

In addition, there are real-time audio systems, SIP servers and object brokers...

Can you give an example of rt audio/sip/object broker software/projects?

Also, has the -rt patch set had any impact on networking in linux? e.g. latency, iptables traversal time, etc

Would a standard program, e.g. X11, have a performance benefit on -rt compared to a non-rt system?

Examples of RT audio, SIP, object brokers...

paulmck's picture

There are a number of open-source audio projects. Two that come to mind immediately are Jack and Pulse audio, both of which were enthusiastic about testing out the -rt patchset. The only RT SIP servers that I am aware of are proprietary, ditto with object brokers.

There has been some effect of -rt on networking, but many real-time applications use lower-level protocols (such as UDP) or special transports (such as Infiniband) in order to retain greater control over latency. That said, there are special real-time protocols, such as the DDS suite.

Usually, real-time operating systems are designed for responsiveness, and usually give up throughput performance in favor of responsiveness. For one look at this issue, see my recent OLS paper on real time vs. real fast.


szkolenia biznesowe's picture

Nice article

thanks :)

Glad you liked it!

Paul E. McKenney's picture