SMP and Embedded Real Time
Parallel programming might not be mind crushingly hard, but it is certainly harder than single-threaded programming. Real-time programming is also hard. So, why would anyone be crazy enough to take on both at the same time?
It is true that real-time parallel programming poses special challenges, including interactions with lock-induced delays, interrupt handlers and priority inversion. However, Ingo Molnar's -rt patchset provides both kernel and application developers with tools to deal with these challenges. These tools are described in the following sections.
Much ink has been spilled on locking and real-time latency, but we will stick to the following simple points:
Reducing lock contention improves SMP scalability and reduces real-time latency.
When lock contention is low, there are a finite number of tasks, critical-section execution time is bounded, and locks act in a first-come-first-served manner to the highest-priority tasks, then lock wait times for those tasks will be bounded.
An SMP Linux kernel by its very nature requires very few modifications in order to support the aggressive preemption required by real time.
The first point should be obvious, because spinning on locks is bad for both scalability and latency. For the second point, consider a queue at a bank where each person spends a bounded time T with a solitary teller, there are a bounded number of other people N, and the queue is first-come-first-served. Because there can be at most N people ahead of you, and each can take at most time T, you will wait for at most time NT. Therefore, FIFO priority-based locking really can provide hard real-time latencies.
For the third point, see Figure 5. The left-hand side of the diagram shows three functions A(), B() and C() executing on a pair of CPUs. If functions A() and B() must exclude function C(), some sort of locking scheme must be used. However, that same locking provides the protection needed by the -rt patchset's preemption, as shown on the right-hand side of this diagram. If function B() is preempted, function C() blocks as soon as it tries to acquire the lock, which permits B() to run. After B() completes, C() may acquire the lock and resume running.

Figure 5. SMP Locking and Preemption
This approach requires that kernel spinlocks block, and this change is fundamental to the -rt patchset. In addition, per-CPU variables must be protected more rigorously. Interestingly enough, the -rt patchset also located a number of SMP bugs that had gone undetected.
However, in the standard Linux kernel, interrupt handlers cannot block. But interrupt handlers must acquire locks, which can block in -rt. What can be done?
Not only are blocking locks a problem for interrupt handlers, but they also can seriously degrade real-time latency, as shown in Figure 6.

Figure 6. Interrupts Degrade Latency
This degradation can be avoided by running the interrupt handler in process context, as shown in Figure 7, which also allows them to acquire blocking locks.

Figure 7. Move Interrupt Handlers to Process Context
Even better, these process-based interrupt handlers can actually be preempted by user-level real-time threads, as shown in Figure 8, where the blue rectangle within the interrupt handler represents a high-priority real-time user process preempting the interrupt handler.

Figure 8. Preempting Interrupt Handlers
Of course, “with great power comes great responsibility.” For example, a high-priority real-time user process could starve interrupts entirely, shutting down all I/O. One way to handle this situation is to provide a low-priority “canary” process. If the “canary” is blocked for longer than a predetermined time, one might kill the offending thread.
Running interrupts in process context permits interrupt handlers to acquire blocking locks, which in turn allows critical sections to be preempted, which permits extremely fast real-time scheduling latencies. In addition, the -rt patchset permits real-time application developers to select the real-time priority at which interrupt handlers run. By running only the most critical portions of the real-time application at higher priority than the interrupt handlers, the developers can minimize the amount of code for which “great responsibility” must be shouldered.
However, preempting critical sections can lead to priority inversion, as described in the next section.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.
Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.
Sponsored by ActiveState
| Non-Linux FOSS: libnotify, OS X Style | Jun 18, 2013 |
| Containers—Not Virtual Machines—Are the Future Cloud | Jun 17, 2013 |
| Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer | Jun 12, 2013 |
| Weechat, Irssi's Little Brother | Jun 11, 2013 |
| One Tail Just Isn't Enough | Jun 07, 2013 |
| Introduction to MapReduce with Hadoop on Linux | Jun 05, 2013 |
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Validate an E-Mail Address with PHP, the Right Way
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Senior Perl Developer
- RSS Feeds
- Technical Support Rep
- Introduction to MapReduce with Hadoop on Linux
- Weechat, Irssi's Little Brother
- What the author describes
1 hour 2 min ago - Reply to comment | Linux Journal
5 hours 13 min ago - Reply to comment | Linux Journal
5 hours 58 min ago - Didn't read
6 hours 8 min ago - Reply to comment | Linux Journal
6 hours 13 min ago - Poul-Henning Kamp: welcome to
8 hours 23 min ago - This has already been done
8 hours 24 min ago - Reply to comment | Linux Journal
9 hours 10 min ago - Welcome to 1998
9 hours 58 min ago - notifier shortcomings
10 hours 22 min ago




Comments
a question
I have a question about that "interrupt" discribed in figure 6-8.
Could you tell me if this kind of interrupt happens on one CPU, from cpu catch a INTn do tophalf instructions to deal with the blue rectangle(maybe a softirq() of bottomhalf),do all of these was executed by one CPU?
waiting for your explanation!
thank you!
Threaded interrupts
There is a small portion of code that happens in the "top half", or hard irq context. On a non-PREEMPT_RT system he actual interrupt handler code would also execute in hard irq context. However, in PREEMPT_RT, the handler instead executes at process level in a kernel thread executing at real-time priority.
If this handler uses a bottom half, or softirq, then the softirq will be scheduled as another kernel thread, also executing at real-time priority.
The softirq interface is such that the softirq handler executes on the same CPU where the raise_softirq() request ran, Normally the system would be configured so that the hard irq and irq handler ran on the same CPU as well. (I believe that it can be configured otherwise, but I don't know of a good reason to do so.)
Great article, really interesting stuff
In addition, there are real-time audio systems, SIP servers and object brokers...
Can you give an example of rt audio/sip/object broker software/projects?
Also, has the -rt patch set had any impact on networking in linux? e.g. latency, iptables traversal time, etc
Would a standard program, e.g. X11, have a performance benefit on -rt compared to a non-rt system?
Examples of RT audio, SIP, object brokers...
There are a number of open-source audio projects. Two that come to mind immediately are Jack and Pulse audio, both of which were enthusiastic about testing out the -rt patchset. The only RT SIP servers that I am aware of are proprietary, ditto with object brokers.
There has been some effect of -rt on networking, but many real-time applications use lower-level protocols (such as UDP) or special transports (such as Infiniband) in order to retain greater control over latency. That said, there are special real-time protocols, such as the DDS suite.
Usually, real-time operating systems are designed for responsiveness, and usually give up throughput performance in favor of responsiveness. For one look at this issue, see my recent OLS paper on real time vs. real fast.
szkolenia
Nice article
thanks :)
Glad you liked it!
;-)