Lowering Latency in Linux: Introducing a Preemptible Kernel
Performance measurements come in two flavors, throughput and latency. The former is like the width of an expressway: the wider the expressway, the more cars that can travel on it. The latter is like the speed limit: the faster it is, the sooner cars get from point A to point B. Obviously, both quantities are important to any task. Many jobs, however, require more of one quality than of the other. Sticking to our roadway analogy, long-haul trucking may be more sensitive to throughput, while a courier service may be more demanding on latency. Lowering latency and increasing system response, through good old-fashioned kernel work, is the topic of this article.
Audio/video processing and playback are two common beneficiaries of lowered latency. Increasingly important to Linux, however, is its benefit to interactive performance. With high latency, user actions, such as mouse clicks, go unnoticed for too long—not the snappy responsive desktop users expect. The system cannot get to the important process fast enough.
The problem, at least as far as the kernel is concerned, is the nonpreemptibility of the kernel itself. Normally, if something sufficiently important happens, such as an interactive event, the receiving application will get a priority boost and subsequently find itself running. This is how a preemptively multitasked OS works. Applications run until they use up some default amount of time (called a timeslice) or until an important event occurs. The alternative is cooperative multitasking, where applications must explicitly say, “I'm done!”, before a new process can run. The problem, when running in the kernel, is that scheduling is effectively cooperative.
Applications operate in one of two modes: either in user space, executing their own code, or within the kernel, executing a system call or otherwise having the kernel work on their behalf. When operating in the kernel, the process continues running until it decides to stop, ignoring timeslices and important events. If a more important process becomes runnable, it cannot be run until the current process, if it is in the kernel, gets out. This process can take hundreds of milliseconds.
The first and simplest solution to latency problems is to rewrite kernel algorithms so that they take a minimal, bounded amount of time. The problem is that this is already the goal; system calls are written to return quickly to user space, yet we still have latency problems. Some algorithms simply do not scale nicely.
The second solution is to insert explicit scheduling points throughout the kernel. This approach, taken by the low-latency patches, finds problem areas in the kernel and inserts code to the effect of “Anyone need to run? If so, run!” The problem with this solution is that we cannot possibly hope to find and fix all problem areas. Nonetheless, testing shows that these patches do a good job. What we need, however, is not a quick fix but a solution to the problem itself.
A proper solution is removing the problem altogether by making the kernel preemptible. Thus, if something more important needs to run, it will run, regardless of what the current process is doing. The obstacle here, and the reason Linux did not do this from the start, is that the kernel would need to be re-entrant. Thankfully, the issues of preemption are solved by existing SMP (symmetric multiprocessing) support. By taking advantage of the SMP code, in conjunction with some other simple modifications, the kernel can be made preemptible.
The programmers at MontaVista provided the initial implementation of kernel preemption. First, the definition of a spin lock was modified to include marking a “nonpreemptible” region. Therefore, we do not preempt while holding a spin lock, just as we do not enter a locked region concurrently under SMP. Of course, on uniprocessor systems we do not actually make the spin locks anything other than the preemption markers. Second, code was modified to ensure that we do not preempt inside a bottom half or inside the scheduler itself. Finally, the return from interrupt code path was modified to reschedule the current process if needed.
On UP, spin_lock is defined as preempt_disable, and spin_unlock is defined as preempt_enable. On SMP, they also perform the normal locking. So what do these new routines do?
The nestable preemption markers preempt_disable and preempt_enable operate on preempt_count, a new integer stored in each task_struct. preempt_disable effectively is:
++current->preempt_count; barrier();
and preempt_enable is:
--current->preempt_count;
barrier();
if (unlikey(!current->preempt_count
&& current->need_resched))
preempt_schedule();
The result is we do not preempt when the count is greater than
zero. Because spin locks are already in place to protect critical
regions for SMP machines, the preemptible kernel now has its
protection too.
preempt_schedule implements the entry to schedule itself. It sets a flag in the current process to signify it was preempted, calls schedule and, upon return, unsets the flag:
asmlinkage void preempt_schedule(void)
{
do {
current->preempt_count += PREEMPT_ACTIVE;
schedule();
current->preempt_count -= PREEMPT_ACTIVE;
} while (current->need_resched);
}
The other entry to preempt_schedule is via the interrupt return path. When an interrupt handler returns, it checks the preempt_count and need_resched variables, just as preempt_enable does (although the interrupt return code in entry.S is in assembly). The ideal scenario is to cause a preemption here because it is an interrupt that typically sets need_resched due to a hardware event. It is not always possible, however, to preempt immediately off the interrupt, as a lock may be held. That is why we also check for preemption off preempt_enable.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- RSS Feeds
- What's the tweeting protocol?
- New Products
- Trying to Tame the Tablet
- Dart: a New Web Programming Experience
- Reply to comment | Linux Journal
14 hours 23 min ago - Reply to comment | Linux Journal
16 hours 55 min ago - Reply to comment | Linux Journal
18 hours 13 min ago - great post
18 hours 47 min ago - Google Docs
19 hours 10 min ago - Reply to comment | Linux Journal
23 hours 58 min ago - Reply to comment | Linux Journal
1 day 45 min ago - Web Hosting IQ
1 day 2 hours ago - Thanks for taking the time to
1 day 3 hours ago - Linux is good
1 day 5 hours ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.




Comments
AH! I get it now.
Hmm, this explains why some distros are slow or fast. Lunar Linux allows a Preemptiblle kernel. And it's just lightning fast without any bloat. Hmm, put this option with other distros...and perhaps...*ZOOM ZOOM ZOOM*
Just a rambling thought...anyways, this article comes in handy while deciding on kernel configuration in the unix console.. :p Keep it around!!
Thanks
Thanks Robert,
nicely explained...
spinlocks in preemptible kernel
Hi,
The information here is very helpful. thanks.
I have one question.
What is the effect of having kmalloc calls in the code that is protected with spinlock in a preemptible kernel.
rgds
sangeeta
kernel preemption
Excellent article, explains well. So that's where all kernel preemption stuff in 2.6 come from!
One question I always had was when the process is in kernel execution path (system call execution), doesn't scheduler_tick decrement its timeslice, and if it does when the timeslice is over, doesn't the scheduler preempts the current process, even though it might be in the middle of the kernel exec path?
Re: Lowering Latency in Linux: Introducing a Preemptible Kernel
It's helpful for me to understand the preemptive kernel.
Thanks.
MF
Seminar on your topic
Hi,
I am studying in final year of computer engg at MIT .
I read your extract and you will be glad to know that I have taken the same topic for a technical seminar at our college.I thereby request you to please send me some more technical details and direct me to some inportant links
waiting for your positive reply.
Regards,
Pavan