Lowering Latency in Linux: Introducing a Preemptible Kernel

Whether you seek higher scores in Quake, are an audio enthusiast or want a smoother desktop, lowering latency in the kernel is an important goal.

Performance measurements come in two flavors, throughput and latency. The former is like the width of an expressway: the wider the expressway, the more cars that can travel on it. The latter is like the speed limit: the higher it is, the sooner cars get from point A to point B. Obviously, both quantities are important to any task. Many jobs, however, require more of one quality than of the other. Sticking to our roadway analogy, long-haul trucking may be more sensitive to throughput, while a courier service may be more demanding on latency. Lowering latency and increasing system response, through good old-fashioned kernel work, is the topic of this article.

Audio/video processing and playback are two common beneficiaries of lowered latency. Increasingly important to Linux, however, is its benefit to interactive performance. With high latency, user actions, such as mouse clicks, go unnoticed for too long—not the snappy responsive desktop users expect. The system cannot get to the important process fast enough.

The problem, at least as far as the kernel is concerned, is the nonpreemptibility of the kernel itself. Normally, if something sufficiently important happens, such as an interactive event, the receiving application will get a priority boost and subsequently find itself running. This is how a preemptively multitasked OS works. Applications run until they use up some default amount of time (called a timeslice) or until an important event occurs. The alternative is cooperative multitasking, where applications must explicitly say, “I'm done!”, before a new process can run. The problem, when running in the kernel, is that scheduling is effectively cooperative.

Applications operate in one of two modes: either in user space, executing their own code, or within the kernel, executing a system call or otherwise having the kernel work on their behalf. When operating in the kernel, a process continues running until it decides to stop, ignoring timeslices and important events. If a more important process becomes runnable, it cannot run until the current process, if it is in the kernel, gets out. This wait can last hundreds of milliseconds.
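
To make the effect concrete, here is a minimal user-space sketch (not from the article) that exposes scheduling latency: it asks to sleep for 10ms and reports how much later than that it actually wakes up. On a loaded system, long nonpreemptible kernel paths elsewhere show up as extra delay here.

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    struct timeval before, after;
    long late_us;

    for (;;) {
        gettimeofday(&before, NULL);
        usleep(10000);                  /* ask to sleep for 10 ms */
        gettimeofday(&after, NULL);

        /* how much longer than 10 ms did we actually sleep? */
        late_us = (after.tv_sec - before.tv_sec) * 1000000
                + (after.tv_usec - before.tv_usec) - 10000;
        if (late_us > 5000)
            printf("woke up %ld us late\n", late_us);
    }
    return 0;
}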

Latency Solutions

The first and simplest solution to latency problems is to rewrite kernel algorithms so that they take a minimal, bounded amount of time. The problem is that this is already the goal; system calls are written to return quickly to user space, yet we still have latency problems. Some algorithms simply do not scale nicely.

The second solution is to insert explicit scheduling points throughout the kernel. This approach, taken by the low-latency patches, finds problem areas in the kernel and inserts code to the effect of “Anyone need to run? If so, run!” The problem with this solution is that we cannot possibly hope to find and fix all problem areas. Nonetheless, testing shows that these patches do a good job. What we need, however, is not a quick fix but a solution to the problem itself.
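
The following is a rough sketch of what such an explicit scheduling point looks like inside a long-running kernel loop, using 2.4-era names (current->need_resched and schedule()); the actual low-latency patches wrap this in helper macros and choose the insertion sites carefully.

/* inside a long loop in the kernel, e.g., while copying
 * or scanning a large amount of data */
if (current->need_resched) {
    /* someone more important is waiting; let it run,
     * then resume our work where we left off */
    schedule();
}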

The Preemptible Kernel

A proper solution is removing the problem altogether by making the kernel preemptible. Thus, if something more important needs to run, it will run, regardless of what the current process is doing. The obstacle here, and the reason Linux did not do this from the start, is that the kernel would need to be re-entrant. Thankfully, the issues of preemption are solved by existing SMP (symmetric multiprocessing) support. By taking advantage of the SMP code, in conjunction with some other simple modifications, the kernel can be made preemptible.

The programmers at MontaVista provided the initial implementation of kernel preemption. First, the definition of a spin lock was modified to include marking a “nonpreemptible” region. Therefore, we do not preempt while holding a spin lock, just as we do not enter a locked region concurrently under SMP. Of course, on uniprocessor systems we do not actually make the spin locks anything other than the preemption markers. Second, code was modified to ensure that we do not preempt inside a bottom half or inside the scheduler itself. Finally, the return from interrupt code path was modified to reschedule the current process if needed.

On UP, spin_lock is defined as preempt_disable, and spin_unlock is defined as preempt_enable. On SMP, they also perform the normal locking. So what do these new routines do?
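
A simplified sketch of those definitions is below. The internal locking routine names (_raw_spin_lock and _raw_spin_unlock) are illustrative; the actual macros in the patch also cover the irq-disabling and bottom-half variants.

#ifdef CONFIG_SMP
#define spin_lock(lock) \
    do { preempt_disable(); _raw_spin_lock(lock); } while (0)
#define spin_unlock(lock) \
    do { _raw_spin_unlock(lock); preempt_enable(); } while (0)
#else
/* UP: no real lock is needed, only the preemption markers */
#define spin_lock(lock)    preempt_disable()
#define spin_unlock(lock)  preempt_enable()
#endif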

The nestable preemption markers preempt_disable and preempt_enable operate on preempt_count, a new integer stored in each task_struct. preempt_disable effectively is:

++current->preempt_count;
barrier();

and preempt_enable is:

--current->preempt_count;
barrier();
if (unlikely(!current->preempt_count
    && current->need_resched))
    preempt_schedule();

The result is that we do not preempt while the count is greater than zero. Because spin locks are already in place to protect critical regions on SMP machines, the preemptible kernel inherits that protection for free.
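
The markers can also be used on their own. As a hypothetical example, code that touches per-CPU data must not be preempted and migrated to another processor halfway through, even when no spin lock is otherwise required (the per_cpu_stats array here is made up for illustration):

preempt_disable();
/* safe: we cannot be moved to another CPU in here */
per_cpu_stats[smp_processor_id()].packets++;
preempt_enable();   /* may reschedule right here if needed */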

preempt_schedule implements the entry to schedule itself. It sets a flag in the current process to signify it was preempted, calls schedule and, upon return, unsets the flag:

asmlinkage void preempt_schedule(void)
{
    do {
        current->preempt_count += PREEMPT_ACTIVE;
        schedule();
        current->preempt_count -= PREEMPT_ACTIVE;
    } while (current->need_resched);
}

The other entry to preempt_schedule is via the interrupt return path. When an interrupt handler returns, it checks the preempt_count and need_resched variables, just as preempt_enable does (although the interrupt return code in entry.S is in assembly). Ideally the preemption happens right here, because it is typically an interrupt, responding to a hardware event, that sets need_resched in the first place. It is not always possible to preempt immediately off the interrupt, however, as a lock may be held, which is why we also check for preemption in preempt_enable.
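
Rendered in C for clarity, the check on the interrupt return path amounts to something like the following; the real code is hand-written assembly in entry.S and also distinguishes returns to user space from returns to kernel mode:

/* on return from an interrupt that occurred in kernel mode */
if (!current->preempt_count && current->need_resched)
    preempt_schedule();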

______________________

Comments


AH! I get it now.

Wo_Dao

Hmm, this explains why some distros are slow or fast. Lunar Linux allows a preemptible kernel, and it's just lightning fast without any bloat. Hmm, put this option in other distros...and perhaps...*ZOOM ZOOM ZOOM*

Just a rambling thought...anyway, this article comes in handy when deciding on kernel configuration at the console. :p Keep it around!!

Thanks

Manjunath MB

Thanks Robert,
nicely explained...

spinlocks in preemptible kernel

San

Hi,

The information here is very helpful, thanks.

I have one question.

What is the effect of having kmalloc calls in code that is protected by a spin lock in a preemptible kernel?
Regards,
Sangeeta

kernel preemption

Anonymous

Excellent article, well explained. So that's where all the kernel preemption stuff in 2.6 comes from!

One question I have always had: when a process is in a kernel execution path (executing a system call), doesn't scheduler_tick still decrement its timeslice? And if it does, when the timeslice runs out, doesn't the scheduler preempt the current process even though it might be in the middle of that kernel path?

Re: Lowering Latency in Linux: Introducing a Preemptible Kernel

Anonymous

This helped me understand the preemptible kernel.
Thanks.

MF

Seminar on your topic

Pavan Boob

Hi,
I am studying in the final year of computer engineering at MIT.
I read your article, and you will be glad to know that I have taken the same topic for a technical seminar at our college. I request you to please send me some more technical details and direct me to some important links.
Waiting for your positive reply.
Regards,
Pavan
