Understanding a Context Switching Benchmark
One of the most important tasks of an operating system kernel is to manage processes and threads. A process is a program in execution, and a thread is just a CPU state stored within a process. A CPU context is either a process or a thread. Most processes have only one thread, but some processes, particularly servers, have more. Sometimes programs other than servers use multiple threads; Netscape Navigator is an example of such a program. These processes and threads represent the programs the user has selected to run. If managing processes and threads is slow, overall computer performance will also be slow.
Linux advocates naturally assume that Linux offers better performance than MS Windows. Gregory Travis (email@example.com) set out to test this assumption by testing the times needed to manage processes and threads. His benchmark was a simple program that timed simple operations like fork or sched_yield by generating a large number of processes or threads that did nothing more, individually, than looping in place. His results generated quite a controversy in the Linux newsgroups and the kernel e-mail list. All tests were done on a 200 MHz Pentium. (See Table 1.)
Linux can create a process twice as fast and a thread three times as fast as NT. Process creation takes longer than thread creation (12x for NT, 3x for Linux), because processes have much more overhead. Memory maps and file descriptor tables are just two of the things that must be created for a process (but not for a thread). Note, however, that Linux creates an entire process (the clone function) in about the same amount of time NT takes just to create a thread (1.0 ms vs. 0.9 ms).
Using this benchmark and an unmodified Linux 2.0.30 kernel, NT is much faster than Linux at context switching, either between processes (1.9x) or threads (3.2x). Since context switching occurs much more frequently than context creation, it would seem that Linux has a problem. But does it?
Why does this simple benchmark make the Linux context switching code so much slower than Windows NT? To answer that, one must understand the Linux scheduler.
The scheduler has a list called the run queue containing all contexts ready to run. These contexts are not sorted in any way. Each context also has a goodness, which is a measure of the current priority of the context. Two loops are within the scheduler. The first loop (the searching loop) finds the context with the highest goodness from the ready queue and selects that context to run. The second loop (the recalc loop) recalculates the goodnesses if all contexts have used their entire time slice. This second loop runs only occasionally. The code in Listing 1 shows the structure of the schedule function.
The search loop needs a small bit of CPU time for every runnable process and needs it on every context switch. The above benchmark shows the context switch times with 40 contexts present and ready to run. That corresponds to a load average of 40 for a single CPU system or 20 for a dual CPU system. These are very large load averages, much larger than what is normally considered healthy. Generally, there will be only a few (perhaps one to three) contexts to choose from. Most processes and threads, even very active ones, will be waiting for some I/O event to occur and are, therefore, not on the ready queue and not considered for selection.
Even heavily loaded machines will generally have most contexts waiting for I/O to complete; those contexts will not be on the run queue. Consider the site http://www.winsite.com/, a very heavily loaded Internet site, with about 200 copies of httpd running as web servers. The total number of processes for all purposes is about 420. The machine is a 333MHz Pentium II connected to three T1 lines.
Measurements of this machine indicate that 24,221,164 context switches occurred over a 17-hour period (400 switches per second). For these switches, during 4% of the time there were over 10 contexts ready to choose from at one moment, and during only 0.1% of the time were there twenty or more. The longest run queue during the 17-hour stretch was only 36 contexts, out of the 420 possible. The mean run-queue length averaged only 2.5 contexts. In a sense, this benchmark represents a worst case for Linux, and the average case, even for heavily loaded machines, is much better.
The second (recalc) loop is even more important in understanding these benchmark results. This recalc loop runs only when every context on the ready queue has used up its entire time slice (generally 20 ticks or 0.2 seconds). The recalc loop then recalculates the priority of each context. Normally, this recalc loop runs only occasionally. Most rescheduling is done in response to an interrupt or because of an I/O request; therefore, most contexts do not use their whole time slice. However, when a process or thread wants to yield the CPU, it calls sched_yield, which treats it as if the process had used its entire time slice. In this way, any process that calls sched_yield has its priority lowered and will not be scheduled again for a substantial period of time. The code for sched_yield is shown in Listing 2.
When all contexts have used their entire time slice, the scheduler recalculates the priority of all contexts using the recalc loop. Since during this benchmark all contexts do one cheap operation (sched_yield) and then have counter set to zero, this recalc loop runs much more often than normal. Again, this benchmark seems to be a worst case for Linux.
For the web site described above, measurements show only one of 200 context switches required a recalculation of its priority—the other 199 context switches did not.
Half the problem can be solved easily; the other half would be more difficult.
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
|Non-Linux FOSS: Seashore||May 10, 2013|
|Trying to Tame the Tablet||May 08, 2013|
- Using Salt Stack and Vagrant for Drupal Development
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- New Products
- RSS Feeds
- New Products
- This is the easiest tutorial
6 hours 1 min ago
- Ahh, the Koolaid.
11 hours 40 min ago
- git-annex assistant
17 hours 39 min ago
- direct cable connection
18 hours 2 min ago
- Agreed on AirDroid. With my
18 hours 12 min ago
- I just learned this
18 hours 16 min ago
18 hours 46 min ago
- not living upto the mobile revolution
21 hours 38 min ago
- Deceptive Advertising and
22 hours 13 min ago
- Let\'s declare that you have
22 hours 14 min ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.