Moving to SMP

Wondering about multiprocessing? Think it might be fun? For one man's experience with setting up SMP, read on.
Using It

To measure the difference a second CPU makes, I benchmarked Linux kernel compilation, the rc5des encryption breaker and POV-Ray's ray tracer (see Table 1). All three can take direct advantage of multiple CPUs, and POV-Ray can also use CPUs spread across a network. All figures are averages of three runs.

Table 1

Recompiling the uniprocessor 2.2.7 kernel while running that same kernel took 376.91 seconds. Recompiling the SMP 2.2.7 kernel while running the SMP kernel took 395.04 seconds on one CPU, about 5 percent longer than the uniprocessor build. On two CPUs (make -j 2 bzImage), the same compilation took 302.77 seconds, 80 percent of the uniprocessor time.
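The percentages above are simple ratios of the averaged times. The following small Python sketch reproduces them from the three measurements. The constant and function names are my own and do not come from any benchmarking tool.

```python
# Average kernel-recompilation times in seconds (means of three runs),
# copied from the 2.2.7 measurements described in the text.
UP_ONE_CPU = 376.91    # uniprocessor kernel building itself
SMP_ONE_CPU = 395.04   # SMP kernel building itself on one CPU
SMP_TWO_CPUS = 302.77  # SMP kernel, make -j 2 bzImage


def percent_of(time, baseline):
    """Express `time` as a percentage of `baseline`."""
    return 100.0 * time / baseline


def speedup(before, after):
    """Factor by which the run time dropped from `before` to `after`."""
    return before / after


if __name__ == "__main__":
    print(f"SMP kernel, one CPU:  {percent_of(SMP_ONE_CPU, UP_ONE_CPU):.1f}% of UP time")
    print(f"SMP kernel, two CPUs: {percent_of(SMP_TWO_CPUS, UP_ONE_CPU):.1f}% of UP time")
    print(f"Second-CPU speedup on the SMP kernel: {speedup(SMP_ONE_CPU, SMP_TWO_CPUS):.2f}x")
```

Run as a script, it prints about 104.8%, 80.3% and 1.30x. The last figure compares the SMP kernel only with itself, so it separates the second CPU's contribution from the SMP kernel's own overhead.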

For POV-Ray, I used the benchmark source file, skyvase.pov, available from POV-Ray's web site, and rendered it at xpvmpov's default resolution of 320x240. The SMP run took 72 percent of the time of the uniprocessor run.

The rc5des code cracker ran its benchmark at nearly the same rate under the uniprocessor and SMP kernels. In actual operation, it will run on as many CPUs as you specify, or it will detect the number of CPUs automatically. I believe the two kernels performed so similarly because rc5des is aggressively optimized; it most likely stays within the level 1 (L1) cache as much as possible.
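The rc5des client's own option for choosing a CPU count is not described here, so the following is only a sketch of that "use the number requested, otherwise autodetect" policy in Python. The helper names usable_cpus and worker_count are hypothetical. Because os.sched_getaffinity is Linux-specific, the code falls back to os.cpu_count() elsewhere.

```python
import os


def usable_cpus():
    """Number of CPUs this process may run on (hypothetical helper).

    Prefers the Linux affinity mask, which can be smaller than the
    machine's CPU count; otherwise falls back to os.cpu_count(), then 1.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def worker_count(requested=None):
    """Honor an explicit request; otherwise run one worker per usable CPU."""
    if requested is not None:
        if requested < 1:
            raise ValueError("need at least one worker")
        return requested
    return usable_cpus()
```

A client built this way would start worker_count() compute processes by default, or use worker_count(1) to leave the second CPU free for interactive work.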

SMP can improve performance in other ways, too. GUI operations may benefit from having the X server run on one CPU while an application runs on another. In general, anything that runs well on one CPU but can make use of another can benefit from SMP. I now run the SETI@home client on every CPU I have that runs Linux.

Running Even Faster

The size and speed of both the L1 and L2 caches matter, and so does RAM speed. The Intel P5-233MMX contains a 32KB L1 cache, split into a 16KB code cache and a 16KB data cache. My wife's AMD K6-200MMX contains a 64KB L1 cache, split into a 32KB code cache and a 32KB data cache; for some tasks, it outperforms a single Intel P5-233MMX. Intel Pentium Pro CPUs carry both L1 and L2 cache on board, with up to 1MB of L2, and Pentium II Xeon CPUs carry up to 2MB of L2. Newer CPUs also run their caches faster. More cache on the CPU means less contention for external cache and main RAM, and therefore higher performance. Through the support chip set, the CPUs cooperate to maintain cache coherency, so that each always has an accurate view of RAM.
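Coherency is handled in hardware, and Pentium-class CPUs use the MESI protocol (Modified, Exclusive, Shared, Invalid) for it. The following is a toy software model of a MESI-style write-invalidate protocol for a single cache line. It assumes one shared snooping bus and ignores timing, multiple lines and write buffers. The class and method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    M = "Modified"   # only copy, dirty (differs from RAM)
    E = "Exclusive"  # only copy, clean
    S = "Shared"     # one of several clean copies
    I = "Invalid"    # no usable copy


class SnoopingBus:
    """Toy MESI-style model of ONE cache line shared by several CPUs.

    Every cache sees ("snoops") every read and write on the shared bus,
    so a write by one CPU can invalidate stale copies held by the others.
    """

    def __init__(self, n_cpus, initial=0):
        self.memory = initial              # value of the line in main RAM
        self.states = [State.I] * n_cpus   # per-CPU cache state
        self.values = [None] * n_cpus      # per-CPU cached value

    def _other_holders(self, cpu):
        # CPUs other than `cpu` that hold a valid copy of the line.
        return [c for c in range(len(self.states))
                if c != cpu and self.states[c] is not State.I]

    def _write_back(self, cpu):
        # A Modified copy must reach RAM before anyone else uses the line.
        if self.states[cpu] is State.M:
            self.memory = self.values[cpu]

    def read(self, cpu):
        if self.states[cpu] is State.I:        # read miss
            holders = self._other_holders(cpu)
            for c in holders:
                self._write_back(c)
                self.states[c] = State.S
            self.values[cpu] = self.memory
            self.states[cpu] = State.S if holders else State.E
        return self.values[cpu]                # hit in M, E or S

    def write(self, cpu, value):
        if self.states[cpu] in (State.S, State.I):
            # Gain exclusive ownership by invalidating every other copy.
            for c in self._other_holders(cpu):
                self._write_back(c)
                self.states[c] = State.I
                self.values[c] = None
        # From E the move to M is silent: no other cache holds the line.
        self.states[cpu] = State.M
        self.values[cpu] = value
```

For example, after CPU 0 writes 42, a read by CPU 1 forces CPU 0's dirty copy back to RAM and leaves both caches Shared, so CPU 1 sees 42 rather than the stale value in RAM.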

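Locking a process to one CPU may also improve performance, particularly when that process's code and data fit in the L1 cache, because the process keeps finding its working set already cached instead of reloading it on another CPU. Linux does not yet support this as fully as more mature UNIX variants do, but it probably will soon.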
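Later kernels did add this. The sched_setaffinity(2) system call appeared in Linux 2.5.8, and Python exposes it on Linux as os.sched_setaffinity. The following is a minimal sketch, assuming a Linux system and Python 3.3 or later. The helper name pin_to_cpu is hypothetical.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 means the caller) to a single CPU.

    Linux-only: wraps the sched_setaffinity(2) system call. Returns the
    previous affinity set so the caller can restore it later.
    """
    previous = os.sched_getaffinity(pid)
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(pid, {cpu})
    return previous
```

For example, old = pin_to_cpu(1) keeps the caller on the second CPU, and os.sched_setaffinity(0, old) undoes it. Pinning helps only when the process's working set really fits in that CPU's cache. Otherwise, it merely takes away the scheduler's freedom to balance load.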


Do I need SMP for what I do? No. A single 200MHz P5-class processor handles everything I need it to do. For most tasks, adequate memory, both RAM and cache, contributes more to performance than the number of processors does. Do I have fun with it? Oh, yes.


Michael S. Keller works as a technical analyst with Sprint Paranet, a wholly owned subsidiary of Sprint, a nationwide network services provider based in Houston. He has used UNIX variants for nearly nine years and enjoys communing with cats, motorcycles and the universe. You may reach him at

