Moving to SMP

by Michael S. Keller

For some, SMP (symmetric multi-processing) is part of everyday work; for others, it represents a hope for performance gains. Windows 9x users need not bother, since that platform supports only one CPU per host. Windows NT does not scale as well as other operating systems on multiple processors. The 2.0 Linux kernel series provided some SMP support, but the 2.2 series supports it much better. Linux scales well up to 16 CPUs. Beyond that, Linus Torvalds has not yet committed to better scaling, because the tradeoffs required would compromise performance on small systems. For large-scale SMP, Sun's Solaris does well at 64 CPUs, and SGI and IBM also have large-scale SMP offerings. This article provides an introduction to running SMP Linux on the x86.

Why SMP?

For some time, I had wanted to experiment with SMP Linux. When the 2.2 kernel appeared, another hardware transaction in my home network had just left me with a 486 as my desktop machine. (My wife got the K6-200 to go with her new keyboard, whose integrated pointer required the only PS/2 mouse port in the house.) I decided to replace the 486 with something a bit different.

While looking longingly at multi-CPU Alpha systems and at Intel P6 and Pentium II motherboards, I decided instead to build a lower-cost SMP machine, though still from new parts. Alpha and UltraSPARC systems were not in the low price range I preferred, and since I had a Baby AT case, Pentium II motherboards would not work with the hardware I could recycle. That left Baby AT-size P5 and P6 motherboards. I found no new P6 motherboards at appealing prices, but I did find a few P5 boards that would work.

After conducting more research, including reviewing past discussions of SMP on Slashdot, I chose the Tyan Tomcat IV 1564D, which sports the Intel 430HX chip set. This board can use one or two (preferably matched) Pentium processors, from the P5-75 to the P5-233 MMX. It can hold up to 512MB of RAM in eight 72-pin SIMM slots, using parity or non-parity memory. It has typical on-board I/O: two IDE, two serial, one parallel and one PS/2 mouse port, plus USB. It can also use non-Intel CPUs, but only one at a time: until the AMD Athlon reached the market, Intel held de facto control of SMP in the x86 world, and the non-Intel CPUs that support SMP at all do not use Intel's signalling.

I purchased two P5-233 MMX CPUs and the motherboard from Motherboard Express. I added 128MB of fast-page parity memory from Crucial Technology.

SMP Requirements

For proper operation, an SMP system should run a thread-safe C library, such as glibc2. I run Debian GNU/Linux version 2.1, which has all libraries and utilities up to date for SMP. Debian's package set also includes libc5 libraries for software compiled to require libc5. After three years of near-continuous use, I find Debian the most pleasing Linux distribution. I have had no trouble performing upgrades and keeping current with updates: the package manager requires no manual downloads and retrieves only the packages needed to stay current. (My previous and parallel experience with Red Hat, through version 5.2, found no such facility built into or near RPM. If it exists, I missed it.)

Some drivers also require updates to work correctly under an SMP kernel, since they need additional locking to keep more than one CPU from manipulating the same data or hardware at the same time. 4Front Technologies' OSS sound driver comes in uniprocessor and SMP varieties. PCMCIA Card Services may require recompilation. Most other drivers reside in the kernel source tree, so they should work with SMP once you compile a new kernel.
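The kernel's own SMP locks are not reproduced here. As a user-space analogy only, the sketch below shows why shared state needs a lock once more than one thread of execution can touch it. Several Python threads increment a shared counter, and `threading.Lock` serializes each read-modify-write. The helper name `locked_count` and its parameters are hypothetical, chosen for illustration.

```python
import threading


def locked_count(threads: int = 4, iterations: int = 10_000) -> int:
    """Increment a shared counter from several threads, guarded by a lock.

    Hypothetical helper: a user-space analogy for driver locking, not kernel code.
    """
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(iterations):
            # Without the lock, two threads could read the same old value,
            # and one of the two increments would be lost.
            with lock:
                counter += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter


if __name__ == "__main__":
    print(locked_count())
```

With the lock in place, the result is always `threads * iterations`. A kernel driver would use the kernel's own primitives, such as spinlocks, rather than anything shown here.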

Making It Work

After I replaced the 486 motherboard with the new Tyan unit, Linux booted straightaway. I already had the 2.2 kernel running, so I reconfigured it for SMP. (See smp.txt in the Documentation subdirectory of the Linux kernel source for more on how to perform this task.)

The first SMP kernel I compiled did not work correctly. From the documentation included with 4Front Technologies' OSS sound drivers and with the kernel itself, I realized the dependencies had not been built correctly. I saved the .config file elsewhere, ran make mrproper to clean the kernel source tree, then restored the .config file. After running make oldconfig, I rebuilt and installed the SMP kernel. On the next boot, additional startup messages showed that both CPUs had started running. The 2.2.7 kernel, together with the utilities shipped with Debian 2.1, reports each process's CPU usage as a percentage of the total CPU capacity available, so a process consuming all of one CPU shows 50% usage.
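The clean rebuild described above can be sketched as a small Python driver for `make`. This is an illustration under stated assumptions, not the author's script. The helper name `smp_rebuild`, the backup path for the saved `.config`, and the inclusion of `make dep` (the dependency step 2.2-era kernels required after configuration) are all assumptions of this sketch. By default it only returns the command list as a dry run. Pass `dry_run=False` inside a kernel source tree to actually run it.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def smp_rebuild(src: Path, jobs: int = 2, dry_run: bool = True) -> List[List[str]]:
    """Clean a kernel tree while preserving .config, then rebuild.

    Hypothetical helper; the backup file name below is an assumption.
    """
    config = src / ".config"
    backup = src.parent / "config.smp.saved"  # assumed backup location
    steps = [
        ["make", "mrproper"],                  # wipes the tree, including .config
        ["make", "oldconfig"],                 # re-reads the restored .config
        ["make", "dep"],                       # dependency step used by 2.2-era kernels
        ["make", "-j", str(jobs), "bzImage"],  # parallel build, as in the article
    ]
    if dry_run:
        return steps
    shutil.copy2(config, backup)               # save .config elsewhere
    subprocess.run(steps[0], cwd=src, check=True)
    shutil.copy2(backup, config)               # restore it after mrproper
    for cmd in steps[1:]:
        subprocess.run(cmd, cwd=src, check=True)
    return steps
```

Installing the resulting kernel, and any modules, is deliberately left out, because that step depends on the boot loader setup.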

Using It

To demonstrate the performance difference a second CPU makes, I benchmarked Linux kernel compilation, the distributed.net rc5des encryption breaker and POV-Ray's ray tracer (see Table 1). All three take direct advantage of multiple CPUs, and POV-Ray can also use CPUs spread across a network. All figures are averages of three runs.
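Every timing in this section is the mean of three runs. A minimal sketch of that kind of measurement, timing wall-clock time with `time.perf_counter`, might look like this. The helper name `average_runtime` is hypothetical, and the article does not say how the author timed his runs.

```python
import statistics
import time
from typing import Callable


def average_runtime(task: Callable[[], object], runs: int = 3) -> float:
    """Run `task` `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    print(f"{average_runtime(lambda: sum(range(1_000_000))):.4f} s")
```

For a real kernel-build benchmark, the task would wrap something like `subprocess.run(["make", "-j", "2", "bzImage"], check=True)`. The tree would presumably also need cleaning between runs, so that each run does the same amount of work.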

Table 1

Recompilation of the uniprocessor 2.2.7 kernel took 376.91 seconds when running under the same kernel. Recompilation of the SMP 2.2.7 kernel, running under the same SMP kernel, took 395.04 seconds when run on only one CPU, 5 percent longer than the uniprocessor compilation time. When run on two CPUs (make -j 2 bzImage), the compilation took 302.77 seconds, 80 percent of the uniprocessor compilation time.
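The percentages quoted above follow directly from the raw times. This short check uses only the three figures from the text. The helper name `percent_of` is illustrative.

```python
# Kernel compile times from the paragraph above, in seconds.
UNI_KERNEL = 376.91   # uniprocessor kernel
SMP_ONE_CPU = 395.04  # SMP kernel, build on one CPU
SMP_TWO_CPU = 302.77  # SMP kernel, make -j 2


def percent_of(time_s: float, baseline_s: float) -> float:
    """Express a run time as a percentage of a baseline run time."""
    return 100.0 * time_s / baseline_s


if __name__ == "__main__":
    print(f"SMP, one CPU vs. uniprocessor:  {percent_of(SMP_ONE_CPU, UNI_KERNEL):.1f}%")
    print(f"SMP, two CPUs vs. uniprocessor: {percent_of(SMP_TWO_CPU, UNI_KERNEL):.1f}%")
    print(f"SMP, two CPUs vs. SMP, one CPU: {percent_of(SMP_TWO_CPU, SMP_ONE_CPU):.1f}%")
```

The first two lines reproduce the article's "5 percent longer" (104.8%) and "80 percent" (80.3%). The third ratio, about 77 percent, compares two runs under the same SMP kernel, so it isolates the gain from the second CPU itself.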

For POV-Ray, I used the benchmark source file skyvase.pov, available from POV-Ray's web site, and rendered it at xpvmpov's default resolution of 320x240. With SMP, the render took 72 percent of the uniprocessor time.
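The article does not use Amdahl's law, but it offers one hedged way to read speedups like these. Suppose a fraction p of a job parallelizes perfectly across n CPUs. The run time then shrinks to (1 - p) + p/n of the original, and solving for p estimates how much of the work actually ran in parallel. The model ignores SMP kernel overhead, memory-bus contention and I/O, so treat the result as a rough indicator only. The function name `parallel_fraction` is hypothetical.

```python
def parallel_fraction(time_ratio: float, cpus: int) -> float:
    """Invert Amdahl's law: time_ratio = (1 - p) + p / cpus, solved for p.

    Idealized model: assumes the parallel part scales perfectly and
    ignores SMP kernel overhead and memory-bus contention.
    """
    if cpus < 2:
        raise ValueError("need at least two CPUs to infer a parallel fraction")
    return (1.0 - time_ratio) / (1.0 - 1.0 / cpus)


if __name__ == "__main__":
    # Ratios taken from the POV-Ray and kernel-compile figures in the text.
    print(f"POV-Ray (72%):        p = {parallel_fraction(0.72, 2):.2f}")
    print(f"Kernel build (80.3%): p = {parallel_fraction(302.77 / 376.91, 2):.2f}")
```

Under this model, the POV-Ray figure implies roughly 56 percent of the render ran in parallel, and the kernel build roughly 39 percent.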

The rc5des code cracker ran its benchmark at nearly the same rate under both the uniprocessor and SMP kernels. In actual operation, it will run on as many CPUs as desired or detect the number of CPUs automatically. I believe the difference between the two kernels was so small because rc5des is heavily optimized for speed: it most likely runs within the level 1 (L1) cache as much as possible.

SMP may improve performance in other ways. GUI operations may benefit from having the X server run on one CPU while an application runs on another. Anything that runs well on one CPU but can take advantage of another will benefit from using SMP. I now run the SETI@home client on all CPUs I have that run Linux.
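Work that splits into independent pieces is the easiest kind to spread across CPUs. As a minimal sketch, and not a description of how the programs above are implemented, Python's `multiprocessing` module can farm CPU-bound chunks out to one worker process per CPU and let the kernel schedule the workers on different processors. The function names are hypothetical. The sketch assumes a Unix-like system, because it explicitly requests the `fork` start method.

```python
import multiprocessing as mp
import os
from typing import List


def sum_of_squares(n: int) -> int:
    """CPU-bound stand-in task: sum of i*i for i in range(n)."""
    return sum(i * i for i in range(n))


def run_on_all_cpus(chunks: List[int], workers: int = 0) -> List[int]:
    """Evaluate sum_of_squares for each chunk in a pool of worker processes.

    workers=0 means one worker per CPU, as reported by os.cpu_count().
    """
    count = workers or os.cpu_count() or 1
    ctx = mp.get_context("fork")  # Unix-only start method; children do not re-import __main__
    with ctx.Pool(processes=count) as pool:
        return pool.map(sum_of_squares, chunks)


if __name__ == "__main__":
    # On a two-CPU machine the pool has two workers, so two chunks run at once.
    print(run_on_all_cpus([2_000_000] * 4))
```

Separate processes are used rather than threads because CPython's global interpreter lock keeps threads from running Python code on two CPUs at once.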

Running Even Faster

Both the size and the speed of the L1 and L2 caches matter, and so does RAM speed. The Intel P5-233MMX contains a 32KB L1 cache, split into a 16KB code cache and a 16KB data cache. My wife's AMD K6-200MMX contains a 64KB L1 cache, split into a 32KB code cache and a 32KB data cache; for some tasks, it performs faster than one Intel P5-233MMX. Intel Pentium Pro CPUs have both L1 and L2 cache on board, with up to 1MB of L2. Pentium II Xeon CPUs have up to 2MB of L2 cache on board. Newer CPUs also run their caches faster. More cache on the CPU means less contention for external cache and main RAM, which means higher performance. Through the support chip set, the CPUs co-operate to maintain cache coherency, so that each always has an accurate view of RAM.

Locking a process to one CPU, particularly when that process's code and data fit in the L1 cache, may also improve performance. Linux does not yet support this as fully as more mature UNIX variants do, but it probably will soon.
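Later kernels did add this. The `sched_setaffinity(2)` system call first appeared in Linux 2.5.8, well after the 2.2 kernel used here, and Python exposes it on Linux as `os.sched_setaffinity`. Below is a minimal sketch that pins a process to the lowest-numbered CPU it is currently allowed to use. The helper name `pin_to_one_cpu` is hypothetical, and this will not work on the 2.2 kernel described in this article.

```python
import os
from typing import Set


def pin_to_one_cpu(pid: int = 0) -> Set[int]:
    """Restrict `pid` (0 means the calling process) to a single CPU.

    Requires Linux with sched_setaffinity(2); not available on 2.2 kernels.
    Returns the affinity mask actually in effect afterwards.
    """
    allowed = os.sched_getaffinity(pid)
    target = {min(allowed)}  # lowest-numbered CPU this process may run on
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    print("now restricted to CPU(s):", pin_to_one_cpu())
```

Choosing the lowest allowed CPU keeps the sketch valid even when the process already runs under a restricted mask. Picking which CPU to use is a policy decision left to the caller.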

Conclusion

Do I need SMP for what I do? No. A single 200MHz P5-class processor can adequately perform the tasks I want to perform. As with most tasks, adequate memory, both RAM and cache, contributes more to performance than the number of processors. Do I have fun with it? Oh, yes.


Michael S. Keller works as a technical analyst with Sprint Paranet, a Houston-based, wholly owned subsidiary of Sprint, a nationwide network services provider. He has used UNIX variants for nearly nine years and enjoys communing with cats, motorcycles and the universe. You may reach him at mskeller@sprintparanet.com.