In preparation for the Ultimate Linux Box print article, I've been doing some work with benchmarks. As it turns out, Linux Journal Chief Geek Dan Wilder has been working with benchmarks as well. I thought that instead of re-inventing the wheel, we should combine efforts and give folks insight into what we're contributing to the process.
Benchmarking something like the Ultimate Linux Box covers a whole lot of ground. Disk I/O, CPU speed, graphics capability and even how loud the computer is figure into benchmarking. What Dan was working on was mostly server-based. I'm adding a graphics test, a threaded I/O test and, although it isn't software-based, an ambient sound test.
The first test is actually something that's been around informally since Linus was a kid in Helsinki: How fast can we compile the kernel? This is a fairly CPU-intensive test but requires enough I/O to give you a good idea of how fast a box runs under a development-type load, which is something you'd be doing with a workstation. It's also a time-honored smoke test amongst us in the old guard; if a box can stand compiling the kernel in a loop overnight, it's usually good to go. There are more strenuous system exercisers, but this is the easiest to set up; if it's Linux, it by definition comes with kernel source. Of course, source or even a compiler isn't always installed on a production system, but you're not going to run this in production, now, are you? We're using Linux 2.4.20 and a canned .config file for right now; I intend to add multiprocessor support to the compile before running the final benchmark.
Next up is bonnie++ 1.03a. Bonnie++ is a good basic benchmark aimed at small-to-medium-sized I/O, using semaphores for interprocess communication. It measures sequential input, sequential output and random seeks. It makes an effort to defeat the OS's attempts to cache data in order to get at the raw filesystem I/O speed.
At this point, I'm adding in tiobench. This is another filesystem benchmark with three major differences from bonnie++: it's threaded instead of using semaphores, as many modern programs are; it runs with a larger file size, approaching 2GB as opposed to 200MB; and it does not attempt to defeat OS caching outright, although I generally keep it from coming up with ridiculous numbers by limiting available RAM to 1GB. tiobench is a good large-system benchmark as opposed to bonnie++, which would give you a better idea of what a mail or news server that handles small files all the time would do. Not everyone runs the same kinds of things on their Linux boxes, so I felt we should present both benchmarks and, in so doing, explore a full range of performance.
The fourth benchmark is PostgreSQL's regression tests. Instead of relying on having PostgreSQL installed on the machine, we simply include the source in the tarball, as we do for the kernel, bonnie++ and tiobench tests. We then compile PostgreSQL from scratch, include the regression suite and simply run the database tests right there from the source tree. Although the regression tests run on relatively small database files, it should give you some indication about how the system performs on large file. As Dan said, the best benchmark for a given task is the task itself or as close a simulation as you can come up with. What we're trying to do here is get you in the ballpark, not put the pitch over the outside corner. We're using PostgreSQL 7.3.2, which is not quite bleeding edge but currently is only one minor version down.
Now for test five, graphics. Currently I'm doing something similar to the classic Quake III: Arena tests the Windows testers love, only with a GPL game, Chromium. This little shooter does a nice job of putting the frame rate up in the corner of your screen, and it also logs it to stderr. This makes it easy enough to use Perl (or awk/grep/sed, if you're old school like me) to grab the fps ratings and find the average. But it's a somewhat ungainly monster to put in a source tarball. I'm thinking we may replace it with something IBM wrote called viewperf, which is designed to be a benchmark out of the box. I want to do some more research, however, on the topics of licensing and required libraries. Viewperf is, as are most Open Source projects, a work in progress.
Test six involves no software at all, unless you count what's on the microprocessor in the sound meter. This is a fairly simple ambient noise check. We're currently using a CheckMate CM100 sound pressure level meter from Galaxy Audio. We set it to dBa fast mode, and simply measured the sound coming from the machine in a relatively quiet (<40db) room. We measure at 10" from the center of the front of the case and likewise from the back, where it's usually the noisiest. We then measure 24" from the center of the top front edge of the case, which is about where the operator's ear would be, assuming the case is in its usual on-the-floor position (for a tower case). We take an average level, generating the averages with the industry-standard Mark IV Eyeball. If the machine has temperature sensitive fans, we measure both a warmed-up idle machine (on for 15 minutes or so, so the temperature has stabilized) and a machine under full load (like, say, the kernel compile loop), again allowing time for the temperature and noise to stabilize. We added this test after getting initial feedback on the ULB Case Study article concerning the tendency of high-end machines to sound like the proverbial jet engine.
The idea here is for tests 1-5 five to be distributed in one large tarball, with a top-level Makefile that compiles, configures and runs the entire suite--during which time you could run test six for the machine under load. Ideally the entire suite would be OSI-compliant. The kernel, obviously, as well as bonnie++ and tiobench are GPL; Chromium is under the Artistic License and PostgreSQL is released BSD-style. At press time I don't have the terms for viewperf, which may affect our decision to use it. The glue code that runs the suite is, of course, GPL. Feedback on the tests we use is, of course, welcome. Once the entire suite of benchmark tests are glued together, we'll let you know where to grab it and exactly what other software you'll need to run it. We will include as much of the sources for what you need as we can, but some things, such as the PAM and readline development libraries, really should be the system versions. On the other hand, including specific versions of source for the major programs the benchmark will run ensures the integrity of the numbers we produce.
I'm also going to be looking for a name for this monster. Dan has a working name for it, but I think we'll want something that looks respectable when we put the numbers in front of some C-level executive that doesn't grok the sense of humor inherent in Linux and its ancestors. Feedback, again, is welcome, but keep in mind that our target audience will be non-geek as well as geek.
Watch this space for more news as we continue to develop this project.
Glenn Stone is a Red Hat Certified Engineer, sysadmin, technical writer, cover model and general Linux flunkie. He has been hand-building computers for fun and profit since 1999, and he is a happy denizen of the Pacific Northwest.