Building the Ultimate Linux Box
Five years ago, I wrote an article for Linux Journal that developed a recipe for an elegant and economical Linux box. The article became one of the most popular in LJ's history, so the editors have invited me back for a second round.
This time around LJ recruited Rick Moen, author of some well-known FAQs on modems and other hardware topics, to assist. Daryll Strauss, the man behind the famous Linux renderfarm used in the movie Titanic, contributed sage advice from his background in graphics and extreme data crunching. Also, instead of going for economy we're going to go for maximum crunching power. We're going to ask how to get the absolute highest performance out of hardware we can live with.
Hardware you can live with means a machine that is stable, easy to troubleshoot and inexpensive to maintain. It should be small and low-maintenance enough to live beside your desk, not some liquid-cooled monstrosity. It should be, in short, a PC—a gold-plated hot rod of a PC but a PC nevertheless. Other important aspects of livability are the levels of emitted acoustic noise and heat; we'll be minimizing both.
We'll stick with PC hardware. Alphas are fast and have that wonderful, sexy 64-bit architecture, but the line seems all too likely to be killed in favor of the Itanium before long. Considered in isolation, I like the PowerPC chip a lot better than any x86 architecture. But PC hardware has all the advantages of being the biggest market; it's the easiest to get serviced and least expensive to upgrade.
The Ultimate Linux Box that we showcase will, of course, fall behind the leading edge within months (or even by press time). But walking through the process of developing the ULB will teach you things about system design and troubleshooting that you can continue to apply long after the hardware in this article has become obsolete.
For typical job loads under Linux, the processor type is nearly a red herring—it's far more important to specify a capable system bus and disk I/O subsystem. If you don't believe this, you may find it enlightening to keep top(1) running for a while as you use your machine. Notice how seldom the CPU idle percentage drops below 90%.
If you're building a ULB, go ahead and buy the fastest available processor. Once you've gotten past that gearhead desire for big numbers, pay careful attention to bus speeds and your disk subsystem because that's where you'll achieve serious performance wins. The gap between processor speed and I/O subsystem throughput has only widened in the last five years, so this is even better advice than it was in 1996.
How does all this translate into a recipe in 2001? Get a PCI-bus machine, not a hybrid PCI/ISA design; you sacrifice about 10% of peak performance with those. Get the fastest available front-side (processor-to-memory) bus (in August 2001, the maximum is 266MHz). Get a high-speed SCSI controller and the fastest SCSI disks you can get your hands on.
The case for SCSI is a little less obvious but still compelling. For starters, SCSI is still at least 10%-15% faster than IDE/ATAPI running flat out. Because it's perceived as a professional choice, SCSI peripherals are generally better engineered than IDE/ATAPI equivalents, and new high-performing drive technologies tend to become available in SCSI first. You'll pay a few dollars more, but the cost is well repaid in increased throughput and reliability. Rick Moen comments:
They call me a SCSI bigot. So be it—but my hardware keeps being future-proof: I don't have to run bizarre emulation layers to address CDRs, I never run low on IRQs or resort to IRQ-sharing, all my hard drives have hardware-level hot-fix, all my hard disk/CD/tape/etc., devices support a stable standard rather than this month's cheap extension kludge, and I don't have to worry about adverse interactions at the hardware or driver levels from mixing ATA and SCSI.
Neither Daryll nor I will have IDE in any machine we build either.
To pick the fastest disks, pay close attention to average seek and latency time. The former is an average time required to seek to any track; the latter is the maximum time required for any sector on a track to come under the heads.
Of these, average seek time is much more important. When you're running Linux, a one millisecond faster seek time can make a substantial difference in system throughput. The manufacturers themselves avoid running up seek time on larger-capacity drives by stacking platters vertically rather than increasing platter size. Thus, seek time, which is proportional to the platter radius and head-motion speed, tends to be constant across different capacities in the same product line. This is good because it means you don't have to worry about a capacity vs. speed trade-off.
|Speed Up Your Web Site with Varnish||Jun 19, 2013|
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Technical Support Rep
- Validate an E-Mail Address with PHP, the Right Way
- Senior Perl Developer
- UX Designer
- Speed Up Your Web Site with Varnish
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?