I'm Not Going to Pay a Lot for This Supercomputer!
As much as we might like to own a supercomputer, high cost is still a deterrent. In a market with almost no economy of scale, buyers find themselves relying on the vendors for specialty hardware, specialty software and expensive support contracts, while hoping that those vendors don't join the long list of bankrupt former supercomputer vendors. The limited number of sales opportunities forces vendors to try to satisfy all customers, with the usual result that no one is really happy. There is simply no way to provide highly specialized software (such as a parallelizing compiler) and simultaneously keep costs out of the stratosphere.
On the other end of the market, however, sits the generic buyer. More precisely, tens of millions of generic buyers, all spending vast sums on fundamentally simple machines with fundamentally simple parts. What the vendors lose in profit margin, they make up for in volume. The result? Commodity computer components keep getting faster, cheaper and smaller. It is now possible to take these off-the-shelf parts and assemble machines that run neck-and-neck with the “big boys” of supercomputing and, in some instances, surpass them.
Intel's x86 processors, especially the Pentium and Pentium Pro, offer excellent floating-point performance at ever-increasing clock speeds. The recently released Pentium II has a peak clock speed of 300 MHz, while Digital's best Alpha processors compute merrily along at 500 MHz and higher.
The PCI bus allows the processors to communicate with peripherals at rates in excess of 100MB/sec. Because it is a processor-independent bus, undertaking processor upgrades (e.g., from the Pentium Pros to 500MHz DEC Alphas) requires replacing only the processors and motherboards. Further, parts replaced by an upgrade can be expected to have a significant resale value.
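The 100MB/sec figure follows from simple arithmetic: peak bus bandwidth is the bus width in bytes times the bus clock. The sketch below assumes the common 32-bit, 33 MHz PCI configuration (the nominal clock is really 33.33 MHz, which is why both 132 and 133MB/sec are quoted). The function name `pci_peak_mb_per_s` is our own, and sustained throughput in practice is lower than this theoretical peak.

```python
# Peak PCI bandwidth = bus width (bytes) x clock rate (transfers per second).
# Assumes one transfer per clock; real sustained rates are lower than peak.


def pci_peak_mb_per_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Return the theoretical peak transfer rate in MB/sec (10**6 bytes/sec)."""
    bytes_per_transfer = bus_width_bits / 8
    # MHz is 10**6 transfers per second, so the product is already in MB/sec.
    return bytes_per_transfer * clock_mhz


if __name__ == "__main__":
    print(pci_peak_mb_per_s(32, 33.0))  # 132.0 (common 32-bit PCI)
    print(pci_peak_mb_per_s(64, 33.0))  # 264.0 (64-bit PCI at the same clock)
```

Doubling either the bus width or the clock doubles the peak, which is how later PCI variants raised the ceiling without changing the processor-independent design.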
The development of Fast Ethernet technology makes point-to-point communication in excess of 10MB/sec possible. Switches that let multiple machines use this bandwidth in full are readily available, giving the Beowulf-class (see below) machine bandwidth and latency that rival those of the larger IBM SP-2 and the Thinking Machines CM-5. While Beowulf machines don't yet scale easily to hundreds of processors, their performance in smaller networks of 16 or 32 processors is outstanding.
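Fast Ethernet's 100 Mbit/sec line rate works out to 12.5MB/sec, which is where the "in excess of 10MB/sec" figure comes from. How much of that a program actually sees depends on message size, and a common first-order way to reason about it is the latency/bandwidth model T(n) = latency + n/bandwidth. The sketch below applies that model; the 100-microsecond latency is a hypothetical placeholder, not a measured value for any Beowulf machine, and the function names are illustrative.

```python
# First-order latency/bandwidth model for a point-to-point message:
#   T(n) = latency + n / bandwidth
# LATENCY_S is a hypothetical placeholder, not a measured figure.

FAST_ETHERNET_BITS_PER_S = 100e6                      # 100 Mbit/sec line rate
BANDWIDTH_BYTES_PER_S = FAST_ETHERNET_BITS_PER_S / 8  # 12.5e6 bytes/sec
LATENCY_S = 100e-6                                    # hypothetical per-message latency


def transfer_time(n_bytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Estimated seconds to send an n-byte message under the model."""
    return latency + n_bytes / bandwidth


def effective_bandwidth(n_bytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Bytes/sec actually achieved for an n-byte message under the model."""
    return n_bytes / transfer_time(n_bytes, latency, bandwidth)


def half_bandwidth_size(latency=LATENCY_S, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Message size at which effective bandwidth equals half the peak."""
    return latency * bandwidth


if __name__ == "__main__":
    for n in (64, 1_024, 65_536, 1_048_576):
        mb_per_s = effective_bandwidth(n) / 1e6
        print(f"{n:>9} bytes: {mb_per_s:6.2f} MB/sec")
    print("half-bandwidth message size:", half_bandwidth_size(), "bytes")
```

Under this model, effective bandwidth reaches half of the peak at a message size of latency × bandwidth; below that size, latency dominates the cost of each message.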
The Linux operating system is robust, largely POSIX-compliant and available, to varying degrees of completeness, for Intel x86, DEC Alpha and PowerPC microprocessors. Thanks to the untiring efforts of its legions of hackers, auxiliary hardware (network cards and disk controllers) is supported almost as soon as it becomes available, and the occasional bug is corrected when found, often the same day. GNU's compilers and debuggers, coupled with free message-passing implementations, make it possible to use Linux boxes for parallel programming and execution without spending money on software.
The Beowulf Project studies the advantages of using interconnected PCs built from mass-market components and running free software. The quantities of interest are not raw computational power but measures that reflect the use of these mass-market components: performance/price, performance/processor and so on. The project provides an informal “nonstandard” by loosely defining a “Beowulf-class” machine. Minimal requirements are:
16 motherboards with Intel x86 processors or equivalent
256MB of DRAM, 16MB per processor board
16 hard disk drives and controllers, one per processor board
2 Ethernets (10baseT or 10base2) and controllers, 2 per processor
2 high resolution monitors with video controllers and 1 keyboard
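The per-node minimums above can be written down as a small checklist. The sketch below is a hypothetical illustration: the `Node` class and the `shortfalls` and `total_dram_mb` functions are invented for this example and are not part of any Beowulf tool. It checks node count, memory, disks and Ethernet controllers per node, and leaves out the processor type and the monitors and keyboard, which are not meaningful to check mechanically. Note that 16 nodes at 16MB each gives the 256MB total in the list.

```python
from dataclasses import dataclass
from typing import List

# Minimums taken from the informal Beowulf-class definition in the text.
MIN_NODES = 16
MIN_DRAM_MB_PER_NODE = 16
MIN_DISKS_PER_NODE = 1
MIN_ETHERNET_PER_NODE = 2


@dataclass
class Node:
    """Hypothetical description of one processor board in the cluster."""
    dram_mb: int
    disks: int
    ethernet_ports: int
    processor: str = "x86"  # free-form label; not checked below


def total_dram_mb(nodes: List[Node]) -> int:
    """Sum of DRAM across all nodes, in MB."""
    return sum(node.dram_mb for node in nodes)


def shortfalls(nodes: List[Node]) -> List[str]:
    """List every way the cluster falls short of the minimums."""
    problems = []
    if len(nodes) < MIN_NODES:
        problems.append(f"only {len(nodes)} nodes; need {MIN_NODES}")
    for i, node in enumerate(nodes):
        if node.dram_mb < MIN_DRAM_MB_PER_NODE:
            problems.append(f"node {i}: {node.dram_mb}MB DRAM; need {MIN_DRAM_MB_PER_NODE}MB")
        if node.disks < MIN_DISKS_PER_NODE:
            problems.append(f"node {i}: no disk")
        if node.ethernet_ports < MIN_ETHERNET_PER_NODE:
            problems.append(
                f"node {i}: {node.ethernet_ports} Ethernet controller(s); "
                f"need {MIN_ETHERNET_PER_NODE}"
            )
    return problems


if __name__ == "__main__":
    cluster = [Node(dram_mb=16, disks=1, ethernet_ports=2) for _ in range(16)]
    print(total_dram_mb(cluster))  # 256
    print(shortfalls(cluster))     # []
```

Returning a list of problems rather than a single yes/no makes it easy to see how far a given configuration is from the guideline, which matches the spirit of a rough yardstick rather than a strict standard.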
The Beowulf-class idea is not so much to define a specific system as to provide a rough guideline by which component improvements and cross-platform Linux ports can be compared. Several Beowulf-class machines are in use throughout the United States, including Loki in the Los Alamos National Laboratory's Theoretical Astrophysics group and Hyglac at Caltech's Center for Advanced Computing Research.