Natural Selection in a Linux Universe
Astronomers worry about how stars work. Our current models describe stars as huge, hot gasballs, bloated and made luminous by a fusion furnace deep inside that burns hydrogen into helium and releases energy in the process. A kind of internal thermostat keeps them stable, so our planet enjoys a comfortable environment in its orbit around our star, the sun. In about 6,000 million years or so, all available fuel will be burned up, and as the fuel gets low, the sun will bloat, then shrink until it is 100 times smaller than it is now, becoming a white dwarf star. Written inside, in the ashes of the furnace, will be its nuclear history.
We have pieced together this story by looking at many different stars, which last much longer than we do, but we cannot see inside any of them. Stars are very luminous yet thoroughly opaque. Geologists have built up a detailed picture of the earth's interior, even though it is opaque too; they do this by watching as compression waves from earthquakes rattle around inside and make their way back to the surface: seismology. By a very fortunate circumstance, we have found that some white dwarf stars vibrate internally with something akin to earthquakes, all the time. Their rapid changes in brightness tell us what is going on inside: asteroseismology.
To take advantage of this cosmic bonanza, we build computer models of the stars, with adjustable parameters that reflect, one-to-one, the physics going on inside. We must “vibrate” our model and tweak its parameters until the model behaves like a real star. We then believe that the parameters in our model tell us about the physics inside the white dwarf star. We can then start to read the history written there.
The basic idea is nifty, but the practice is a bit complicated. The models have many parameters, not all independent of one another, and we are not completely sure we have all the physics right. To make sure the set of model parameters we use is the best fit to the observed behavior and the only reasonable one, we have to explore a very large, multi-dimensional parameter space—far too large and complex to examine in exhaustive detail. No existing computer could handle it. There is a way though: we populate our huge parameter space at random with models whose parameters cover the whole shebang. Then we breed them together, preferentially allowing those which fit the observations fairly well to survive into later generations. This survival of the fittest is done with a “genetic algorithm” that mimics, in a crude but effective way, the process of natural selection proposed by Charles Darwin.
Even using this trickery, a lot of computing is required, so we built a massive parallel system to cut the runtime to hours instead of weeks. Most of the model calculations are done in floating-point arithmetic, so we measure performance in flops, the number of floating-point operations per second. Our assembled system, called a metacomputer, is capable of more than two gigaflops—2,000 million floating-point operations per second—not bad for an assembly of Linux boxes.
Our strategy in designing this system was minimalist; keep each computer node as cheap and simple as possible, consistent with doing our job and getting the maximum amount of computing for the buck. Our budget is fairly limited. CPU cost is not a linear function of speed, so you pay a great deal more per megaflop for the fastest CPU on the market. Older CPUs are cheaper, but require more boxes and supporting electronics to house them for the same final performance. We watched the price drops with avid interest and jumped just after the 300MHz Intel P-II dropped below $300. We could afford a good master control computer and 32 computing nodes with our $22,000 budget.
Some time after we settled on the design, we became aware of the existence of Beowulf machines through an article in Linux Journal (see Resources)—also parallel systems running Linux, but with faster Ethernet connections and more storage than our problem requires. They are much more general purpose than the system we built, so they can handle many problems ours cannot. They cost more too.
Our master computer is a Pentium-II 333 MHz system with 128MB SDRAM and two 8.4GB hard disks. It has two NE-2000 compatible network cards, each connected to 16 nodes using a simple 10base-2 coaxial network. We assembled the nodes from components obtained at a local discount computer outlet. Each has a Pentium-II 300 MHz processor housed in an ATX tower case with 32MB SDRAM and an NE-2000-compatible network card. We used inexpensive 32KB EPROMs, programmed with a BP Microsystems EP-1 using a ROM image from Gero Kuhlmann's Netboot package, allowing each node to boot from the network.
Configuring the software was not much more complicated than setting up a diskless Linux box (see Robert Nemkin's Diskless Linux Mini-HOWTO). The main difference was that we minimized network traffic by giving each node an identical, independent file system rather than mounting a shared network file system. Since the nodes had no hard disks, we needed to create a self-contained file system that could be mounted in a modest fraction of the 32MB RAM.
To create this root file system, we used Tom Fawcett's YARD package (http://www.croftj.net/~fawcett/yard/). Although Yard was designed to make rescue disks, it was also well-suited for our needs. We included in the file system a trimmed-down, execute-only distribution of the PVM (parallel virtual machine) software developed at Oak Ridge National Laboratory (http://www.epm.ornl.gov/pvm/). PVM allows code to be run on the system in parallel by starting a daemon on each node and using a library of message-passing routines to coordinate the tasks from the master computer.
We configured the master computer to be a BOOTP/TFTP server, allowing each node to download the boot image—essentially a concatenation of a kernel image and a compressed root file system. We used the Netboot package (http://www.han.de/~gero/netboot/) to create this boot image using the root file system created by YARD and a small kernel image custom-compiled for the nodes.