I'm Not Going to Pay a Lot for This Supercomputer!

Los Alamos National Laboratory and Caltech obtain gigaflops performance on parallel Linux machines running free software and built from commodity parts costing less than $55,000 each (in September 1996). Now, you can probably build a similar machine for about $25,000.

Loki at LANL

Loki is a 16-node parallel machine with 2GB RAM and 50GB disk space. Most of the components were obtained from Atipa International (www.atipa.com). Each node is essentially a Pentium Pro computer optimized for number crunching and communication:

  • (1) Intel Pentium Pro 200MHz CPU with 256K integrated L2 cache

  • (1) Intel VS440FX (Venus) motherboard, 82440FX (Natoma) chip set

  • (4) 8x36 60ns parity SIMMS (128MB per node)

  • (1) Quantum Fireball 3240MB IDE Hard Drive

  • (1) Cogent EM400 TX PCI Quartet Fast Ethernet Adapter

  • (1) SMC EtherPower 10/100 Fast Ethernet PCI Network Card

  • (1) S3 Trio-64 1MB PCI Video Card

The list purchase price of Loki's parts in September 1996 was just over $51,000.
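The aggregate figures quoted for Loki follow directly from the per-node parts list. The short check below is only a sketch: the constant names are invented for illustration, and it assumes RAM is counted in binary megabytes while the drive capacity is the vendor's decimal figure.

```python
# Per-node figures taken from the parts list above.
NODES = 16
RAM_MB_PER_NODE = 128          # four 32MB SIMMs
DISK_MB_PER_NODE = 3240        # Quantum Fireball 3240MB drive

# Assumption: RAM in binary units (1024MB per GB), disk in decimal units.
total_ram_gb = NODES * RAM_MB_PER_NODE / 1024
total_disk_gb = NODES * DISK_MB_PER_NODE / 1000

print(f"RAM:  {total_ram_gb:.0f}GB")
print(f"Disk: {total_disk_gb:.2f}GB")
```

The disk total comes out slightly above the rounded 50GB quoted for the machine as a whole.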

The nodes are connected to one another through the four-port Quartet adapters in a fourth-degree hypercube. Each node is also connected via the SMC adapter to one of two eight-port 3Com SuperStack II Switch 3000 TX Fast Ethernet switches, which serve the dual purpose of bypassing multi-hop routes and providing a direct connection to the system's front end, a dream machine with the following components:

  • (2) Intel Pentium Pro 200MHz CPU with 256K integrated L2 cache

  • (1) ASUS P/I-P65UP5 dual CPU motherboard, Natoma chip set

  • (8) 16x36 60ns parity SIMMS (512MB)

  • (6) Quantum Atlas 4.3GB UltraSCSI Hard Drive

  • (1) Adaptec 2940UW PCI Fast Wide SCSI Controller

  • (1) Cogent EM400 TX PCI Quartet Fast Ethernet Adapter

  • (1) SMC EtherPower 10/100 Fast Ethernet PCI Network Card

  • (1) Matrox Millennium 4MB PCI Video Card

  • (1) 21 inch Nokia 445X Monitor

  • (1) Keyboard, Mouse, Floppy Drive

  • (1) Toshiba 8x IDE CD-ROM

  • (1) HP C1533A DAT DDS-2 4GB Tape Drive

  • (1) Quantum DLT 2000XT 15GB Tape Drive

It is also possible for the nodes to communicate exclusively through their SMC-SuperStack connections as a fast, switched array topology. At Supercomputing '96, Loki was connected to Caltech's Hyglac and the two were run as a single fast switched machine.
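To see why the switches are useful for bypassing multi-hop routes, it helps to look at how a fourth-degree hypercube is wired. The sketch below assumes the nodes are numbered 0 through 15 and that two nodes share a link when their numbers differ in exactly one bit. That is the conventional labeling for a hypercube, not something the Loki documentation is known to specify, and the function names are made up for illustration.

```python
def hypercube_neighbors(node, dimensions=4):
    """Nodes directly linked to `node`: flip each of its label bits in turn."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_count(src, dst):
    """Fewest links between two nodes: the number of differing label bits."""
    return bin(src ^ dst).count("1")


# Each node uses all four ports of its Quartet adapter.
print(hypercube_neighbors(5))

# The longest route crosses one link per dimension.
print(max(hop_count(0, other) for other in range(16)))
```

Under this labeling, a message between the two most distant nodes crosses four links over the hypercube, whereas the switched path reaches any node without passing through intermediate nodes.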

Hyglac at Caltech

Like Loki, Hyglac is a 16-node Pentium Pro computer with 2GB RAM. At the time of its construction, it had 40GB disk space, though that has since been doubled by adding a second hard drive of the type listed below to each node.

  • (1) Intel Pentium Pro 200MHz CPU with 256K integrated L2 cache

  • (1) Intel VS440FX (Venus) motherboard, 82440FX (Natoma) chip set

  • (4) 8x32 60ns EDO SIMMS (128MB per node)

  • (1) Western Digital 2.52GB IDE Hard Drive

  • (1) D-Link DFE-500 TX 100Mbps Fast Ethernet PCI Card

  • (1) VGS-16 512K ISA Video Card

Each node is connected to a 16-way Bay Networks 28115 Fast Ethernet Switch in a fast switched topology. Video output is directed to a single monitor through switches; the node that is directly connected to the monitor also has a second Ethernet card and a floppy drive. The list purchase price of Hyglac in September 1996 was just over $48,500. Most of the components have since decreased in price by about 50%, and the highest single-cost item (a 16-port Fast Ethernet Switch) can now be obtained for less than $2,500!

Software on Loki and Hyglac

Both Loki and Hyglac run Red Hat Linux on all nodes, with GNU's gcc 2.7.2 as the compiler.

The 200MHz Pentium Pros that drive both systems supply a real-time clock with a 5 nanosecond tick, providing precise timing for message passing. More advanced timing and counting routines are available as well, so profiling data such as cache hits and misses can be collected directly from the hardware. LANL has developed a relatively simple interface to the CPU's hardware performance monitoring counters, called perfmon, which is available at the Loki URL listed in the Resources.
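The 5 nanosecond figure is simply the period of a 200MHz clock. The sketch below shows the conversion from a cycle count to elapsed time, assuming the counter advances once per CPU cycle. The helper names are hypothetical and are not part of perfmon.

```python
CPU_HZ = 200_000_000  # 200MHz Pentium Pro clock rate, from the text above


def tick_seconds(cpu_hz=CPU_HZ):
    """Length of one clock tick in seconds."""
    return 1.0 / cpu_hz


def cycles_to_microseconds(cycles, cpu_hz=CPU_HZ):
    """Convert an elapsed cycle count to microseconds."""
    return cycles * 1_000_000 / cpu_hz


print(tick_seconds() * 1e9)            # tick length in nanoseconds
print(cycles_to_microseconds(30_000))  # e.g. a hypothetical message taking 30,000 cycles
```

Timing a message this way means reading the counter before and after the transfer and converting the difference.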

Figure 3. Parallel Linux Cluster Logo

Internode communication is accomplished via the Message Passing Interface (MPI). While multiple implementations of MPI are freely available, none was specifically written to take advantage of a Fast Ethernet-based system and, as usual, maximum portability leads to decidedly less than maximum efficiency. Accordingly, a minimal implementation incorporating the 20 or so most common and basic MPI functions was written from scratch. This specialized MPI library runs the treecode discussed in the next section as well as the NAS parallel benchmarks for Version 2 MPI, while nearly doubling the message bandwidth obtained from the LAM (Ohio State's version of MPI) and MPICH (from Argonne National Laboratory and Mississippi State University) implementations.
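Message bandwidth comparisons of this kind are commonly made with a ping-pong test: one side sends a block of data, the other echoes it back, and the bytes moved are divided by the elapsed time. The sketch below illustrates the idea using a local socket pair and a thread as stand-ins for two nodes. It does not use MPI, it is not the LANL library or its benchmark, and all of the function names are invented for illustration.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes; a stream socket may return fewer per call."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_peer(sock, nbytes, rounds):
    """The 'pong' side: receive each message and send it straight back."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, nbytes))


def ping_pong_bandwidth(nbytes=1 << 16, rounds=50):
    """Return bytes per second moved across the pair, counting both directions."""
    near, far = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(far, nbytes, rounds))
    peer.start()
    payload = bytes(nbytes)
    start = time.perf_counter()
    for _ in range(rounds):
        near.sendall(payload)
        recv_exact(near, nbytes)
    elapsed = time.perf_counter() - start
    peer.join()
    near.close()
    far.close()
    return 2 * nbytes * rounds / elapsed


if __name__ == "__main__":
    print(f"{ping_pong_bandwidth() / 1e6:.1f} MB/s over a local socket pair")
```

On a real cluster the same loop would run between two nodes using the message-passing library under test, and the numbers from different libraries could then be compared for a range of message sizes.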