Linux Clusters at NIST
The National Institute of Standards and Technology (NIST) is experimenting with clusters based on commodity personal computers and local area network technology. The purpose of the experimental phase of the cluster project is to determine the viability of using commodity clusters for some of the NIST parallel computing workload. In addition to a Cray C90 running sequential codes, many parallel jobs are run on IBM SP2, SGI Origin 2000, SGI Onyx and Convex supercomputers.
Building clusters to run parallel jobs is done for several reasons. Low initial cost is one, although it is not clear how the long-term costs associated with clusters compare with those of traditionally packaged systems supported by manufacturers. Another reason is availability of systems. Having a slower system available without needing to wait is sometimes better than waiting a long time for access to a faster system. A third reason is that free implementations of the parallel virtual machine (PVM) and message passing interface (MPI) environments are available for Linux. Many NIST parallel applications rely on PVM or MPI.
Figure 1 shows a diagram of the current cluster. Currently, there are 48 machines (nodes) in the cluster; 32 nodes have a 200 MHz Pentium-Pro microprocessor, and 16 nodes have a 333 MHz Pentium II microprocessor. The Pentium-Pro machines are built around the Intel VS440FX (Venus) motherboard, which uses a 33 MHz PCI bus and has four SIMM memory slots. The Pentium II machines use a motherboard based on the Intel 440LX chip set, also with a 33 MHz PCI bus, and 3 DIMM slots for memory. All of the nodes are configured with a single 2.1GB hard disk supporting 512MB of swap space and the file systems. Sixteen nodes have 128MB of memory and 32 nodes have 256MB, for an aggregate total of 10GB of memory.
Sixteen of the cluster machines are connected with both Fast-Ethernet and ATM. The remaining 32 machines are connected with Fast-Ethernet only. Therefore, the cluster can be configured with up to 48 Fast-Ethernet nodes or up to 16 ATM nodes, depending on the requirements of the job to be run. One other Pentium-Pro machine is used as an administrative front end to the cluster. This machine has both Fast-Ethernet and ATM interfaces.
The ATM interface cards are Efficient Networks ENI-155p with 512KB of memory onboard. These nodes are connected to a Fore ASX-1000 switch over OC-3 (155Mbps), using multi-mode fiber. For Fast-Ethernet, we use SMC EtherPower 10/100 and Intel EtherExpress 100+ cards connected to one Ethernet switch: an N-Base MegaSwitch 5000 with 60 Fast-Ethernet ports. The cabling to the switch is done with Category 5 twisted-pair cable. The Ethernet switch is connected to the intra-NIST network via a 100Mbps uplink to a router. Both the ATM and Fast-Ethernet interfaces are configured into different subnets, allowing us to monitor the network traffic independently. There is very little background traffic on the cluster subnets, since the nodes are used only for parallel application programs. Keeping non-computational network traffic to a minimum is important when evaluating the cluster network.
The cluster nodes have been augmented with the NIST-developed MultiKron performance measurement instrumentation (see Resources 6 and 7). The MultiKron PCI board is equipped with a MultiKron VLSI chip, a high precision clock and 16MB of memory. These features allow for precise interval measurement and storage of trace data with little perturbation. Another advantage of MultiKron is that the clocks on several boards (up to 16 at present) can be time-synchronized, allowing for tracing of events across the cluster with 25-nanosecond resolution. This type of measurement is important for precise tracing of network events.
The 32 Fast-Ethernet-only cluster nodes are running Linux kernel version 2.0.29. We've found this version to be the most stable for our configuration. However, because support for the Intel EtherExpress 100+ is not in this kernel release, the device driver is built as a module. The 16 ATM/Ethernet nodes use Linux kernel 2.1.79, as this release is required for the ATM software version in use.
Development and support for the ATM software comes from the Linux-ATM project run by Werner Almesberger at the Swiss Federal Institute of Technology (EPFL). (See Resources 2.) We are currently running version 0.34 of this software, having started with version 0.26.
We installed Local Area Multicomputer (LAM) (an implementation of MPI) version 6.1 and Parallel Virtual Machine (PVM) version 3.10 on the cluster in order to run our benchmarks in addition to the NIST parallel jobs.
We also developed a device driver to allow for user-mode programs to control the performance counters present in the Intel Pentium-Pro and Pentium II processors. Two performance counters are present in the Intel Pentium-Pro architecture, along with a timestamp counter. Each counter can be configured to count one of several events, such as cache fetches and instruction executions. (See Resources 3.) The device driver is required because writing to the counter control registers (and the counters themselves) can be done only by the Linux kernel. User-mode programs can directly read counter values without incurring the overhead of a kernel system call to the device driver.
Another tool we use on the cluster is S-Check (see Resources 5), developed by our group. S-Check is a highly automated sensitivity analysis tool for programs. It predicts how refinements in parts of a program will affect performance by making local changes in code efficiencies and correlating these against overall program performance.
We have written many small test kernels to evaluate the performance of communication within the cluster. We have versions of the test kernels that communicate at the raw socket, IP and LAM/PVM library levels. These small kernels are useful in evaluating the overhead of the different communication software levels. By using the MultiKron toolkit, the kernels obtain very precise measurements of network performance.
Free DevOps eBooks, Videos, and more!
Regardless of where you are in your DevOps process, Linux Journal can help!
We offer here the DEFINITIVE DevOps for Dummies, a mobile Application Development Primer, and advice & help from the expert sources like:
- Linux Journal
- High-Availability Storage with HA-LVM
- DNSMasq, the Pint-Sized Super Dæmon!
- March 2015 Issue of Linux Journal: System Administration
- Localhost DNS Cache
- Real-Time Rogue Wireless Access Point Detection with the Raspberry Pi
- Days Between Dates: the Counting
- PostgreSQL, the NoSQL Database
- The Usability of GNOME
- Linux for Astronomers
- You're the Boss with UBOS