Building a Linux-Based High-Performance Compute Cluster

The Rocks clustering package from the University of California at San Diego makes it easy to build and maintain a high-performance compute cluster with off-the-shelf hardware.

You have an application running on a relatively new dual-core workstation. Unfortunately, management wants it either to complete faster or be able to take on a larger dataset in the same time as it runs now. You do a bit of investigating and find that both SMP and cluster versions of the application are available. You are using the SMP version on the workstation. You could speed things up if you could run on a quad core (or more) workstation, but the boss is not too receptive to the expenditure in the current economic climate. But wait, you do have a pile of 32 single-socket servers that were replaced earlier in the year. They're only single core, but 32 of them should have more capacity than the dual-core workstation, if you can just find a way to get them all to play together—that would be a cluster.

So, what is a cluster? Here's one accepted definition: a cluster is a group of computers all working together on the same problem. To accomplish this, the machines in the cluster must be appropriately interconnected (a network) and trust each other.

It is possible to configure the networking and security manually, but there are easier ways to accomplish this using any one of a number of cluster provisioning and management systems. At the moment, one of the more popular packages is the Rocks package maintained by a team at the University of California, San Diego, under a grant from the National Science Foundation.

Rocks is termed a cluster provisioning, management and maintenance package. It helps you set up the cluster in the first place (from bare metal); it provides the tools to run parallel programs, and it provides the tools to maintain and extend the cluster after it is created.

The package is delivered as a series of .iso images that you burn onto a series of CDs or DVDs. You then boot the machine that will become the head node from the appropriate DVD or CD, and the installation routine guides you from there. After asking a minimum number of questions in an interactive phase, the installation program builds the head node. Upon reboot, you invoke a single routine (insert-ethers) to add the rest of the machines as compute nodes. To add a compute node, you simply network boot it, and it will be added to the cluster, loaded and configured automatically. After the last node is complete, you have a functional cluster, ready to execute parallel applications.

So, with all of this in mind, let's build a cluster with those otherwise unloved machines.

Step 1. Hardware Setup

The first item on the agenda is setting up the hardware. The overall idea is to have a set of connected computers. Ideally, the machines in the cluster should be as identical as possible, so no single machine or group of machines will be the weak link in any parallel computation. The same homogeneity should apply to the network, because most parallel computation relies on continuous communication between all of the nodes within the cluster.

Find a spot to set up your 32 servers, and make sure you have enough power and cooling to support them. As you connect all of the servers to power, label both ends of each power cord so you can keep track of what is connected to each power strip in the rack.

Because you are starting with a clean sheet, now is a good time to update and configure the BIOS on each system. Set the BIOS clock to the current time as closely as practical (plus or minus five minutes is a good goal). Most clustering packages keep the BIOS clocks synchronized during operation, but only if the clock is reasonably close to the correct time at the beginning.

Because the machines are used, it's prudent to wipe all the disks before loading the cluster software. There are many ways to accomplish this. One fairly thorough method is to use DBAN (Darik's Boot and Nuke). This self-contained application can perform several disk wipe techniques, including two that have some level of Department of Defense approval.

Remember, the goal here is to make all the machines in the cluster as identical as possible. But, this is a goal, not a hard and fast requirement. Heterogeneous clusters will work, but you may need to be careful as to how you deploy workloads on the machines to get the best performance.

Step 2. The Network

Now that you have all the compute nodes configured and in the rack, it's time to set up the communications network. Figure 1 shows a typical networking setup for a simple compute cluster. In this configuration, the Ethernet fabric most likely would be used for administrative purposes, while the InfiniBand fabric would carry the compute traffic. If you don't have InfiniBand hardware available, you can just ignore the bottom section of the diagram. The Ethernet fabric can carry both the administrative and compute traffic.

Figure 1. Network Setup for a Compute Cluster

The best Ethernet network configuration for your cluster would be a single 48-port switch. If a switch like that is not available, you always can resort to a set of smaller federated switches forming a full fat tree network for the cluster. Like the compute nodes themselves, the network should be as uniform as possible.

Plan all the cable runs, remembering that Ethernet cables have a nonzero cross section. Before you install them, test each cable. There is nothing as aggravating as finding that a cable is bad after it has been tied into the rack in a dozen places. Once again, label both ends of each cable to make troubleshooting simpler if it is necessary.