Clusters for Nothing and Nodes for Free

When the users are away, your company's legacy desktop systems can become a powerful temporary Linux cluster.
Protect History

The most valuable part of the work is the data and all the intermediate states of the work in progress, because any damage here sets you back days even if you have backups and external version control checkpoints. A RAID array of 1 or 5 is the usual protection. One computer, not one of the fastest ones, should have at least two hard drives on distinct controllers. It is worth making sure that each drive has a small swap partition so the kernel can use all the swaps and do some load balancing.

Turn on the kernel-space NFS server and configure /etc/exports from the point of view of securing the data storage from damage. When the NFS is under heavy load, user-space programs have to be swapped to make space for additional disk cache. Consider having a runlevel that could be deferred to disable all the services that wake up periodically for minor purposes.

We're using an existing dual-Athlon MP machine with over a terabyte of storage and running Debian stable as our NFS server. The system is overkill for the cluster; we originally built it to archive field test data and then stream the data to multiple clients for analysis. No X server is used, because the cooling fans make so much noise that nobody wants the machine sitting next to his or her desk.

Without a Cluster

Using make batch2 on a dual-processor machine reduced our runtime by about 40%, with one of the processors being idle near the end of the run. The total runtime was between four and six hours of clock time. This can be improved, even without a cluster, by distributing the work across many machines using OpenSSH with public key authentication. The Linux Journal article (“Eleven SSH Tricks” by Daniel R. Allen, August 2003) explained how to configure this powerful package to avoid endless streams of password prompts while simultaneously enhancing network security.

The machines sharing the work usually come to have different performance capabilities. It is important to match the relative runtimes of the various tests against individual processor speeds, remembering SMP, so all of the tests finish at about the same time. We found it best to optimize the mapping manually in a script like the one shown in Listing 2.

By using SSH to two dual-Athlon MP machines, one Pentium III laptop and five Pentium II desktops, we reduced runtime to a fairly consistent two hours.

Initial Cluster

If everyone is running the same version of the same distribution, it probably is sufficient to install the prepackaged binaries of OpenMosix. Thereby, you have the workload migration available without any effort. Always use the autoconfiguration option instead of specifying the list of nodes manually, because the cluster grows in later stages.

We use several different distributions in the office, so we downloaded a pristine 2.4.20 kernel tarball, the matching OpenMosix patch and the source of user-space tools to the NFS fileserver. After making careful notes of the configuration settings to keep all the machines in step, we followed the instructions on the OpenMosix Web site. Because it takes our time and effort to recompile and reinstall kernels, we modified only four computers needed to cluster seven processors. This is slightly less capable than the ten processors achieved through SSH. Even so, the worst-case runtime stayed almost identical, because the migration did the load balancing slightly better than our hand-optimized script could achieve. Because Alex could use make -j and let OpenMosix assign the work, all incremental workloads completed faster and did not need the full two hours.

OpenMosix tries to be fair and have all programs run at the same speed by putting more work on the faster computers. This is not optimal for the logic simulation workload, however, as we usually know the relative runtimes. In this case, a short script (not included here) helpfully monitors the contents of /proc. The script periodically looks for process pairs with a big ratio in their expected runtimes but whose node assignments are not providing a corresponding execution speed ratio. The script uses its knowledge of prior runs to request a migration to gain a long-term benefit hidden from OpenMosix. Such a script is not needed if, for your application, the runtimes of all processes are similar.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState