Clusters for Nothing and Nodes for Free

When the users are away, your company's legacy desktop systems can become a powerful temporary Linux cluster.

For years, our LTSP deployment has been providing multiple X stations to various engineering computers, and we never needed a central application server. The script shown in Listing 6 builds a floppy image for use with all computers. The user simply specifies the network card model.

With this infrastructure, any cluster user can stroll through the buildings with one of those floppies and reboot idle machines into the cluster until sufficient resources are available to run workloads efficiently. For logic simulation, Alex simply adds machines until there are more fast computers in the cluster than slow tests in the suite, so the regression never takes longer than 16 minutes. With that efficiency boost, he rapidly finished the design. Without running mtop, you'd never notice OpenMosix migrating compute-bound processes into the cluster. Meanwhile, others are using the network for different projects.

Large Off-Peak Cluster

Quantum Magnetics has about 100 employees, so our cluster is limited to around 100 nodes, as a few people have more than one computer. We're setting things up so that machines spend nights in the cluster and days as normal user workstations. They reboot at least twice every day and check a configuration directory to decide whether to boot from the network or from the hard drive.

The BIOS must be configured to try the PXE boot before the hard drive. The DHCP servers distinguish between EtherBoot and PXE boot requests, with the latter receiving the boot filename for PXELINUX. There are two directories of configuration files, one for day and one for evening, and a small cron job to switch between them. The daytime boot chains to the master boot record on the hard drive, and the evening boot chains to the PXE version of EtherBoot.

The LTSP configuration file indicates which machines have to reboot on weekday mornings and causes the ctrlaltdel script to run. If a user comes to work early, simply pressing Ctrl-Alt-Del brings the machine back into daytime mode as soon as possible.

Remote Windows administration is used to force workstations to log off after inactivity in the evening and then reboot once. If either of the two network boot stages fail, the machine starts Windows and does not join the cluster.

Long-Term Use

Once your on-demand cluster is running smoothly, resist the temptation to increase it by purchasing a lot of desktop computers you don't otherwise need. The use of LTSP with desktop computers is cost effective only because you already paid for them. There is no financial outlay to acquire them, install them or maintain them when any of their components fail. Dedicated multiprocessor rackmount computers are easily the cheapest way to add processing power to a cluster. By omitting the unnecessary peripherals, they also save money, power, cooling and some failures.

OpenMosix or Mosix offer a quick and easy way to get cluster benefits, but the kernel is making migration decisions in real time. It is inherently less efficient than using explicit workload management with processes dedicated to individual nodes and communicating using MPI. Because you can support both Mosix and MPI within the same cluster, you may want to add job control and MPI libraries to the LTSP client filesystem. Applications that are cluster-aware take advantage of MPI and achieve the ultimate performance available. The other applications always gain partial benefits from Mosix.

On a dual-MPI/Mosix cluster, users have the incentive to migrate to MPI applications. The load balancing algorithms of Mosix always give priority to a local MPI process over a migrated Mosix process, so cluster-unaware applications run more slowly. We haven't started using MPI yet, because none of our critical engineering applications would benefit from it enough to justify the effort needed to establish it.

______________________

White Paper
Fabric-Based Computing Enables Optimized Hyperscale Data Centers

Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.

Learn More

Sponsored by AMD

White Paper
Red Hat White Paper: Using an Open Source Framework to Catch the Bad Guy

Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6

Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.

Learn more about catching the bad guy in this free white paper.

Learn More

Sponsored by DLT Solutions