Clusters for Nothing and Nodes for Free

When the users are away, your company's legacy desktop systems can become a powerful temporary Linux cluster.
Those Old Machines

Usually, plenty of spare older computers can be found hiding in corners. Put an X server on one of them that is configured to be a terminal into the xdm service on the fast computers. With this machine, you can shut down the X servers on the fast computers and release their processor and memory resources back into the important workload. Alex's desktop computer, a 400MHz Pentium II, already had its X server indirecting over xdm's chooser. David's work keeps him roaming the building and relying on VNC, so he already was using Xvnc. Only Hoke needed to make minor changes to configuration files.

Next, install LTSP on one computer and set up all the other old computers to use diskless boots to become terminals too. Doing so eliminates the administration of all those operating systems. You now should have enough terminal stations that all your team members are using terminals, and all the fast compute nodes can stay in the stripped runlevel and be as efficient as possible. It doesn't take long to get those two features working, and an excellent time to work on this is whenever you're waiting on the running jobs.

There is no need to get the DHCP and TFTP components of LTSP working. Put the kernel on a floppy, together with SysLinux configured to trigger the non-boot DHCP, and mount the NFS root filesystem. Then, use that one floppy to do the one-time boot of the terminals. Reboots are needed infrequently, so the slowness of the floppy is fine.

Coworkers

Once the cluster and LTSP are both functional, we simply combine them. The short script shown in Listing 3 uses the NBI tools to put the patched kernel into /ltsp/i386/boot. Our DHCP server's filename parameter is a soft link, so we can change the LTSP kernel rapidly while testing upgrades. After copying the user-space tools into the client filesystem and renaming the init script as rc.openmosix, we add the few lines in Listing 4 to the LTSP startup script. Slower computers have MOSIX=N in the LTSP configuration file because they would not contribute much performance to the cluster.

One line in /ltsp/i386/etc/inittab:

ca:12345:ctrlaltdel:/sbin/ctrlaltdel

calls a copy of Debian's shutdown binary using the script shown in Listing 5. This ensures that Ctrl-Alt-Del forces a clean disconnect from the cluster before rebooting.

Once you are confident that the LTSP-OpenMosix kernel is stable and not going to be changed, you can hand out floppies with the new kernel. The LTSP users won't see a difference, but your compute workload will.

If you would like to maintain the option of changing the kernel without having to hunt around the company to find all the old floppies, now is a good time to get the DHCP network boot working. The LTSP documentation describes how to configure Linux or UNIX servers, but our implementation was running on Microsoft Windows. David, who administers our Windows-based DNS and DHCP servers, set up Netboot in DHCP (Figure 1).

Figure 1. The three scope entries needed on a Windows DHCP server. Notice that the root path has the trailing slash workaround.

Microsoft DHCP appends a null to the NFSROOT, as discussed in LTSP mailing lists, so you need a soft link:

ln -s /ltsp/i386 /ltsp/i386/000

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix