The Ultimate Linux Lunchbox
Okay, we've built the hardware. Now, what is the software?
In years past, it would have been bproc, as found on the Clustermatic site (see Resources). bproc has a problem, however; it cannot support heterogeneous systems. The very nature of bproc, which requires that process migration works, makes the use of different architectures, in a single system, impossible. We're going to have to use something else. We want to continue using our ThinkPad laptop as the front end; there are no StrongARM laptops that we know of. It's clear that we are going to need new software for our minicluster.
Fortunately, the timing for this move is good. As of 2.6.13, there is now support for the Plan 9 protocol in the standard Linux kernel. This module, called 9p (formerly v9fs), supports the Plan 9 resource-sharing protocol, 9p2000. At the same time this code was being ported to the Linux kernel, Vic Zandy of Bell Labs was working with us on xcpu, a Plan 9 version of bproc. One of the key design goals of xcpu was to support heterogeneous systems. The combination, of 9p in the Linux kernel and xcpu servers ported to Linux, has allowed us to build a replacement system for bproc that supports architecture and operating system heterogeneity. Finally, the introduction of new features in 2.6.13 will allow us to remove some of our custom Clustermatic components and improve others. A key new feature is Eric Biederman's kexec system call, which replaces our kmonte system call.
Figure 14 shows a quick outline of the standard bproc boot sequence, as it works on our miniclusters and clusters with thousands of nodes.
The boot sequence, as shown, consists of LinuxBIOS, Linux, Linux network setup, Linux loading another kernel over the network and Linux using the kmonte system call (part of Clustermatic) to boot that second kernel as the working kernel. Why are there two kernels? In Clustermatic systems, we distinguish the OS we use to boot the system from the OS we run during normal operation. This differentiation allows us to move the working kernel forward, while maintaining the boot kernel in Flash.
The new boot sequence is shown in Figure 15. If it looks simpler, well, it is. We no longer have a “boot kernel” and a “working kernel”. The first kernel we boot will, in most cases, be sufficient. Experience shows that we change kernels on our clusters only every 3–6 months or so. There is no need to boot a new kernel each time. Because the 9p protocol and the xcpu service don't change, and the Master node kernel versions are not tightly tied together, we can separate the version requirements of the Master node and the worker node. We could not make this kind of separation with bproc.
The result is that we can weld the StrongARM boards and the Pentium front end (Master) into one tightly coupled cluster. In fact, we can easily mix 32- and 64-bit systems with xcpu. We can get the effect of a bproc cluster, with more modern kernel technology. Figure 16 shows how we are changing Clustermatic components for this new technology.
In this article, we showed how we built the Ultimate Linux Lunchbox, a 16-node cluster with integral Ethernet switch, in a small toolbox. The cluster is built of hardy PC/104 nodes and can easily survive a drop-kick test and possibly even an airport inspection. The system has only three connectors: one Ethernet, one AC plug and one battery connection.
We also introduced the new Clustermatic software, based around the Plan 9-inspired 9p filesystem, now available in 2.6.13. The new software reduces Clustermatic complexity, and the number of kernel modifications are reduced to zero.
Although there was not room to describe this new software in this article, you can watch for its appearance at clustermatic.org; or, alternatively, come see us at SC 2005 in November, where we will have a mixed G5/PowerPC/StrongARM/Pentium cluster running, demonstrating both the new software and the Ultimate Linux Lunchbox.
This research was funded in part by the Mathematical Information and Computer Sciences (MICS) Program of the DOE Office of Science and the Los Alamos Computer Science Institute (ASCI Institutes). Los Alamos National Laboratory is operated by the University of California for the National Nuclear Security Administration of the United States Department of Energy under contract W-7405-ENG-36. Los Alamos, NM 87545 LANL LA-UR-05-6053.
Resources for this article: /article/8533.
Ron Minnich is the team leader of the Cluster Research Team at Los Alamos National Laboratory. He has worked in cluster computing for longer than he would like to think about.
- Smoothwall Express
- Machine Learning Everywhere
- Bash Shell Script: Building a Better March Madness Bracket
- Own Your DNS Data
- Simple Server Hardening
- Understanding OpenStack's Success
- From vs. to + for Microsoft and Linux
- The Weather Outside Is Frightful (Or Is It?)
- Understanding Firewalld in Multi-Zone Configurations
- Ensono M.O.