Remote Linux Explained
Remote Linux refers to Linux workstations or nodes that do not boot the Linux kernel from local media, but instead receive their startup instructions over a network, typically Ethernet. Due to the ease of configuring both the Linux kernel and the operating system itself, Linux is being customized to work in many disparate environments, from web serving, to cluster computing and X servers.
The first question you might be asking yourself is why start Linux remotely? After all, installing Linux locally is a matter of sticking in the CD, answering a few questions and going out for a double latte while the workstation installs. This is true for the typical single-workstation installation; however, once you start to manage and install a lot of workstations, particularly in a cluster or server-farm setting, local media becomes less practical. With the advent of dense servers from companies like RLX Technologies, Inc., booting remotely becomes a necessity, as dense servers do not provide diskette or CD-ROM drives on the nodes. Dense servers expect to be brought up over the local fast Ethernet connection and administered remotely. The main advantages of remote Linux are:
Centralized, hands-off administration: while many installations do have someone on site that can jockey CDs and diskettes 24/7, there are also many hands-off sites (sites where no one is physically present at the site for long periods of time). At these sites, when a programmer is working remotely and needs to boot a node using a special image that's on local media, he or she is out of luck until someone is there to load the correct media.
Dense server solution: since the trend is toward centralized remote administration for clusters and server farms, CD and diskette drives become rather anachronistic. The very presence of local media on the nodes means that the nodes must take on a certain form factor, thereby increasing the minimum size of the nodes. For higher density node packaging, CD-ROM and diskette drives are being phased out in certain segments of the industry.
Versatility: sometimes one can fix a problem with a filesystem that prevents the node from coming up from local media. For example, you can run fsck on a local hard drive on a remotely booted machine in order to fix a filesystem problem.
Cost and security: why pay for media you don't need? Some companies are selling workstations without hard drives and other local media that are intended for use as secure terminal servers. Secure, in this sense, means that if employees do not have access to local media on their workstations, it is more difficult to capture sensitive data.
Helps eliminate version skew: in the case where all workstations are booted remotely from a single kernel image, it eliminates the problem of updating local media for kernel patches or enhancements. You can update the single remote kernel image once, instead of propagating the change to a set of workstation hard drives or, worse, to their local boot diskettes.
The remote boot process mimics the local boot process but with a few important distinctions. From a logical perspective, without talking about the services that perform these tasks, this is basically what happens during the network boot process:
The node is powered up or reset and conditioned to boot from the network.
The node broadcasts its unique Ethernet MAC address, looking for a server.
The server, previously conditioned to listen for specific MAC addresses, responds with the correct IP address for the node. Alternately, the server responds to any broadcast on its physical network with IP information from a designated range of IP addresses.
The node receives its IP information and configures its Ethernet adaptor for TCP/IP communications.
The node requests a kernel over its newly configured adaptor.
The server responds by sending a network loader to the client, which will load the network boot kernel.
The network boot loader mounts the root filesystem as read-only.
The network loader reads the network boot kernel sent from the server into local memory and transfers control to it.
The kernel mounts root as read/write, mounts other filesystems and starts the init process.
Init brings up the customized Linux services for the node, and it comes up completely.
From this description, we see that the booting node has several dependencies on the server: a network boot kernel, a root filesystem and a method of transporting the kernel and IP information from the server to the node.
To get a node to boot correctly remotely, the server and client node must cooperate in several well-defined ways. There are several basic requirements for setting up nodes for remote booting:
A well-stocked server: the server must have the proper services running (we'll talk about those later) that can provide information required by the remotely booting node.
A method of remote power control: to start the node's boot sequence, you have to be able to power up or reset the node. You really cannot rely on the operating system on the node to be present at this time—after all, sometimes you need to reset the node because the operating system crashed.
A network: the nodes all have to have a route, direct or indirect via a gateway, to the server. We'll only talk about Ethernet networks here.
A network-bootable Linux kernel: the kernel can be either compressed or uncompressed, tagged or untagged. Tagged kernels, discussed later in this article, are used by the Etherboot solution. Also, see the Etherboot web site in the Resources section for more information.
A network loader: a method of reading the network boot kernel from the server and placing it correctly in the memory of the node.
A filesystem: in the good old days, the filesystem was served over the network via NFS. With solutions like SYSLINUX, you actually can provide the entire filesystem via a RAM disk, sent over the network with the kernel.
Now that we have a basic idea of the flow of the remote booting process and a general list of the client's requirements for the server, let's discuss in practical terms the options for providing all these services.
Remote power control is a small subset of the available possible remote hardware functions. Various manufacturers provide specialized hardware and software with their offerings that provide robust remote hardware control features, such as monitoring temperature, fan, power, power controls, BIOS updating, alerts and so on. The only remote hardware control features we really need in order to boot the nodes remotely are the ability to power-on and power-off, although having a reset feature is handy as well.
To force a network boot, the boot order on the workstation probably has to be modified. Typically, the boot order is something like diskette, CD-ROM and then hard drive. Most often, booting over the network is either not on the boot list at all or at the very bottom of the list, after the last of the local media. Since most workstations are shipped with some viable operating system already installed on the hard drive, a reset or power-on of the node will boot it from its local hard drive.
To do true network booting, you must have an Ethernet adaptor that is PXE-compliant. PXE is the Preboot eXecution Environment, part of Intel's Wired for Management (WfM) initiative. PXE-compliant means that the Ethernet adaptor is able to load and execute a network loader program from a server prior to bringing over the kernel itself. Both the BIOS on the node and the firmware on the Ethernet adaptor must cooperate to support PXE booting. A PXE-compliant system is capable of broadcasting the adaptor MAC address, receiving a response from the server, configuring the adaptor for TCP/IP, receiving a network loader over the network and transferring control to it.
Although this is an article about remote Linux, and a diskette is a local media, there is a hybrid of remote booting that is so important it must be mentioned. Since many Ethernet adaptor/node BIOS combinations are not PXE-compliant, an open-source solution has sprung up: Etherboot. Etherboot provides a method of creating a boot diskette that contains a simple loader and an Ethernet device driver. Set the boot list to disk, and when the client is booted the Etherboot diskette takes over loading the driver into memory and broadcasting the MAC address, looking for a server. In the PXE case, the server is conditioned to respond with a network boot loader; in the Etherboot case, the server must respond with a tagged network boot kernel. A kernel is tagged by running the mknbi command against the kernel. (See etherboot.sourceforge.net/doc/html/mknbi.html for further information on mknbi.) The point of network booting the node is to get the kernel into local memory. Regardless of whether you choose the PXE scenario or the Etherboot scenario, the network boot path quickly coalesces—the kernel is in memory and control is passed to it.
When the client boots over the network, whether using PXE or from diskette, it will broadcast its MAC address over the LAN, looking for a server that is conditioned to provide the client's IP information. This is so the client can configure its Ethernet adaptor with the correct IP information and continue the rest of the boot conversation using TCP/IP. There are several methods of providing the IP information to a broadcasting node: RARP, BOOTP and DHCP.
RARP (Reverse Address Resolution Protocol) is the method by which an adaptor's unique 48-bit Ethernet address (its MAC) is associated with an IP address. When a client attempts to boot remotely, it will broadcast its MAC address to all workstations on the physical network. One or more of the workstations will be running the RARPD dæmon, which reads /etc/ethers to make the association between the 48-bit Ethernet address and an IP address and responds to the broadcasting client with its shiny new IP address. After receiving an IP address, the client should initiate a TFTP (Trivial File Transfer Protocol) request to get its image (more about that later). The biggest drawbacks to RARP are that it works only on the local physical network (it's not rebroadcast), and it supplies only a small bit of information, the client's IP address.
BOOTP (Bootstrap Protocol) is a distinct improvement over RARP in that it provides gateway support (booting over a router) and provides far more information to the booting client. In addition to the client's IP address, BOOTP provides the address of the gateway (router), the address of the server, the subnet mask and the boot file (the bootable image for the client). Note that there can be one, and only one, IP address assigned to a particular hardware address.
The biggest drawback to BOOTP is that it assigns IP addresses to MAC addresses in a one-to-one relationship—a specific MAC address always will be assigned the same IP address. If you think about the requirements presented by a mobile office and traveling laptops, this one-to-one relationship proves to be somewhat limiting. In the mobile office scenario, users travel with their laptops and need to log in to a central server only occasionally, to pick up mail or whatever. The rest of the time, their IP address remains unassigned, which is a terrible waste of an IP address. The problem of underused IP addresses is addressed nicely by DHCP.
DHCP (Dynamic Host Configuration Protocol) is a logical successor to BOOTP. In fact, BOOTP is considered somewhat obsolete and has been largely replaced by DHCP. One reason DHCP has surpassed BOOTP in popularity is that DHCP supports dynamic address range assignment, while BOOTP only supports static IP assignment (a single MAC is always assigned the same IP address). The dynamic IP assignment facility of DHCP allows IP addresses to be reused among many nodes. In the mobile office scenario, a node connects to its network and broadcasts its MAC. The server, running the dhcpd dæmon, has allocated a range of IP addresses for mobile nodes and simply assigns the next IP address in the range to the broadcasting node. DHCP also manages the longevity of the IP-address assignment via a DHCP leases file.
The options to DHCP are myriad and beyond the scope of this article. For further investigation, consult The DHCP Handbook by Ralph Droms and Ted Lemon (Pearson Higher Education, 1999).
After getting its IP information and configuring the adaptor for TCP/IP, the node BIOS typically requests an image over the network. This clear division of IP assignment and image serving is deliberate; it allows for IP assignment and image serving to be potentially served by different machines. TFTP (Trivial File Transfer Protocol) is just the right tool to transfer the image from server to client, since TFTP, unlike its heavier-weight cousin FTP (File Transfer Protocol), does not require a user to log in to get a file. The primitive security built into TFTP is that, by default, TFTP only permits transfer of files from the server's /tftpboot directory. Since this security scheme is fairly well known among system administrators, only public files are put in /tftpboot. In the latest version of tftp-hpa, file-access security was added as well.
Notice that we've been talking about transferring an image—this is because the image can be either a tagged kernel (Etherboot) or a network loader (PXE). If you use Etherboot, the diskette boot method, then BOOTP or DHCP should point to a tagged kernel. If you use true PXE, then BOOTP or DHCP should point to a network loader. In the PXE case, the network loader is loaded into memory and then brings over an untagged kernel via TFTP. To use PXE, the TFTP server must support the “tsize” TFTP option (RFC 1784, RFC 2349). tftp-hpa, by H. Peter Anvin, supports this option and can be obtained at www.kernel.org/pub/software/network/tftp.
Whether you use the two-step PXE boot process (network loader and kernel) or the one-step Etherboot process (just a kernel), eventually the kernel is read into memory on the client and control is transferred to it. You might be wondering, what are the differences between a network boot kernel and a typical kernel built to boot from the local hard drive? The first decision you have to make is whether the kernel is modular or monolithic—meaning no loadable modules. If you've ever built a Linux kernel, you're aware that features are either selected with “Y” (include the feature), “N” (do not include the feature) or “M” (load on demand). If your kernel is monolithic, you cannot have any M features.
Several flags have to be turned on for a monolithic network boot kernel. First, if you're creating a monolithic kernel, you need to turn off modules support. In the .config file, you will see
#CONFIG_MODULES is not set
and if you're doing a monolithic kernel, you would turn it off, as in
CONFIG_MODULES=nYou must support ext2 filesystems if you intend to create them on the client or mount them from the server, as in the case of the NFS root filesystem:
CONFIG_EXT2_FS=yIf you intend to use a remote root filesystem, mounted via NFS, you will need the NFS options:
CONFIG_NFS_FS=y CONFIG_ROOT_FS=ySince you're going to use Ethernet to boot your remote client, you must turn on network configuration and, at least, support for the specific network interface card (NIC) on the client:
CONFIG_NETDEVICES=y CONFIG_EEXPRESS_PRO100=yLastly, you must configure the ability to get your IP address via RARP, BOOTP or DHCP:
CONFIG_IP_PNP_RARP=y CONFIG_IP_PNP_BOOTP=y CONFIG_IP_PNP_DHCP=yIf you are supplying your root filesystem via a RAM disk, instead of over NFS (see next section), you'll have to specify the RAM filesystem options:
In the modular kernel, you are allowed to use M to select modules to be loaded on demand by the kernel. Since the modules are not statically bound in the kernel, there must be a place to load the modules from when they are needed. The place provided by Linux to load the kernel modules is the RAM disk. This RAM disk is used only during the boot process and only to provide the kernel modules. Another use of the RAM disk, to provide the remote root filesystem, is outside the scope of this article.
To demonstrate the use of the modular kernel, we can make support for the MINIX filesystem modular, to support a MINIX root filesystem that will be loaded from diskette. To create a modular kernel, the first thing to do is turn on modular support and rebuild the kernel:
In our example, we're going to provide MINIX support via a dynamic module, so we'll make that option an M:
CONFIG_MINIX_FS=MAfter rebuilding the modular kernel and copying it (bzImage) to /tftpboot, we're going to need a way to load it. Tools like the open-source SYSLINUX package provide the means to load the modular kernel and provide the RAM disk. A sample default pxelinux.cfg file might look like:
DEFAULT bzImage APPEND vga=extended initrd=minix.gz root=/dev/fd0 ro PROMPT 1 TIMEOUT 50The configuration file says that the name of the kernel is bzImage, the RAM disk name will be minix.gz, and the client root will be loaded from diskette (/dev/fd0).
We will have to create a RAM disk that has the MINIX module (minix.o) on it, as well as insmod (the command to load the MINIX module), the console device (/dev/console) and linuxrc, the command that gets called when the RAM disk is invoked. (Refer to the initrd.txt file, written by Werner Almesberger and Hans Lerman and distributed as part of the kernel RPM, for a complete description of how RAM disks work in Linux.) To create the custom RAM disk that will mount the MINIX modular dynamically, and MINIX root filesystem from diskette, you must create a file and zero it out using dd:
dd if=/dev/zero of=minixroot bs=1k count=4096
Next, associate the file with a loopback device, /dev/loop0:
losetup /dev/loop0 minixrootCreate an ext2 filesystem:
mkfs.ext2 /dev/loop0Mount the device over a convenient mountpoint, say /mnt:
mount /dev/loop0 /mntCreate some favorite directories:
mkdir /mnt/dev /mnt/lib /mnt/sbinCreate the console device:
mkdev /mnt/dev/console c 5 1Copy the minix.o module into /lib, and sash and insmod into /sbin:
cp /usr/src/linux/fs/minix/minix.o /mnt/lib/minix.o cp /sbin/sash /mnt/sbin/sash cp /sbin/insmod /mnt/sbin/insmodFind out which libraries insmod needs:
ldd /sbin/insmodand copy them to /mnt/lib:
cp /lib/libc.so.6 /mnt/lib cp /lib/ld-linux.so.2 /mnt/libThen create a /mnt/linuxrc file that loads the minix module:
#!/sbin/sash /sbin/insmod /lib/minix.oMake linuxrc executable:
chmod 777 linuxrcUmount /mnt and detach the loopback device:
umount /mnt losetup -d /dev/loop0gzip the minixroot file, and you're ready to boot the client:
gzip minixrootThis example will get you as far as the kernel trying to load its root filesystem from /dev/fd0. To fully test out the example, you'll have to create a MINIX filesystem on the diskette, mount it and copy over at least init and sh, and the libraries they need, and create a console device.
As we all know, life without a root filesystem (/) is meaningless. When booting locally, the root FS is almost always found on the local hard drive. Where does root come from on a network-booted client? To provide root you have two choices: remotely mounted root over NFS from the server or by a RAM disk. If you provide root via NFS, by default your kernel looks for root in /tftpboot/ip, where ip is the IP address of your client. This requires starting NFS on the server and exporting /tftpboot (or /tftpboot/ip for each node). To get the client node to boot to the login prompt, there are several requirements on the root filesystem, including the init and shell binaries; devices, at least the console device; and any dynamically loaded libraries the init and shell binaries might depend on.
A quick-and-dirty method of populating a remote root filesystem would be to copy init, sh, the necessary libraries and a console device, as in:
cp /sbin/init /tftpboot/192.168.64.1/sbin/init cp /bin/sh /tftpboot/192.168.64.1/bin/sh
To determine the dynamically loaded libraries for init, use the ldd command:
ldd /sbin/init ldd /bin/shand then copy the libraries listed by the ldd commands to /tftpboot/ip/lib. To make the devices, there is a handy MAKEDEV command, part of the MAKEDEV package:
/dev/MAKEDEV -d /tftpboot/220.127.116.11/dev consoleIf you have your other services up and running on the server correctly, when you force a network boot on the client, it will then run the init script from its remote root, using the console provided in its remote root, bring up a shell and prompt for a runlevel (since there is no /etc/inittab file in remote root). Enter s for single-user mode, and just like that, your client is up and running to the shell prompt.
A special shell is available, called sash (for standalone shell) that is extremely useful in the remote environment. This is because sash has no dynamically loaded libraries and provides some standard built-in commands that manipulate filesystems (mount, umount, sync), change file permissions (chmod, chgrp, chown) and archive (ar, tar), among other things. Instead of starting sh, for example, you can copy /sbin/sash to /tftpboot/ip/sbin/sash, and the kernel will bring up the standalone shell instead. You also might want to provide your own rudimentary inittab file to run sash on startup, as in:
In this article we've explored a few of the services and methods used to boot Linux remotely. Remote Linux is extremely fertile ground for continuing research. As networks become faster and can support greater numbers of remote clients, and as clusters become larger and have greater dependency on centralized administration, remote Linux techniques will play an even greater role in the industry. With the advent of dense server technology, remote Linux has become not just a convenience but a necessity.
I gratefully acknowledge the research of Vasilios Hoffman from Wesleyan. “V”, as he likes to be called, demonstrated the use of loopback devices in creating RAM disks and how to create modular network bootable kernels correctly. V is simply a wealth of Linux information.
Richard Ferri is a senior programmer in IBM's Linux Technology Center, where he works on open-source Linux clustering projects such as LUI (oss.software.ibm.com/lui) and OSCAR (www.openclustergroup.org). He has a BA in English from Georgetown University and now lives in upstate New York with his wife, Pat, three teen-aged sons and three dogs of suspect lineage.