Colocating Servers and Managing Them Remotely
The deal has been inked, your client is going to colocate servers at a datacenter featuring a fat 100Mbps pipe, and you're the guy who is going to make it happen. Luckily, there are only ten thousand questions you need answered. One of them is probably how you'd control all of these machines if they became unreachable over the network for some reason. Come on; if you're anything like me, you too have found yourself locked out of your system because of a boneheaded misplaced firewall rule.
See, I don't plan on visiting that datacenter very often, and the idea of instructing their staff to fix my machines (at $125/half hour) is unappealing to say the least. The more we can control remotely, the better off we'll be.
Without getting fancy, we evaluated two options:
KVM-over-IP: KVM (keyboard, video, mouse) switches are those things you're destined to see in heavy-duty Windows environments. Typically, they let many computers share a single keyboard, mouse and monitor, and they provide some kind of switch to flip between computers. KVM-over-IP is one of these KVM switches with a network or dial-up interface.
Terminal Server: An appliance that we can connect to our servers via a serial link to access each system's serial console. Once connected to the terminal server (via network, dial-up modem or other), you simply choose which server's serial console you'd like to access.
We decided to go with a terminal server, the Cyclades-TS1000 16-port model to be precise. Without going into an arduous KVM-over-IP vs. Terminal Server debate, here are the key factors that helped decide it:
The Cyclades-TS1000 runs an embedded Linux system. You can log into it and run close to anything you want with enough poking. From reading the manual alone, it is clear that the TS1000 has a variety of uses and that our setup is perhaps one of the most trivial ways it can be deployed.
You can either Telnet or ssh into the TS1000, but practically speaking you can probably get it to speak or authenticate against anything. Major brownie points here.
Cyclades Corporation clearly gets it. In one issue of Don Marti's Aspire To Crudeness, he notes: "Cyclades, for you Linux history buffs, was the first company to support development of Linux drivers for its hardware (a multiport serial card). They gave a card to the driver author."
I don't know about you, but whenever I have to deal with KVMs, I find them subpar. For reasons that no one can explain to me, a display can lock up on you, and there is no way to fix the problem besides rebooting the server with the afflicted display. This isn't limited to one manufacturer; I've seen it on every KVM I've ever used. Thoughts of frozen KVM displays during an emergency danced through my mind. Perhaps other people are used to rebooting their machines to fix mysterious computing problems, but I'm sure not.
The terminal server was cheaper than the KVM-over-IP, by about $1200 when we last checked.
Bottom line? The Cyclades won, or as Don Marti noted to me, "You don't see people managing Beowulf clusters with KVMs."
We ordered our 16-port Cyclades Terminal Server plus 16 DB9-RJ45 cables that hook into each system's COM port (the Cyclades ports are RJ45). They also threw in some extra cables just in case, including a special Sun Netra cross-over cable. Very thoughtful.
Configuration of the Cyclades-TS1000 is performed using a plain vanilla serial cable and a terminal emulator. I used minicom and configured it to open /dev/ttyS0. Since the Cyclades appliances are so powerful, they require a bit of end-user configuration. Luckily, the printed manual that accompanied our Cyclades held my hand through the entire process. If you can configure sendmail, Samba or Apache, this should be no sweat. Once the initial setup is complete, the Cyclades can be accessed via Ethernet, dial-up or anything else you can think of.
To be honest, I was expecting to have to log into the Cyclades and run some terminal emulator against a serial device (such as /dev/ttyS4 to access link 5). Those clever devils at Cyclades went one better: the TS1000 creates a virtual network interface for each serial link, complete with its own IP address. With glee, I made up an internal subnet (10.0.1.x, to be exact) for the serial links.
From here, you can simply Telnet to each link's corresponding IP address from your Cyclades session, and you should have a live serial link to your server. If the idea of logging into the Cyclades to then Telnet to an IP address seems arduous to you, you can route the addresses onto the LAN so you can skip the Cyclades login. However, if there is a risk that strangers can make this connection, the Cyclades should be configured to authenticate clients before giving them access to the serial link.
Standard Net safety rules apply here as well. If the Cyclades is going to be accessed over an insecure network, ssh should be used instead of Telnet. In addition to protecting your communications with anti-mean-people-cryptography, ssh can be configured to allow for passwordless logins and other convenience features. In fact, since ssh provides the same features as Telnet without the insecurity, you may want to simply disable Telnet on the Cyclades altogether. Simply comment out the Telnet line in /etc/inetd.conf and send signal HUP to the inetd process.
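Disabling Telnet is a one-line edit plus a signal. As a minimal sketch, assuming you edit a copy of the Cyclades' /etc/inetd.conf on a workstation (we don't assume the appliance itself runs Python) and copy it back, the helper below comments out any active telnet entry. reload_inetd() is simply the equivalent of kill -HUP and needs inetd's PID, which you would look up on the appliance. The function names and sample lines are ours, not Cyclades'.

```python
import os
import signal


def disable_service(inetd_conf: str, service: str = "telnet") -> str:
    """Return inetd.conf text with every active `service` entry commented out."""
    lines = []
    for line in inetd_conf.splitlines(keepends=True):
        fields = line.split()
        # The first field of an inetd.conf entry is the service name.
        # Already-commented lines start with '#', so they never match.
        if fields and fields[0] == service:
            line = "#" + line
        lines.append(line)
    return "".join(lines)


def reload_inetd(pid: int) -> None:
    """Ask inetd to reread its configuration (same as `kill -HUP <pid>`)."""
    os.kill(pid, signal.SIGHUP)


if __name__ == "__main__":
    # Hypothetical sample entries, for illustration only.
    sample = (
        "ftp    stream tcp nowait root /usr/sbin/tcpd in.ftpd\n"
        "telnet stream tcp nowait root /usr/sbin/tcpd in.telnetd\n"
    )
    print(disable_service(sample), end="")
```

Because commented lines are skipped, running the helper twice leaves the file unchanged.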
Since I'm a big fan of organization (and who isn't?), each server's serial IP address has a corresponding entry under the ts domain for its hostname. For example, dbms3.example.com would have entry dbms3.ts.example.com, which resolves to the IP address of its corresponding network interface on the Cyclades.
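As an illustration of this naming scheme, here is a small Python sketch that derives each server's ts name and emits BIND-style A records for the serial-link addresses. The link-to-address rule (serial link N gets 10.0.1.N) and the host web1.example.com are hypothetical; the text only tells us that the subnet is 10.0.1.x and that dbms3.example.com maps to dbms3.ts.example.com.

```python
import ipaddress

# Assumed layout: serial link N gets host address N inside 10.0.1.0/24.
SERIAL_NET = ipaddress.ip_network("10.0.1.0/24")


def ts_hostname(fqdn: str) -> str:
    """Insert a 'ts' label after the host label: dbms3.example.com -> dbms3.ts.example.com."""
    host, _, domain = fqdn.partition(".")
    if not domain:
        raise ValueError(f"expected a fully qualified name, got {fqdn!r}")
    return f"{host}.ts.{domain}"


def link_address(link: int) -> ipaddress.IPv4Address:
    """Map a 1-based serial link number to its (assumed) IP address."""
    if not 1 <= link <= 16:  # our terminal server has 16 ports
        raise ValueError(f"link {link} is out of range for a 16-port unit")
    return SERIAL_NET.network_address + link


def zone_records(servers: dict[str, int]) -> list[str]:
    """Build A records for each server's serial-console name, ordered by link."""
    return [
        f"{ts_hostname(fqdn)}. IN A {link_address(link)}"
        for fqdn, link in sorted(servers.items(), key=lambda kv: kv[1])
    ]


if __name__ == "__main__":
    for record in zone_records({"dbms3.example.com": 3, "web1.example.com": 1}):
        print(record)
```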
Once connected, you have a live serial link to your machine. You did configure your machines to speak over serial, didn't you? Lucky for you, we're discussing that next.
Getting a system to speak over serial is surprisingly simple once you do some reading. For these examples we will assume that the serial connection is on COM1 (better known as /dev/ttyS0) and that we're running Red Hat Linux servers that boot with LILO (and not GRUB). All commands are run as superuser or a user with the equivalent capabilities unless otherwise noted.
Remote control of your boot loader can save you from a jam. Fortunately, it's really easy to get LILO to treat the serial link like a regular display: add a serial= line to the global config section of /etc/lilo.conf.
By default, Red Hat configures LILO to display a graphical boot prompt that may interfere with your serial console. This is disabled by commenting out the message= line.
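A minimal sketch of both lilo.conf changes, in Python since the article has no code of its own. It assumes COM1 at 9600 baud, no parity and 8 data bits (our choice, not a value stated above) and follows LILO's serial=<port>,<bps><parity><bits> syntax, where port 0 is COM1 (/dev/ttyS0). The helper names are hypothetical. However you make the edit, rerun /sbin/lilo afterwards, because LILO only picks up lilo.conf changes when its boot map is rewritten.

```python
import re


def serial_option(port: int = 0, baud: int = 9600, parity: str = "n", bits: int = 8) -> str:
    """Build LILO's serial= option; port 0 is COM1 (/dev/ttyS0)."""
    if not 0 <= port <= 3:
        raise ValueError("LILO serial ports are numbered 0-3")
    if parity not in ("n", "e", "o"):
        raise ValueError("parity must be n, e or o")
    if bits not in (7, 8):
        raise ValueError("bits must be 7 or 8")
    return f"serial={port},{baud}{parity}{bits}"


def prepare_lilo_conf(text: str, serial_line: str) -> str:
    """Put serial_line in the global section and comment out message= lines."""
    out, inserted = [], False
    for line in text.splitlines():
        stripped = line.strip()
        if re.match(r"serial\s*=", stripped):
            continue  # drop any old serial= line; ours is added below
        # The global section ends where the first image= or other= stanza begins.
        if not inserted and re.match(r"(image|other)\s*=", stripped):
            out.append(serial_line)
            inserted = True
        if re.match(r"message\s*=", stripped):
            line = "#" + line  # the graphical boot message can interfere with the serial console
        out.append(line)
    if not inserted:
        out.append(serial_line)
    return "\n".join(out) + "\n"
```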
From here, you can remotely boot whichever kernel your heart desires. That's great for those times when you build, install and boot a new kernel and things go horribly wrong ("oops, that's not where the root partition is after all").
If your BIOS doesn't display to serial, you may want to configure a longer-than-usual LILO wait time (the delay= line, or timeout= if your lilo.conf contains a prompt line; both are measured in tenths of a second). You can quickly lose interest in your serial console session if nothing appears on your screen for a few minutes while the BIOS and perhaps your SCSI controller go through their lengthy startup procedures. Configuring a one-minute timeout usually gives me enough time to remember that LILO is waiting for me while I'm checking my e-mail.
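Since both settings count tenths of a second, a one-minute wait is 600, not 60. A small sketch with hypothetical helper names that sets either key in lilo.conf text; when the key is missing, it is added as the first line, which keeps it in the global section.

```python
import re


def tenths(seconds: float) -> int:
    """LILO's delay= and timeout= values are expressed in tenths of a second."""
    return round(seconds * 10)


def set_wait(text: str, seconds: float, key: str = "delay") -> str:
    """Set (or add) a global delay= or timeout= line so LILO waits `seconds`."""
    value = f"{key}={tenths(seconds)}"
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if re.match(rf"{key}\s*=", line.strip()):
            lines[i] = value  # replace the existing setting
            return "\n".join(lines) + "\n"
    # No existing line: the top of the file is inside the global section.
    return "\n".join([value] + lines) + "\n"


if __name__ == "__main__":
    print(set_wait("prompt\ntimeout=50\nimage=/boot/vmlinuz\n", 60, key="timeout"), end="")
```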
(Controlling a BIOS remotely via serial, although possible, is one thing I have no experience with. The servers we ordered did not support the feature. This is not so bad; chances are that if you need to access the BIOS, you are performing a hardware upgrade or modification and need to be on-site anyway. At least that's my reasoning.)
Besides LILO, you must also tell your kernel about your newfound serial link prowess. Instructing the kernel to print messages over a serial console in addition to the video console is just as easy: add a console= parameter for each console to the append= line in your lilo.conf file. Now you can watch your kernel come to life from the comfort of your bunny slippers at home.
A note on startup scripts (thanks to Drew Dibble for pointing this out): the order of the consoles listed in the append= line matters. LILO and the kernel will write to both consoles, but the system startup scripts use only one of them, namely the console the kernel attaches to /dev/console, which is the last console= entry on the command line. So listing the physical console first and the serial console second causes the init scripts to use the serial console for I/O.
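To make the ordering rule concrete, this sketch builds the console= arguments, merges them into an existing append= value and reports which device /dev/console would follow, using the kernel's "last console= wins" rule. The helper names are hypothetical, 9600n8 is assumed to match the serial= line, and hdc=ide-scsi merely stands in for whatever your append= line already holds.

```python
from typing import Optional


def console_args(serial_port: int = 0, baud: int = 9600, serial_last: bool = True) -> list[str]:
    """console= arguments for the video console (tty0) plus one serial port."""
    video = "console=tty0"
    serial = f"console=ttyS{serial_port},{baud}n8"
    return [video, serial] if serial_last else [serial, video]


def merge_append(existing: str, consoles: list[str]) -> str:
    """Replace any console= words in an append= value with the given ones."""
    kept = [word for word in existing.split() if not word.startswith("console=")]
    return " ".join(kept + consoles)


def dev_console(cmdline: str) -> Optional[str]:
    """Device behind /dev/console: the last console= argument wins."""
    devices = [
        word.split("=", 1)[1].split(",", 1)[0]
        for word in cmdline.split()
        if word.startswith("console=")
    ]
    return devices[-1] if devices else None


if __name__ == "__main__":
    value = merge_append("hdc=ide-scsi", console_args())
    print(f'append="{value}"')
    print("/dev/console ->", dev_console(value))
```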
Once the system has started, you'll probably find yourself wanting to log in via serial console to test it out (or, *cough*, misplaced firewall rules are blocking ssh). Luckily, logging in via a serial console isn't that much different from logging in via plain vanilla virtual consoles.
Pop open /etc/inittab, find the section that launches mingetty (or whichever getty your flavor of Linux uses) and add a friend to it: an entry that runs mingetty on ttyS0.
This tells the init process to maintain a mingetty on the serial port. Mingetty is the program that displays the system hostname, any admin messages and the login: prompt. (Fun fact: the Password: prompt, on the other hand, is displayed by the login utility that mingetty invokes once it has received your username.) I gave the entry ID 9, but any unique ID will do. The second field tells init which runlevels to maintain this service under. I did not list runlevel 5, the runlevel where init starts X, since there is little reason to run X on a server you only deal with remotely. (Also, X11 may not respond kindly to being started on a serial console.)
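A sketch of the inittab change, assuming the colon-separated id:runlevels:action:process format and a runlevel list of 234 (our reading of "not 5"; the exact list isn't given here). The class and helper names are hypothetical. One caveat worth knowing: mingetty's own manual page describes it as meant for virtual consoles rather than serial lines, so if serial logins misbehave, agetty is the usual alternative. reload_init() just wraps telinit q.

```python
import subprocess
from typing import NamedTuple


class InittabEntry(NamedTuple):
    id: str          # must be unique within /etc/inittab
    runlevels: str   # e.g. "234": runlevel 5 (X) deliberately left out
    action: str      # "respawn" restarts the getty whenever it exits
    process: str     # command line init runs

    def line(self) -> str:
        return ":".join(self)


def parse_inittab(text: str) -> list[InittabEntry]:
    """Parse non-comment inittab lines into their four fields."""
    entries = []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue
        entries.append(InittabEntry(*raw.split(":", 3)))
    return entries


def add_entry(text: str, entry: InittabEntry) -> str:
    """Append `entry`, refusing to reuse an existing ID."""
    if entry.id in {e.id for e in parse_inittab(text)}:
        raise ValueError(f"inittab id {entry.id!r} is already in use")
    return text.rstrip("\n") + "\n" + entry.line() + "\n"


def reload_init() -> None:
    """Tell init to reread /etc/inittab (requires root)."""
    subprocess.run(["telinit", "q"], check=True)


if __name__ == "__main__":
    sample = "id:3:initdefault:\n1:2345:respawn:/sbin/mingetty tty1\n"
    print(add_entry(sample, InittabEntry("9", "234", "respawn", "/sbin/mingetty ttyS0")), end="")
```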
Once you tell init to reread its config file (telinit q is one way), it will maintain a getty on the serial port. You should be able to log in via the Cyclades once this is set up. Now the best part.
SysRQ gives you a direct channel to send instructions to the kernel, and you want this. Often, what appears to be a completely hung system is really a living kernel that has lost all means of contacting the outside world: X11 crashes and leaves your display unusable, a driver bug prevents the kernel from accessing the disk, an out-of-memory condition has killed critical system processes, etc. Most people at this point reach for the reset switch and pray that their system comes back up safely. If your systems are colocated at a datacenter, you may have to wait several minutes for a technician to come reboot your system. (It is possible to control the power remotely with special equipment, but many companies find this route prohibitively expensive.)
With SysRQ and a serial console, you can eliminate 90% of these painful reset experiences. Very often, the kernel on a hosed system will still be able to respond to SysRQ commands. To help recover from disaster scenarios, you can request a process list from the kernel, request CPU register dumps, instruct it to kill most or all processes, perform an immediate reboot and even attempt to sync and remount the filesystems read-only. The kernel source tree includes a table of the available SysRQ commands in Documentation/sysrq.txt.
SysRQ support can be compiled out of the kernel (CONFIG_MAGIC_SYSRQ), but most distributions leave the support in and simply disable it from userland by default. Re-enabling it is done with echo 1 > /proc/sys/kernel/sysrq. You can make this setting permanent by editing /etc/sysctl.conf and setting kernel.sysrq = 1.
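Both steps, sketched in Python with hypothetical helper names: one writes the runtime flag (root required), the other rewrites sysctl.conf text so the kernel.sysrq setting survives a reboot. The sample file contents in the usage below are illustrative only.

```python
import re

SYSRQ_PROC = "/proc/sys/kernel/sysrq"


def enable_sysrq_now(path: str = SYSRQ_PROC) -> None:
    """Runtime equivalent of `echo 1 > /proc/sys/kernel/sysrq` (needs root)."""
    with open(path, "w") as f:
        f.write("1\n")


def set_sysctl(conf_text: str, key: str, value: str) -> str:
    """Set `key = value` in sysctl.conf text, replacing an existing setting."""
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key} = {value}"
            break
    else:
        lines.append(f"{key} = {value}")  # key not present yet: add it
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(set_sysctl("# sample sysctl.conf\nkernel.sysrq = 0\n", "kernel.sysrq", "1"), end="")
```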
A word on security: if you allow unrestricted physical access to your systems, people can do mean things to your machines with the Magic SysRQ keyboard sequence (Alt-SysRq plus a command key on a PC keyboard). You may want to disable SysRQ if this is a concern, or simply remove VGA/keyboard support from your kernel.
Once connected via the serial link (no login is needed, since SysRQ is handled by the kernel itself), you can get the kernel's attention by sending a BREAK (using Cyclades Telnet, hit CTRL-] and then the B key). Within five seconds of the BREAK, press the key corresponding to the command you want. An invalid command key will simply display the help.
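For quick reference during an emergency, here is a Python sketch that pairs a few command keys from Documentation/sysrq.txt with their effects and spells out the common sync, remount, reboot sequence. The table is only a subset and the names are ours. Over serial, each command needs its own BREAK followed by its key, and it's worth waiting for the kernel to confirm the sync and remount before sending b.

```python
# A subset of the command keys described in the kernel's Documentation/sysrq.txt.
SYSRQ_KEYS = {
    "s": "sync all mounted filesystems",
    "u": "remount all mounted filesystems read-only",
    "b": "reboot immediately, without syncing or unmounting",
    "t": "dump the list of current tasks to the console",
    "p": "dump the current CPU registers and flags to the console",
    "m": "dump memory information to the console",
    "e": "send SIGTERM to all processes except init",
    "i": "send SIGKILL to all processes except init",
}

# Sync, remount read-only, then reboot: gentler than the reset switch.
SAFE_REBOOT = ["s", "u", "b"]


def sysrq_steps(keys: list[str]) -> list[str]:
    """Spell out what to type over the serial console for each SysRQ key."""
    steps = []
    for key in keys:
        if key not in SYSRQ_KEYS:
            raise ValueError(f"unknown SysRQ key {key!r}")
        steps.append(f"send BREAK, then press '{key}' within 5 seconds: {SYSRQ_KEYS[key]}")
    return steps


if __name__ == "__main__":
    for step in sysrq_steps(SAFE_REBOOT):
        print(step)
```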
SysRQ over serial alone can turn a potentially painful and costly situation into a minor inconvenience.
We were short on time for this project, and in addition to solving many of our remote server management issues, our Cyclades-TS1000 also proved invaluable during initial configuration. Colocation facilities are made to keep computers snug, but they're not fun for humans to work in for extended periods of time. Our Cyclades allowed us to configure, test and debug our systems remotely from our cushy offices without fear of losing network access, providing some valuable relief during a rather stressful migration period.
Once set up, your systems can also be controlled from afar with god-like majesty thanks to the power of the serial console. The world is clearly a better place because of it. Happy hacking.
Michael Bacarella operates a legal fiction titled Netgraft Corporation, a computer consulting firm located in New York. He shares an apartment with his wonderful fiancée and a fearful green iguana named Kang.