Virtualize a Server with Minimal Downtime
You don't have to look far in tech news to see that virtualization has become a big deal. After all, computers continue to become faster and more powerful, and as they do, the services that run on them often use fewer overall resources. On top of that, modern servers often need a fraction of their predecessors' power and cooling. With virtualization, you can get power savings and more efficient use of server resources, and you can create servers quickly without waiting on parts to arrive.
Although some people start from a clean slate and create brand-new virtual servers from scratch to replace old physical machines, you simply may not have the extra time to invest to make the clean break to new versions of Linux with all-new packages. In many cases, it makes more sense to create a virtual clone of a physical server and keep all the files and services identical. If you want to go this route, there are a number of different methods you can use if you can handle unlimited downtime, but if you want to keep downtime and overall disruption to a minimum (and who doesn't?), there is a fairly simple process I've used to migrate a large number of servers to virtual machines with around 30 minutes of average downtime—well within most acceptable maintenance windows.
Before I get into the actual steps, there are some limitations to this approach that I should mention. First, this method has been designed and tested to work with VMware virtualization, specifically with its enterprise server products (although it would also work fine with their free server product). VMware works well for this process because it doesn't require that I modify my current Linux kernel to virtualize it—something that isn't always possible when you want to virtualize an old server. Having said that, these steps also could work with other virtual machine technologies that can use an unmodified Linux kernel. Second, this procedure has been tested with and is aimed at Red Hat-like distributions (Red Hat, CentOS and so on), but with a few tweaks I discuss later, it also could work with other distribution flavors. Finally, the actual amount of downtime you will need for this process probably will vary from my results, especially as you first test out each of the steps. Servers with large or slow disks and, specifically, servers that change large amounts of data frequently possibly will take longer to sync.
Along with these disclaimers, it's only fair to point out some of the benefits to this method:
You can do a majority of the migration work against a live server.
Standard Linux tools are used for the synchronization and other changes.
The process protects your network from both servers showing up at the same time.
You safely can leave the old server on the network and access its files while users use the new virtual machine.
Now that the disclaimers are out of the way, let me summarize the process in a few general steps. First, create a fresh virtual machine to replace the physical host. Then, boot in to the virtual machine with a rescue CD and partition the disk. Next, perform the main synchronization of files from the live physical server to the new virtual machine. Take down your physical machine and reboot in to a rescue CD, and perform a final synchronization from the off-line server. After that, change the boot settings on the virtual machine to suit its new environment, and then reboot in to your new virtual machine.
To get started, first you must create a virtual machine container to replace your physical machine. Specific steps are different if you use VMware Server versus Virtual Infrastructure 3, but ultimately, what you want to do is to create a machine that mostly matches your physical machine's specifications. The specifications don't have to match exactly, and there actually are good reasons why you might want to tweak the settings a bit. For instance, if your server has 2GB of RAM, but you notice that it really needs only 1GB, now would be a good opportunity to change it. If your server is starting to run out of storage, this is a good time to increase it. If your physical server has a 32-bit or 64-bit processor, however, make sure the virtual machine matches. Also, be sure that you match the operating system version you report to VMware with your actual OS if possible. For instance, if your server runs RHEL 3, don't tell VMware that it runs RHEL 4. You want to ensure that the OS will have drivers for the virtual devices that VMware presents, specifically for the disk subsystem. For instance, I've had numerous headaches due to RHEL 4's removal of the BusLogic SCSI module from the base OS (a virtual SCSI device that is a commonly used by VMware along with an LSI Logic virtual SCSI device).
After you set the specs for the virtual machine, edit the CD-ROM device so that it points either to an actual rescue CD in the VMware server or to an ISO. I prefer Knoppix for this procedure, but any live CD should work as long as it has the rsync and chroot tools, an SSH server and enough module support to access the disks on both the physical machine and the virtual machine. Now, boot the virtual machine into the rescue CD. Everything you need to do is done via the command line, so under Knoppix, type knoppix 2 at the boot: prompt to bypass the GUI and go straight to a command line.
After Knoppix boots, you need to partition, format and mount the new partitions for this virtual machine. Use fdisk or cfdisk from the command line to create your partitions to match your physical server. Again, you don't have to match the partition sizes exactly, as long as there is plenty of room to store all the files from the physical server. For this example, I will have a physical server with a single SCSI drive (/dev/sda) with three partitions: /dev/sda1 for root, /dev/sda2 for swap and /dev/sda3 for /home. After you create the same partitions on the virtual machine, format them with the same filesystems you use on the physical machine, create mountpoints for them and then mount them:
$ sudo mkfs -t ext3 /dev/sda1 $ sudo mkfs -t ext3 /dev/sda3 $ sudo mkswap /dev/sda2 $ sudo mkdir -p /mnt/sda1 /mnt/sda3 $ sudo mount /dev/sda1 /mnt/sda1 $ sudo mount /dev/sda3 /mnt/sda3
Now that you have created and mounted the partitions, you are ready for the first synchronization. For this to work, your virtual machine must have network access, and specifically, it needs to be able to access SSH on the physical machine. By default, Knoppix will attempt to get a DHCP lease if available, but otherwise, if your rescue disc is not able to get on the network, you need to make the necessary changes so that it can. This virtualization procedure reduces downtime by synchronizing the files twice—once while the physical server is running and once after it is off-line. The idea here is that a majority of files on most servers stay the same, at least over one or two days. If you perform the bulk of the file synchronization while the server is on-line, when you take it off-line, the final synchronization can occur much faster.
I use rsync for the synchronization, and for it to work, you need to allow (at least temporarily) for root SSH logins to occur on the physical machine. If it is disabled, edit /etc/ssh/sshd_config and change PermitRootLogin no to PermitRootLogin yes, and restart sshd. Otherwise, it will be difficult for rsync to copy all the files on the system. You will run an rsync command for each partition on the physical server, so in this example, that makes two rsync commands:
$ sudo rsync -avx --numeric-ids ↪--progress physicalhost:/ /mnt/sda1/ $ sudo rsync -avx --numeric-ids ↪--progress physicalhost:/home/ /mnt/sda3/
The rsync options I use here are chosen very deliberately, so it's worth understanding what each of them does. The -a option sets “archive mode”, which essentially turns on a number of rsync options that preserve file ownership and permissions and other settings. The -v option makes rsync provide more output about what it is doing, and the --progress argument displays a progress meter so you can keep up with how long rsync will take. The other two arguments are rather important, and if you don't use rsync regularly, you might not come across them much. The -x argument tells rsync to stick to one filesystem. This is important particularly when you back up the / partition; otherwise, rsync happily will traverse into /home or any other partitions you have and copy them all into your local /mnt/sda1 mountpoint, which probably will not have enough space to hold everything. The --numeric-ids argument sets file permissions on the destination files based on their numeric ID and not the matching user or group name. This is important as the Knoppix CD very likely has different user and group ID mappings than your server.
After these rsync commands complete, you are ready to take your physical server off-line. If you did need to schedule a maintenance window for the physical server, just leave the virtual machine running in its current state, and proceed to the next step when you are ready to take the physical machine off-line. If a number of days will pass until your maintenance window, you might want to run the above rsync commands again once you are close to the maintenance window, just so the final off-line rsync will happen more quickly.
On the Physical Server:
The last synchronization happens when the physical server is completely off-line, so you can make sure that no other files change on you. To do this, simply take a Knoppix CD (or your preferred rescue CD) to the physical machine and boot from it. All the commands you run will be from the command line, so you can boot in to Knoppix's terminal-only mode here as well. As Knoppix boots, it should detect your partitions automatically and create mountpoints under /mnt for them, but if it doesn't, just use the mkdir command to create them manually. Knoppix will not mount partitions automatically at boot, so you need to do that manually. In the case of this example, my physical server has two partitions to mount:
$ sudo mount /dev/sda1 /mnt/sda1 $ sudo mount /dev/sda3 /mnt/sda3
Now I need to set a password for the root Knoppix user and then start the SSH server on this machine so I can run the rsync:
$ sudo passwd $ sudo /etc/init.d/ssh start
Keep in mind that because I booted this machine into Knoppix, it most likely has gotten a different IP address via DHCP. Type /sbin/ifconfig to check which IP address the machine currently has, as you will need it for the final rsync.
On the Virtual Server:
You now can start the final synchronization from the virtual server. The commands are very similar to what you used before, except this time, I add the --delete option so that rsync will remove any files on the virtual machine that were deleted from the physical machine since the last time I synced. Also notice that because the physical server is now booted in to Knoppix, I have to change the directory paths and the IP address for the remote host, as they changed since I booted in to Knoppix:
$ sudo rsync -avx --numeric-ids --progress ↪--delete 192.168.1.150:/mnt/sda1/ /mnt/sda1/ $ sudo rsync -avx --numeric-ids --progress ↪--delete 192.168.1.150:/mnt/sda3/ /mnt/sda3/
These commands could take a long time or a short time, depending on how many files have changed since the last time you ran rsync. Once it completes, you are ready to perform the final finishing touches on your virtual machine before bringing it into service.
Even though the files on the virtual machine are identical to the physical machine, the virtual machine will not boot correctly at this point until you make some changes to the boot settings. This works best from within a chroot environment, so type:
$ sudo chroot /mnt/sda1
before you run the rest of the commands. Be sure to replace /mnt/sda1 with the mountpoint for your root partition if it is different.
The first change you need to make within the chroot environment is to restore your bootloader. If you use GRUB, look at /boot/grub/menu.lst or /boot/grub/grub.conf. If you use LILO, look at /etc/lilo.conf. Check for any devices that may have changed. In particular, if you switched from an IDE to a SCSI device, from a RAID to a non-RAID or changed the root partition order, be sure to make changes here to reflect that. Next, if you use GRUB, type:
# grub-install /dev/sda
Change /dev/sda to match the primary disk device from which you will boot. If you use LILO, type:
After your bootloader has been installed, check /etc/fstab and confirm that any drive, partition or device changes you made in your bootloader config file also were changed here.
Many servers these days use an initrd file to load modules that are essential for the boot process but that don't necessarily fit in the kernel image. Often, this initrd file contains only the modules that suit your hardware, so when you make the switch to new hardware, such as is the case with VMware's virtual SCSI controllers, you need to create a fresh initrd that has these new modules in it.
On a Red Hat system, edit either /etc/modules.conf or /etc/modprobe.conf for RHEL 4, and remove any references to scsi_hostadapter you find there. If you configured your virtual machine to use VMware's virtual BusLogic SCSI controller, replace those references with the following:
alias scsi_hostadapter BusLogic
If you chose VMware's LSI Logic SCSI controller, add the following lines instead:
alias scsi_hostadapter mptbase alias scsi_hostadapter1 mptscsih
Obviously, these modules are specific to VMware virtualization, so if you want to attempt this with another virtualization technology, you will need to look up which SCSI modules it uses and make sure they are referenced here.
Now, you are ready to create a new initrd. Find the location of the initrd your server last used from your /boot/grub/menu.lst, /boot/grub/grub.conf or /etc/lilo.conf file, and then move it out of the way so you can create a new one safely. Then, run mkinitrd with the path to the initrd file to create and the name of the current kernel. For my example server, I am using the Red Hat 2.4.21-32.0.1.ELsmp kernel, so I would type:
# mv /boot/initrd-2.4.21-32.0.1.ELsmp.img ↪/boot/initrd-2.4.21-32.0.1.ELsmp.img.bak # mkinitrd /boot/initrd-2.4.21-32-0.1.ELsmp ↪2.4.21-32-0.1.ELsmp
As I said before, this is the method Red Hat uses to create initrd files. Unfortunately, different distributions use different methods. For instance, Debian's mkinitrd stores configuration files under /etc/mkinitrd, and the mkinitrd command uses slightly different options, so you might need to do some extra research to create a new initrd for your server's distribution.
At this point, you can reboot the virtual machine. Confirm that your physical machine no longer has its original IP address, or otherwise, simply power it off to be safe. If your server runs a hardware configuration service like kudzu, it most likely will prompt you at boot time because it has detected changes in the server's hardware. Be sure to select Keep Configuration for any old SCSI or network hardware it mentions, and select Ignore for any new SCSI or network hardware; however, you safely can remove old video, sound, USB and similar hardware if you are prompted.
Once the machine has booted completely, confirm that all system services have started and that you are connected to the network. I have noticed on some Red Hat systems that the network card's MAC address has been hard-coded into the configuration file, and as that has changed on the new virtual hardware, the network won't resume. In this case, simply edit the configuration file for your network card under /etc/sysconfig/network-scripts/ (often ifcfg-eth0), and either remove the reference to the MAC address or change it to reflect the new MAC address. Then, restart the networking service.
Practice this procedure on a few test machines to be sure you have all the steps down for your particular network before attempting it on a live production machine. Nothing is worse than scrambling to fix strange initrd issues on a virtual machine while the physical server is down and your maintenance window is ticking away. You will find that the more often you perform these migrations, the faster you can do them—you even might be able to stagger them and complete a few at the same time.
Kyle Rankin is a Senior Systems Administrator in the San Francisco Bay Area and the author of a number of books, including Knoppix Hacks and Ubuntu Hacks for O'Reilly Media. He is currently the president of the North Bay Linux Users' Group.