Xen Virtualization and Linux Clustering, Part 2
We ended last time after configuring our first unprivileged Xen domain. In this article, we complete our cluster and then test it using an open-source parallel ray tracer. The first thing we need to do is create additional slave nodes to be used with the cluster. So, let's get down to business.
Once you have created and configured an unprivileged domain using the instructions from the previous section, you easily can duplicate this domain as described below. Start by making a backup image of the Debian_Slave1_Root filesystem. You only need to do this step once. Note that the tar command probably will take a few minutes to complete, as it is archiving an entire filesystem.
# mkdir /data/xen-images
# mount /dev/VG/Debian_Slave1_Root /mnt/xen
# cd /mnt/xen
# tar jpcf /data/xen-images/debian-slave-root.tar.bz2 *
# ls -sh /data/xen-images/debian-slave-root.tar.bz2
# cd /
# umount /mnt/xen
You now have a working root filesystem archive that can be used to create additional domains with minimal effort. Let's use this archive to create our second unprivileged domain, which we call debian_slave2. Start by creating the logical volumes for the additional domain:
# lvcreate -L1024M -n Debian_Slave2_Root VG
# lvcreate -L64M -n Debian_Slave2_Swap VG
# mke2fs -j /dev/VG/Debian_Slave2_Root
# mkswap /dev/VG/Debian_Slave2_Swap
# mount /dev/VG/Debian_Slave2_Root /mnt/xen
# cd /mnt/xen
# tar jpxf /data/xen-images/debian-slave-root.tar.bz2
We now must modify two configuration files in the new domain root. First, change the IP address of eth0 in /mnt/xen/etc/network/interfaces to match the IP address we chose for the new domain (remember the Domain-0 /etc/hosts file we created earlier?). Second, change the hostname in /mnt/xen/etc/hostname to debian_slave2.
Finally, create a Xen config file for the new domain and save it as debian_slave2.conf. Don't forget to give this domain a unique MAC address:
name="debian_slave2"
memory=64
kernel="/boot/vmlinuz-2.6.11-xenU"
nics=1
vif=[ 'mac=aa:00:00:00:00:02, bridge=xen-br0' ]
disk=[ 'phy:VG/Debian_Slave2_Root,sda1,w',
       'phy:VG/Debian_Slave2_Swap,sda2,w' ]
root="/dev/sda1 ro"
You now are ready to boot the additional domain.
# umount /mnt/xen
# xm create /etc/xen/debian_slave2.conf -c
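Only a few values in debian_slave2.conf are specific to the domain: its name, its MAC address and the names of its two logical volumes. As a minimal sketch, the function below prints a config for slave number N by extrapolating from the debian_slave2 example. The function name slave_conf and the rule of putting N in the last MAC octet are assumptions made for illustration, not part of Xen.

```shell
#!/bin/sh
# slave_conf N: print a Xen domain config for debian_slaveN on stdout.
# Hypothetical helper; every fixed value is copied from debian_slave2.conf.
slave_conf() {
    n=$1
    # Assumption: N is between 1 and 255, so it fits in the last MAC octet.
    mac=$(printf 'aa:00:00:00:00:%02x' "$n")
    cat <<EOF
name="debian_slave${n}"
memory=64
kernel="/boot/vmlinuz-2.6.11-xenU"
nics=1
vif=[ 'mac=${mac}, bridge=xen-br0' ]
disk=[ 'phy:VG/Debian_Slave${n}_Root,sda1,w',
       'phy:VG/Debian_Slave${n}_Swap,sda2,w' ]
root="/dev/sda1 ro"
EOF
}
```

Running slave_conf 3 > /etc/xen/debian_slave3.conf would then produce the config for a third slave, provided the matching logical volumes already exist.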
The archive technique presented above can be repeated to create as many Debian Sarge unprivileged domains as you like. In addition, this technique also can be used in a more general sense to create complete backups of an OS. This might be useful, for example, before performing software installations/upgrades or other tasks that potentially may harm your system. If things don't work out, simply restore the domain to a previously working state. In fact, I completely erased all of the files on one of my unprivileged domains and was able to restore the entire filesystem within five minutes.
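If you want to rehearse the backup-and-restore cycle before trusting it with a real domain, it can be tried on a throwaway directory first. The sketch below archives a tiny tree, erases it and restores it, using the same tar options as above minus the j flag so that it does not depend on bzip2. The function name roundtrip and the scratch directory layout are assumptions for illustration only.

```shell
#!/bin/sh
# roundtrip SCRATCH_DIR: archive a small tree, wipe it, restore it, then
# print the restored file. SCRATCH_DIR must be an absolute path.
roundtrip() {
    scratch=$1
    [ -n "$scratch" ] || return 1
    mkdir -p "$scratch/root/etc" || return 1
    echo "debian_slave1" > "$scratch/root/etc/hostname"

    # Archive the tree from inside it, as done for the slave root filesystem.
    (cd "$scratch/root" && tar pcf "$scratch/backup.tar" *) || return 1

    # Simulate the accident: erase everything in the stand-in "domain".
    rm -rf "$scratch/root"/*

    # Restore from the archive and show that the file is back.
    (cd "$scratch/root" && tar pxf "$scratch/backup.tar") || return 1
    cat "$scratch/root/etc/hostname"
}
```

Calling roundtrip "$(mktemp -d)" should print debian_slave1 if the restore worked.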
At this point, you should have PVM installed on Domain-0 and be able to boot at least one slave domain that is completely configured as described in the previous sections. Although one slave is sufficient, the next few sections definitely are more interesting if you have two or more slaves configured.
The first step to configuring the PVM cluster is to create a pvm.hosts configuration file, which lists the hostnames of the cluster nodes that you want to use. Note that the specified hostnames should match the hostnames as listed in the /etc/hosts file on Domain-0. In turn, those hostnames should match the hostname in the /etc/hostname file on each domain. An example pvm.hosts file is shown below:
# Master PVM Host
master
# Slaves
debian_slave1
debian_slave2
debian_slave3
Lines beginning with # are comments. You can read the man page for pvmd3 for more details on the PVM configuration file. Your pvm.hosts config file now can be used to start PVM daemon processes (pvmd) on the master and all slave nodes. The PVM daemon provides the message passing interface that we discussed earlier. To start the PVM daemons on all nodes listed in the pvm.hosts file, use the command:
# $PVM_ROOT/lib/pvm pvm.hosts
Before running this command, be sure that all of your slaves are booted using the xm create command. You can get a list of currently booted domains by running xm list.
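Because a hostname that appears in pvm.hosts but not in /etc/hosts is an easy mistake to make, a quick consistency check can save some troubleshooting. The following is a minimal sketch, assuming the simple one-hostname-per-line layout shown above; PVM host files also can carry per-host options, which this sketch ignores beyond taking the first word of each line. The function name missing_hosts is hypothetical.

```shell
#!/bin/sh
# missing_hosts PVM_HOSTFILE HOSTS_FILE
# Print each hostname from the PVM host file that has no entry in HOSTS_FILE.
missing_hosts() {
    awk '
        # First file (hosts format): record every name after the address.
        NR == FNR {
            sub(/#.*/, "")
            for (i = 2; i <= NF; i++) known[$i] = 1
            next
        }
        # Second file (PVM host file): skip comments and blank lines.
        /^[ \t]*#/ || NF == 0 { next }
        # Report hostnames that the hosts file does not know about.
        !($1 in known) { print $1 }
    ' "$2" "$1"
}
```

On Domain-0, missing_hosts pvm.hosts /etc/hosts should print nothing when every node is listed.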
Now is a good time to test your PVM configuration to make sure it works correctly on both the master and slaves. Start by setting up the appropriate links on the master to allow the PVM executables to run without specifying their paths:
# ln -s $PVM_ROOT/lib/pvm /usr/bin/pvm
# ln -s $PVM_ROOT/lib/aimk /usr/bin/aimk
Next, compile an example PVM program:
# cd $PVM_ROOT/examples
# aimk hello hello_other
If they are not booted already, boot each Xen slave using commands similar to the following:
# xm create /etc/xen/debian_slave1.conf
# xm create /etc/xen/debian_slave2.conf
# xm create /etc/xen/debian_slave3.conf
Once your slaves are booted, start the PVM daemons on the master and slaves by running the command:
# pvm pvm.hosts
This command starts the PVM daemons on all cluster nodes specified in the pvm.hosts file and then leaves you at a PVM console. You can use the conf command to see a list of all hosts that are successfully running a PVM daemon. The quit command exits the PVM console but leaves all of the PVM daemons running, which is what we want. An example of this is shown below:
pvm> conf
conf
4 hosts, 1 data format
         HOST     DTID     ARCH   SPEED       DSIG
       master    40000    LINUX    1000 0x00408841
debian_slave3    c0000    LINUX    1000 0x00408841
debian_slave1   100000    LINUX    1000 0x00408841
debian_slave2   140000    LINUX    1000 0x00408841
pvm> quit
Now that the PVM daemons are running, copy the hello_other executable that we compiled above to the slaves. This same approach also can be used to copy other executables that the slaves will need to execute.
# cd $PVM_ROOT/bin/LINUX
# scp hello_other [email protected]_slave1:$PVM_ROOT/bin/LINUX/hello_other
# scp hello_other [email protected]_slave2:$PVM_ROOT/bin/LINUX/hello_other
# scp hello_other [email protected]_slave3:$PVM_ROOT/bin/LINUX/hello_other
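With more than a few slaves, repeating the scp line by hand gets tedious. A small loop can do the same job; the sketch below is one way to write it, with a dry-run mode that only prints the commands. The function name push_to_slaves and the DRY_RUN variable are hypothetical, and the sketch assumes you log in to each slave under your current user name, so no user is given before the hostname.

```shell
#!/bin/sh
# push_to_slaves FILE DEST_DIR HOST...
# Copy FILE into DEST_DIR on every listed host.
# With DRY_RUN=1 the scp commands are printed instead of executed.
push_to_slaves() {
    file=$1
    dest=$2
    shift 2
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp $file $host:$dest/"
        else
            # Stop at the first failed copy so the problem is easy to spot.
            scp "$file" "$host:$dest/" || return 1
        fi
    done
}
```

For example, push_to_slaves hello_other $PVM_ROOT/bin/LINUX debian_slave1 debian_slave2 debian_slave3 would stand in for the three scp commands above. As in those commands, $PVM_ROOT is expanded on the master, so the path must be the same on every node.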
Now run the hello program on the master:
This should produce output similar to the following:
i'm t40009 from tc0003: hello, world from debian_slave3
Congratulations! You now have a working cluster set up on your computer.
Once you're done running PVM programs, you can stop the PVM daemons on the master and slaves by using the halt command from the PVM console:
# pvm
pvmd already running.
pvm> halt
halt
Terminated
Now that you have multiple domains created and configured for use as a cluster, we can install and test a useful PVM program. I chose to test the cluster by using an open-source ray tracer. Ray tracing involves tracing rays into a scene to perform lighting calculations in order to produce realistic computer-generated images. Because rays must be traced for each pixel on the screen, ray tracing can be parallelized naturally by calculating the colors of multiple pixels simultaneously on different members of the cluster, thereby reducing the render time (if we were actually using multiple computers).
In this section, I describe the installation and use of a PVM patch for the POV-Ray ray tracer called PVMPOV. PVMPOV divides the rendering process into one master and many slave tasks, distributing the rendering across multiple systems. The master divides the image into small blocks that are assigned to slaves. The slaves return completed blocks to the master, which the master ultimately combines to generate the final image.
Begin by installing PVMPOV 3.1 on Domain-0. Installation instructions can be found in the PVMPOV HOWTO in Chapter 1, "Setting up PVMPOV". If the first wget command in Section 1.1 gives you trouble, try
instead. Also, in Section 1.4, it should not be necessary to run the command aimk newsvga.
After completing these instructions on the master, create a directory for storing .pov files (POV-Ray input files) as well as the generated images. On my system, I created a folder named /etc/xen/benchmark. The .pov files may need access to other POV-Ray include files, so create a link to the appropriate directory, which is located with the PVMPOV source that you compiled above. As an example, I used the following command on my system:
# ln -s /install/povray/pvmpov3_1g_2/povray31/include \
    /etc/xen/benchmark/include
Once you have completed the PVMPOV installation on the master, you must copy the required binaries, libraries and other files to the slaves. The following example shows how to do this for debian_slave1 from the Domain-0 console:
# cd $PVM_ROOT/bin/LINUX
# scp pvmpov [email protected]_slave1:$PVM_ROOT/bin/LINUX/pvmpov
# scp x-pvmpov [email protected]_slave1:$PVM_ROOT/bin/LINUX/x-pvmpov
# scp /usr/lib/libpng* [email protected]_slave1:/usr/lib/
# scp /usr/lib/libz* [email protected]_slave1:/usr/lib/
# scp /usr/X11R6/lib/libX11.* [email protected]_slave1:/usr/lib/
# ssh debian_slave1
(remote)# cd /etc
(remote)# mkdir xen
(remote)# cd xen
(remote)# mkdir benchmark
(remote)# exit
# cd /etc/xen/benchmark
# scp -r * [email protected]_slave1:/etc/xen/benchmark/
Before we can generate our first ray-traced image, we need a scene to render. The PovBench Web site provides a POV-Ray scene called skyvase.pov that can be used for benchmarking purposes. Download this scene using the following commands:
# cd /etc/xen/benchmark
# wget http://www.haveland.com/povbench/skyvase.pov
Next, copy the downloaded skyvase.pov file to each slave. For example, for debian_slave1:
# scp /etc/xen/benchmark/skyvase.pov \
    [email protected]_slave1:/etc/xen/benchmark/skyvase.pov
Once you've copied the scene to all of the slaves, you are ready to generate an image. Be sure to boot the required Xen slaves and start PVM daemons on each slave. For example, before running PVMPOV with three slaves:
# xm create /etc/xen/debian_slave1.conf
# xm create /etc/xen/debian_slave2.conf
# xm create /etc/xen/debian_slave3.conf
# pvm pvm.hosts
pvm> conf
conf
4 hosts, 1 data format
         HOST     DTID     ARCH   SPEED       DSIG
       master    40000    LINUX    1000 0x00408841
debian_slave3    c0000    LINUX    1000 0x00408841
debian_slave1   100000    LINUX    1000 0x00408841
debian_slave2   140000    LINUX    1000 0x00408841
pvm> quit
PVMPOV is run using the pvmpov binary. You also must supply an input file specifying the scene to be rendered, in our case, skyvase.pov. The list of supported PVMPOV command-line arguments is discussed in the PVMPOV HOWTO. As an example, the following command shows how to render the skyvase.pov scene at 1024x768 resolution on three slaves, using 64x64 pixel blocks and storing the generated image in skyvase.tga:
# cd /etc/xen/benchmark
# pvmpov +Iskyvase.pov +Oskyvase.tga +Linclude \
    pvm_hosts=debian_slave1,debian_slave2,debian_slave3 +NT3 +NW64 +NH64 \
    +v -w1024 -h768
The command-line arguments specify the following settings:
+Iskyvase.pov - Use skyvase.pov as input
+Oskyvase.tga - Store output as skyvase.tga
+Linclude - Search for POV-Ray include files (for shapes and the like) in the ./include directory
pvm_hosts=debian_slave1,debian_slave2,debian_slave3 - Specify which PVM hosts to use as slaves
+NT3 - Divide the rendering into three PVM tasks (one for each slave)
+NW64 - Change the width of blocks to 64 pixels
+NH64 - Change the height of blocks to 64 pixels
+v - Provide verbose reporting of statistics while rendering
-w1024 - The rendered image should have a width of 1024 pixels
-h768 - The rendered image should have a height of 768 pixels
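To get a feel for how much work these settings hand out, you can count the blocks the image is divided into. The sketch below does that arithmetic; the function name block_count is hypothetical, and the rounding up assumes that a leftover strip at the right or bottom edge would be rendered as a smaller block, which does not come into play here because 64 divides both 1024 and 768 evenly.

```shell
#!/bin/sh
# block_count WIDTH HEIGHT BLOCK_W BLOCK_H
# Print how many blocks are needed to cover a WIDTH x HEIGHT image.
block_count() {
    # Round up so that a partial block at an edge still counts as a block.
    cols=$(( ($1 + $3 - 1) / $3 ))
    rows=$(( ($2 + $4 - 1) / $4 ))
    echo $(( cols * rows ))
}
```

For the command above, block_count 1024 768 64 64 prints 192, that is, 16 columns by 12 rows, or 64 blocks per slave on average when three slaves share the work.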
On my system, this scene takes about 40-45 seconds to render. Once the program completes, you should find a file named /etc/xen/benchmark/skyvase.tga that contains the generated image. If everything worked correctly, congratulations! You just successfully used a Linux cluster to run a parallel ray tracer, all on a single physical computer running multiple concurrent operating systems. Go ahead. Pat yourself on the back.
And if things aren't working yet, don't give up. With a little troubleshooting, you're sure to figure it out--and believe me, I've done my fair share of troubleshooting.
Let's step back for a minute and think about everything we've accomplished here. We started by installing Xen and configuring Domain-0 as well as several unprivileged domains. During this process, we got practical experience using LVM to set up unprivileged domain filesystems, and we saw how we can create archive backups of an entire OS filesystem. We also learned how to set up a small cluster using PVM. We even tested our cluster using real-world parallel software.
By now, you should feel like an expert in using Xen virtualization and Linux clustering, especially if you had to do any troubleshooting on your own. If you made it this far, you now can mention the word "virtualization" and explain that your computer not only has multiple operating systems installed but it can run them at the same time! And if that doesn't impress some people, mention that your computer also doubles as a Linux cluster.
Ryan Mauer is a Computer Science graduate student at Eastern Washington University. In addition to Xen virtualization, he also dabbles in 3-D computer graphics programming as he attempts to finish his Master's thesis.