Building a Two-Node Linux Cluster with Heartbeat
The term "cluster" is not very well defined and can mean different things to different people. According to Webopedia, a cluster is a group of disk sectors. Most Windows users are probably familiar with lost clusters, something that can be rectified by running the ScanDisk utility.
However, at a more advanced level in the computer industry, a cluster usually refers to a group of computers connected together so that more computing power, e.g., more MIPS (millions of instructions per second), or higher availability (HA) can be achieved.
Most supercomputers in the world are built on the concept of parallel processing: high-speed computing power is achieved by pooling the power of the individual computers. Made by IBM, "Deep Blue", the supercomputer that played chess with the world champion Garry Kasparov, was a computer cluster built from IBM RS/6000 nodes. In fact, many big-name Hollywood animation and effects companies, such as Pixar and Industrial Light and Magic, use computer clusters extensively for rendering (the process of translating all the information, such as color, movement and physical properties, into a single frame of picture).
In the past, a supercomputer was an expensive deluxe item that only a few universities or research centers could afford. Started at NASA, Beowulf is a project for building clusters from "off-the-shelf" hardware (e.g., Pentium PCs) running Linux at a very low cost.
In the last several years, many universities worldwide have set up Beowulf clusters for scientific research or simply to explore the frontier of supercomputer building.
High-availability clusters use various technologies to gain an extra level of reliability for a service. Companies such as Red Hat, TurboLinux and PolyServe have cluster products that allow a group of computers to monitor each other; when a master server (e.g., a web server) goes down, a secondary server takes over its services, similar to "disk mirroring" among servers.
Because I do not have access to more than one real (or public) IP address, I set up my two-node cluster in a private network environment with some Linux servers and some Win9x workstations.
If you have access to three or more real/public IP addresses, you can certainly set up the Linux cluster with real IP addresses.
In the above network diagram (fig1.gif), the Linux router is the gateway to the Internet, and it has two IP addresses. The real IP, 220.127.116.11, is attached to a network card (eth1) in the Linux router, which should be connected to either an ADSL modem or a cable modem for Internet access. The other, 192.168.1.1 (the router entry in /etc/hosts), is its address on the private network.
The two-node Linux cluster consists of node1 (192.168.1.2) and node2 (192.168.1.3). Depending on your setup, either node1 or node2 can be your primary server, and the other will be your backup server. In this example, I will choose node1 as my primary and node2 as my backup. Once the cluster is set up, with IP aliasing (see the IP aliasing Linux Mini HOWTO for more detail), the primary server will be running with an extra IP address (192.168.1.4). As long as the primary server is up and running, services (e.g., DHCP, DNS, HTTP, FTP, etc.) on node1 can be accessed through either 192.168.1.2 or 192.168.1.4. In fact, IP aliasing is the key concept for setting up this two-node Linux cluster.
When node1 (the primary server) goes down, node2 will take over all services from node1 by starting the same IP alias (192.168.1.4) and all subsequent services. Some services can co-exist on node1 and node2 (e.g., FTP, HTTP, Samba, etc.); however, a service such as DHCP can have only one running copy on the same physical segment. Likewise, we can never have two identical IP addresses running on two different nodes in the same network.
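As a rough illustration of IP aliasing by hand, the sketch below prints the net-tools ifconfig commands that would add and remove the floating address on eth0:0. The helper names run, alias_up and alias_down and the DRY_RUN switch are assumptions made for this example; the netmask is taken from the ifconfig listings in this article.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add the floating address as an alias on the existing NIC.
# Arguments: alias interface, IP address, netmask.
alias_up() {
    run ifconfig "$1" "$2" netmask "$3" up
}

# Remove the alias again, e.g., before the other node claims the address.
alias_down() {
    run ifconfig "$1" down
}

alias_up eth0:0 192.168.1.4 255.255.255.0
alias_down eth0:0
```

Only one node should hold the alias at any time; bringing it up on both would create exactly the duplicate-address conflict described above.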
In fact, the underlying principle of a two-node, high-availability cluster is quite simple, and people with some basic shell programming skills could probably write a shell script to build the cluster. We can set up an infinite loop within which the backup server (node2) simply keeps pinging the primary server; if the ping fails, node2 starts the floating IP (192.168.1.4) as well as the necessary dæmons (programs running in the background).
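The loop described above could be sketched roughly as follows. This only illustrates the idea; it is not how Heartbeat itself works. The function names primary_alive, take_over and watch_primary, the MAX_MISSES and INTERVAL settings and the list of services are assumptions made for the example. The ping options are those of the Linux iputils ping, and take_over needs root.

```shell
#!/bin/sh
# Minimal failover sketch for the backup node (node2). Not Heartbeat itself.
PRIMARY=${PRIMARY:-192.168.1.2}            # node1
FLOATING_IP=${FLOATING_IP:-192.168.1.4}    # the aliased address
ALIAS_IF=${ALIAS_IF:-eth0:0}
SERVICES=${SERVICES:-"httpd smb dhcpd"}    # init script names (assumed)
MAX_MISSES=${MAX_MISSES:-5}                # consecutive failed pings before takeover
INTERVAL=${INTERVAL:-2}                    # seconds between checks

# Succeeds if the primary answers one ping within two seconds.
primary_alive() {
    ping -c 1 -w 2 "$PRIMARY" >/dev/null 2>&1
}

# Bring up the floating IP and start the services (needs root).
take_over() {
    ifconfig "$ALIAS_IF" "$FLOATING_IP" netmask 255.255.255.0 up
    for svc in $SERVICES; do
        /etc/rc.d/init.d/"$svc" start
    done
}

# Loop until the primary has missed MAX_MISSES checks in a row, then take over.
watch_primary() {
    misses=0
    while [ "$misses" -lt "$MAX_MISSES" ]; do
        if primary_alive; then
            misses=0
        else
            misses=$((misses + 1))
        fi
        sleep "$INTERVAL"
    done
    take_over
}

# To run on node2 as root, call: watch_primary
```

Counting several consecutive misses, rather than reacting to a single lost packet, keeps a brief network hiccup from triggering a takeover while the primary still holds the floating IP.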
You need two Pentium-class PCs with a minimum specification of a 100MHz CPU, 32MB RAM, one NIC (network interface card) and a 1GB hard drive. The two PCs need not be identical. In my experiment, I used an AMD K6 350MHz and a Pentium 200 MMX. I chose the AMD as my primary server as it can complete a reboot (you need to do a few reboots for testing) faster than the Pentium 200. With the great support of CFSL (Computers for Schools and Libraries) in Winnipeg, I got some 4GB SCSI hard drives as well as some Adaptec 2940 PCI SCSI controllers. The old and almost obsolete equipment is in good working condition and is perfect for this experiment.
node1 (primary server):
AMD K6 350MHz CPU
4GB SCSI hard drive (you can certainly use an IDE hard drive)
1.44MB floppy drive
24x CD-ROM (not needed after installation)
3COM 905 NIC
node2 (backup server):
Pentium 200 MMX
4GB SCSI hard drive
3COM 905 NIC
Both node1 and node2 must have Linux installed. I chose Red Hat and installed Red Hat 7.2 on node1 and Red Hat 6.2 on node2 (I simply wanted to find out if we could build a cluster with different versions of Linux installed on different nodes). Make sure you have installed all dæmons that you want to support. Here is my installation detail:
Hard disk partitions: 128MB for swap and the rest mounted for "/" (so that you don't need to worry about whether there is too much or not enough for a certain subdirectory).
Heartbeat is a part of Ultra Monkey (The Linux HA Project), and the RPM can be downloaded from www.UltraMonkey.org.
The download is small and RPM installation is smooth and simple. However, the documentation or HOWTO for configuration is hard to find and confusing. In fact, that is the reason I decided to write this HOWTO, so that hopefully you can get your cluster set up with fewer problems.
It is not the purpose of this article to show you how to install Red Hat; a lot of excellent documentation can be found at either www.linuxdoc.org or www.redhat.com. I will simply include some of the most important configuration files for your reference:
/etc/hosts
    127.0.0.1 localhost
    192.168.1.1 router
    192.168.1.2 node1
    192.168.1.3 node2
This file should be the same on both node1 and node2; you may add any other nodes as you see fit.
Check HOSTNAME (cat /etc/HOSTNAME) and make sure it returns either node1 or node2. If not, you can use this command (uname -n > /etc/HOSTNAME) to fix the hostname problem.
ifconfig for node1
eth0      Link encap:Ethernet HWaddr 00:60:97:9C:52:28
          inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:18617 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14682 errors:0 dropped:0 overruns:0 carrier:0
          collisions:3 txqueuelen:100
          Interrupt:10 Base address:0x6800
eth0:0    Link encap:Ethernet HWaddr 00:60:97:9C:52:28
          inet addr:192.168.1.4 Bcast:192.168.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:10 Base address:0x6800
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:3924 Metric:1
          RX packets:38 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
Please notice that eth0:0 shows the IP aliasing with IP 192.168.1.4.
ifconfig for node2
eth0      Link encap:Ethernet HWaddr 00:60:08:26:B2:A4
          inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:15673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17550 errors:0 dropped:0 overruns:0 carrier:0
          collisions:2 txqueuelen:100
          Interrupt:10 Base address:0x6700
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:3924 Metric:1
          RX packets:142 errors:0 dropped:0 overruns:0 frame:0
          TX packets:142 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
If you are using Internet Explorer on Windows, you might have problems accessing FTP (Netscape works much better). I suggest you use either a command-line FTP client or a graphical FTP client for Windows or the X Window System (e.g., WS_FTP) to access the FTP site of Ultra Monkey (ftp.UltraMonkey.org).
Once you log in to the FTP server of Ultra Monkey, go to pub, then UltraMonkey and then the latest version 1.0.2 (not the beta). The only package is heartbeat-0.4.9-1.um.1.i386.rpm; save heartbeat-0.4.9-1.um.1.i386.rpm on your Linux box, log in as root and install it with
rpm -ivh heartbeat-0.4.9-1.um.1.i386.rpm
According to the accompanying documentation, you need to install a second NIC on both nodes and connect them with a crossover cable. Besides the second NIC, a null modem cable connecting the serial (com) ports of each node is mandatory (according to the documentation). I followed the instructions in the documentation and installed everything. However, as I did more tests on the cluster, I found that the null modem cable, crossover cable and the second NIC are optional; they are nice to have but definitely not mandatory.
Configuring Heartbeat is the most important part of the whole installation and must be done correctly to get your cluster working. Moreover, the configuration should be identical on both nodes. There are three configuration files, all stored under /etc/ha.d: ha.cf, haresources and authkeys.
debugfile /var/log/ha-debug
#
# File to write other messages to
#
logfile /var/log/ha-log
#
# Facility to use for syslog()/logger
#
logfacility local0
#
# keepalive: how many seconds between heartbeats
#
keepalive 2
#
# deadtime: seconds-to-declare-host-dead
#
deadtime 10
udpport 694
#
# What interfaces to heartbeat over?
#
udp eth0
#
node atm1
node cluster1
#
# ------> end of ha.cf
Whatever is not shown above, you can simply leave as it was (all commented out by the #). The last three options are most important:
udp eth0
#
node atm1
node cluster1
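Unless you have a crossover cable, you should use your eth0 (your only NIC) for udp; the two node names at the end of the above file must be the same as those returned by uname -n on each node (atm1 and cluster1 in this listing; with the hostnames used elsewhere in this article, they would be node1 and node2).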
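One way to check this requirement is sketched below: a small helper lists the names on the node lines of an ha.cf-style file, and a second one tests whether a given hostname is among them. The function names ha_nodes and node_listed are hypothetical, and the parsing assumes one name per node line, as in the listing above.

```shell
#!/bin/sh
# ha_nodes FILE: print the name on each uncommented "node" line.
ha_nodes() {
    awk '$1 == "node" { print $2 }' "$1"
}

# node_listed NAME FILE: succeed if NAME exactly matches one of those names.
node_listed() {
    ha_nodes "$2" | grep -Fqx -- "$1"
}

# Typical use on each node:
#   node_listed "$(uname -n)" /etc/ha.d/ha.cf || echo "hostname not in ha.cf"
```

Running the check on both machines catches the common case where uname -n returns a fully qualified name while ha.cf holds only the short one, or the reverse.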
atm1 IPaddr::192.168.1.4 httpd smb dhcpd
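This is the only line you need in haresources. The first field (atm1) is the name of the primary node as returned by uname -n, followed by the floating IP and the services; in the above example, I included httpd, smb and dhcpd. You may add as many dæmons as you want, provided they are spelled exactly like their scripts under /etc/rc.d/init.d.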
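Since a misspelled service name is easy to overlook, a check along the following lines could report any service on the haresources line that has no matching script. The function name missing_scripts is hypothetical, and the sketch assumes the simple one-line layout shown above (node name, one IPaddr:: entry, then plain service names).

```shell
#!/bin/sh
# missing_scripts LINE [INITDIR]: print each service named on a
# haresources-style line that has no executable script in INITDIR.
missing_scripts() {
    line=$1
    initdir=${2:-/etc/rc.d/init.d}
    set -- $line                     # split into node, IPaddr::..., services
    [ "$#" -gt 0 ] || return 0       # nothing to check on an empty line
    shift                            # drop the node name
    for res in "$@"; do
        case $res in
            IPaddr::*) continue ;;   # the floating IP, not an init script
        esac
        [ -x "$initdir/$res" ] || echo "$res"
    done
}

# Typical use:
#   missing_scripts "atm1 IPaddr::192.168.1.4 httpd smb dhcpd"
```

No output means every listed service was found; any name printed should be compared against the contents of /etc/rc.d/init.d.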
You don't need to add anything to the authkeys file, but you have to issue the command
chmod 600 /etc/ha.d/authkeys
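To confirm that the permissions took effect on both nodes, a check like the following could be used. The function name owner_only is hypothetical; it simply inspects the mode string printed by ls, so it makes no assumption about what the file contains.

```shell
#!/bin/sh
# owner_only FILE: succeed if FILE is a regular file with mode rw------- (600).
owner_only() {
    [ -f "$1" ] || return 1
    perms=$(ls -ld -- "$1" | cut -c1-10)   # e.g. "-rw-------"
    [ "$perms" = "-rw-------" ]
}

# Typical use:
#   owner_only /etc/ha.d/authkeys || echo "authkeys is not mode 600"
```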
You may start the dæmon with
service heartbeat start
Once heartbeat is started on both nodes, you will find that the ifconfig from the primary server will return something like:
ifconfig for node1
eth0      Link encap:Ethernet HWaddr 00:60:97:9C:52:28
          inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:18617 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14682 errors:0 dropped:0 overruns:0 carrier:0
          collisions:3 txqueuelen:100
          Interrupt:10 Base address:0x6800
eth0:0    Link encap:Ethernet HWaddr 00:60:97:9C:52:28
          inet addr:192.168.1.4 Bcast:192.168.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:10 Base address:0x6800
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:3924 Metric:1
          RX packets:38 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
When you see the eth0:0 entry, heartbeat is working, and you can try to access the server at http://192.168.1.4 and check the log file /var/log/ha-log. Also, check the log file on node2 (192.168.1.3) and try
ps -A | grep dhcpd
and you should find no running dhcpd on node2.
Now, the real HA test: take down the primary server (node1: 192.168.1.2). Don't just cut the power; issue reboot or press Ctrl-Alt-Del and wait until everything has shut down properly before you turn off your PC.
Within about ten seconds (the deadtime set in ha.cf), go to node2 and run ifconfig. If you see the IP alias eth0:0, you are in business and have a working HA two-node cluster.
eth0      Link encap:Ethernet HWaddr 00:60:08:26:B2:A4
          inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:15673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17550 errors:0 dropped:0 overruns:0 carrier:0
          collisions:2 txqueuelen:100
          Interrupt:10 Base address:0x6700
eth0:0    Link encap:Ethernet HWaddr 00:60:08:26:B2:A4
          inet addr:192.168.1.4 Bcast:192.168.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:15673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17550 errors:0 dropped:0 overruns:0 carrier:0
          collisions:2 txqueuelen:100
          Interrupt:10 Base address:0x6700
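Rather than reading the ifconfig output by eye during repeated failover tests, the check could be scripted. The sketch below scans ifconfig-style text on standard input for a given interface carrying a given address. The function name has_alias is hypothetical, and the parsing assumes the older net-tools layout shown in this article ("inet addr:" lines indented under each interface name); newer ifconfig versions print a different format.

```shell
#!/bin/sh
# has_alias IFACE IP: read ifconfig output on stdin and succeed if the
# stanza for IFACE contains "inet addr:IP".
has_alias() {
    awk -v ifname="$1" -v ip="$2" '
        /^[^ \t]/ { current = $1 }     # a new stanza starts in column one
        current == ifname && index($0, "inet addr:" ip " ") { found = 1 }
        END { exit !found }
    '
}

# Typical use on node2 after taking node1 down:
#   ifconfig | has_alias eth0:0 192.168.1.4 && echo "floating IP is up"
```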
You can try
ps -A | grep dhcpd
or you can try to release and renew the IP info on your Win9x workstation, and you should see the new address for the dhcpd server.
Commercial products from Red Hat, TurboLinux and PolyServe use the same concept of IP aliasing. When the primary server goes down, the backup server picks up the same aliased IP so that high availability can be achieved.
The cluster product from PolyServe is very sophisticated. It supports SAN (storage area network) and can handle more than two nodes. It is very easy to install and configure. I successfully configured the trial version through a Windows monitoring client without reading any documentation. However, sophistication comes with a price tag, and the software costs more than a thousand dollars for a two-node cluster. The 30-day trial version of the cluster stops after two hours, so it is not much fun for testing.
The cluster product from TurboLinux needs some fine-tuning. The installation documentation is confusing (or maybe they simply don't want people to do it themselves). The web configuration tool is unstable; the CGI script crashes whenever the user clicks the reload or refresh button. And of course, as a commercial product, it comes with a high price tag.
Linux is very stable and reliable, and it is quite common to have our servers up and running for a few hundred days at a time. Heartbeat works fine in my tests, and if you are looking for a product with higher availability for a small business or educational institution, Heartbeat is definitely a perfect option.