Building a Two-Node Linux Cluster with Heartbeat

C T shows you how to set up a two-node Linux cluster with Heartbeat.
Null Modem Cable, Crossover Cable, Second NIC

According to the accompanying documentation, you need to install a second NIC on both nodes and connect them with a crossover cable. The documentation also says that a null modem cable connecting the serial (COM) ports of the two nodes is mandatory. I followed the instructions and installed everything. However, as I ran more tests on the cluster, I found that the null modem cable, the crossover cable and the second NIC are all optional; they are nice to have but definitely not mandatory.

Configuring Heartbeat is the most important part of the whole installation; it must be set up correctly for your cluster to work. Moreover, the configuration should be identical on both nodes. There are three configuration files, all stored under /etc/ha.d: ha.cf, haresources and authkeys.

My /etc/ha.d/ha.cf

debugfile /var/log/ha-debug
#
#       File to write other messages to
#
logfile /var/log/ha-log
#
#       Facility to use for syslog()/logger 
#
logfacility     local0
#
#       keepalive: how many seconds between heartbeats
#
keepalive 2
#
#       deadtime: seconds-to-declare-host-dead
#
deadtime 10
udpport 694
#
#       What interfaces to heartbeat over?
#
udp     eth0
#
node    atm1
node    cluster1
#
# ------> end of ha.cf

Whatever is not shown above, you can simply leave as it was (all commented out with #). The last three options are the most important:

udp     eth0
#
node    atm1
node    cluster1

Unless you have a crossover cable and a second NIC, you should use eth0 (your only NIC) for udp. The two node names at the end of the above file must be the same as those returned by uname -n on each node.
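A mismatch between the node lines and uname -n is easy to miss, so it may be worth checking on each node before starting anything. What follows is only a minimal sketch, not part of Heartbeat: the helper names list_nodes and check_node and the HA_CF variable are my own, and it assumes ha.cf uses plain "node NAME" lines as in the listing above.

```shell
#!/bin/sh
# Sketch: confirm that this machine's `uname -n` is declared in ha.cf.
# list_nodes, check_node and HA_CF are hypothetical names, not Heartbeat tools.

# Print every name declared by a "node" directive, one per line.
# Commented-out lines are skipped because their first field is not "node".
list_nodes() {
    awk '$1 == "node" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Succeed only if host name $2 is declared as a node in file $1.
check_node() {
    list_nodes "$1" | grep -qxF -- "$2"
}

HA_CF=${HA_CF:-/etc/ha.d/ha.cf}
if [ -r "$HA_CF" ]; then
    if check_node "$HA_CF" "$(uname -n)"; then
        echo "OK: $(uname -n) is listed in $HA_CF"
    else
        echo "WARNING: $(uname -n) is not listed in $HA_CF"
    fi
fi
```

Run it once on each node; both should report OK against the same ha.cf.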

My /etc/ha.d/haresources

atm1 IPaddr::192.168.1.4 httpd smb dhcpd

This is the only line you need. In the above example, I included httpd, smb and dhcpd. You may add as many dæmons as you want, provided their names are spelled exactly like the corresponding scripts under /etc/rc.d/init.d.
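Because a misspelled dæmon name only shows up at failover time, a quick check of the names against the init scripts can save some head-scratching. The sketch below is an illustration under assumptions: the helper names list_resources and missing_scripts and the HARES and INITD variables are mine, and entries containing "::" (such as IPaddr::192.168.1.4) are skipped on the assumption that they are not init scripts.

```shell
#!/bin/sh
# Sketch: report haresources entries that have no matching init script.
# list_resources, missing_scripts, HARES and INITD are hypothetical names.

# Print the service names from each non-comment line: every field after
# the node name, except parameterized entries such as IPaddr::192.168.1.4.
list_resources() {
    awk '$1 !~ /^#/ && NF > 1 {
        for (i = 2; i <= NF; i++) if ($i !~ /::/) print $i
    }' "$1"
}

# Print each service named in file $1 that has no executable script
# of the same name in directory $2.
missing_scripts() {
    list_resources "$1" | while read -r name; do
        [ -x "$2/$name" ] || echo "$name"
    done
}

HARES=${HARES:-/etc/ha.d/haresources}
INITD=${INITD:-/etc/rc.d/init.d}
if [ -r "$HARES" ]; then
    missing=$(missing_scripts "$HARES" "$INITD")
    if [ -n "$missing" ]; then
        echo "No init script found in $INITD for:"
        echo "$missing"
    fi
fi
```

If the script prints nothing, every dæmon named in haresources has a script of the same name under the init directory.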

My /etc/ha.d/authkeys

You don't need to add anything to this file, but you have to issue the command

chmod 600 /etc/ha.d/authkeys
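To confirm the permissions took effect on both nodes, something like the following sketch could be used. The helper names perm_string and is_private and the AUTHKEYS variable are my own, and the check simply reads the mode field printed by ls rather than relying on any Heartbeat tool.

```shell
#!/bin/sh
# Sketch: confirm authkeys is readable and writable by its owner only.
# perm_string, is_private and AUTHKEYS are hypothetical names.

# Print the ten-character type-and-mode field from ls, e.g. "-rw-------".
perm_string() {
    ls -ld -- "$1" | cut -c1-10
}

# Succeed only if the path is a regular file with mode 600.
is_private() {
    [ "$(perm_string "$1")" = "-rw-------" ]
}

AUTHKEYS=${AUTHKEYS:-/etc/ha.d/authkeys}
if [ -e "$AUTHKEYS" ]; then
    if is_private "$AUTHKEYS"; then
        echo "OK: $AUTHKEYS has mode 600"
    else
        echo "WARNING: run chmod 600 $AUTHKEYS"
    fi
fi
```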

Start the Heartbeat Daemon

You may start the dæmon with

service heartbeat start

or

/etc/rc.d/init.d/heartbeat start

Once heartbeat is started on both nodes, running ifconfig on the primary server will return something like:

node1
ifconfig for node1
eth0      Link encap:Ethernet  HWaddr 00:60:97:9C:52:28  
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18617 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14682 errors:0 dropped:0 overruns:0 carrier:0
          collisions:3 txqueuelen:100 
          Interrupt:10 Base address:0x6800 
eth0:0    Link encap:Ethernet  HWaddr 00:60:97:9C:52:28  
          inet addr:192.168.1.4  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0x6800 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:38 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

When you see the eth0:0 entry, heartbeat is working. You can try to access the server at http://192.168.1.4 and check the log file /var/log/ha-log. Also, check the log file on node2 (192.168.1.3) and try

ps -A | grep dhcpd

and you should find no running dhcpd on node2.

Now, for the real HA test: shut down the primary server (node1: 192.168.1.2). Don't just power it off; issue reboot or press Ctrl-Alt-Del, and wait until everything has shut down properly before you turn off the PC.

Within ten seconds (the deadtime set in ha.cf), go to node2 and try ifconfig. If you see the IP alias eth0:0 there, you are in business and have a working HA two-node cluster.

eth0      Link encap:Ethernet  HWaddr 00:60:08:26:B2:A4  
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17550 errors:0 dropped:0 overruns:0 carrier:0
          collisions:2 txqueuelen:100 
          Interrupt:10 Base address:0x6700 
eth0:0    Link encap:Ethernet  HWaddr 00:60:08:26:B2:A4  
          inet addr:192.168.1.4  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17550 errors:0 dropped:0 overruns:0 carrier:0
          collisions:2 txqueuelen:100 
          Interrupt:10 Base address:0x6700 
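Rather than running ifconfig by hand and watching the clock, the takeover check on node2 could be scripted along these lines. This is a sketch under assumptions: has_iface, wait_for_alias and the WATCH_ALIAS variable are names I made up, the alias is assumed to be eth0:0 as in the listings above, and the parser accepts both the older "eth0:0" and the newer "eth0:0:" way ifconfig prints interface names.

```shell
#!/bin/sh
# Sketch: wait for the cluster IP alias to appear on this node.
# has_iface, wait_for_alias and WATCH_ALIAS are hypothetical names.

# Read ifconfig-style text on stdin; succeed if a line starts with the
# interface name $1 (with or without a trailing colon).
has_iface() {
    awk -v name="$1" '
        $1 == name || $1 == name ":" { found = 1 }
        END { exit !found }
    '
}

# Poll ifconfig once a second, up to $2 times, until interface $1 shows up.
wait_for_alias() {
    name=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        if ifconfig 2>/dev/null | has_iface "$name"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Only poll when asked to, e.g.: WATCH_ALIAS=eth0:0 sh thisscript.sh
if [ -n "${WATCH_ALIAS:-}" ]; then
    if wait_for_alias "$WATCH_ALIAS" 10; then
        echo "$WATCH_ALIAS is up: failover worked"
    else
        echo "$WATCH_ALIAS did not appear within 10 tries"
    fi
fi
```

The ten tries match the deadtime of 10 in the ha.cf shown earlier; adjust the count if you change deadtime.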

You can try

ps -A | grep dhcpd

or you can try to release and renew the IP info on your Win9x workstation, and you should see the new address for the dhcpd server.
