Getting Started with Heartbeat
In every work environment with which I have been involved, certain servers absolutely always must be up and running for the business to keep functioning smoothly. These servers provide services that always need to be available—whether it be a database, DHCP, DNS, file, Web, firewall or mail server.
A cornerstone of any service that always needs to be up with no downtime is being able to transfer the service from one system to another gracefully. The magic that makes this happen on Linux is a service called Heartbeat. Heartbeat is the main product of the High-Availability Linux Project.
Heartbeat is very flexible and powerful. In this article, I touch on only basic active/passive clusters with two members, where the active server is providing the services and the passive server is waiting to take over if necessary.
Debian, Fedora, Gentoo, Mandriva, Red Flag, SUSE, Ubuntu and others have prebuilt packages in their repositories. Check your distribution's main and supplemental repositories for a package named heartbeat-2.
After installing a prebuilt package, you may see a “Heartbeat failure” message. This is normal. Once the Heartbeat package is installed, the package manager tries to start the Heartbeat service. However, the service does not have a valid configuration yet, so it fails to start and prints the error message.
You can install Heartbeat manually too. To get the most recent stable version, compiling from source may be necessary. There are a few dependencies, so to prepare on my Ubuntu systems, I first run the following command:
sudo apt-get build-dep heartbeat-2
Check the Linux-HA Web site for the complete list of dependencies. With the dependencies out of the way, download the latest source tarball and untar it. Use the ConfigureMe script to compile and install Heartbeat. This script makes educated guesses from looking at your environment as to how best to configure and install Heartbeat. It also does everything with one command, like so:
sudo ./ConfigureMe install
With any luck, you'll walk away for a few minutes, and when you return, Heartbeat will be compiled and installed. Repeat the process on every node in your cluster.
Heartbeat has three main configuration files:
/etc/ha.d/authkeys
/etc/ha.d/ha.cf
/etc/ha.d/haresources
The authkeys file must be owned by root and be chmod 600. The actual format of the authkeys file is very simple; it's only two lines. There is an auth directive with an associated method ID number, and there is a line that has the authentication method and the key that go with the ID number of the auth directive. There are three supported authentication methods: crc, md5 and sha1. Listing 1 shows an example. You can have more than one authentication method ID, but this is useful only when you are changing authentication methods or keys. Make the key long—it will improve security and you don't have to type in the key ever again.
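Because the key never has to be typed again, it may as well be random. The following is a minimal sketch, not part of the Heartbeat distribution: write_authkeys is a hypothetical helper name, and it assumes head, sha1sum and cut are available on the node. It writes the two-line format described above with the sha1 method and sets the permissions Heartbeat expects.

```shell
#!/bin/sh
# Sketch: write a two-line authkeys file with a long random sha1 key.
# write_authkeys is a hypothetical helper, not a Heartbeat tool.
write_authkeys() {
    target="$1"
    # Hash 32 random bytes into a 40-character hex string to use as the key.
    key=$(head -c 32 /dev/urandom | sha1sum | cut -d' ' -f1)
    # Line 1: the auth directive with method ID 1.
    # Line 2: the same ID, the authentication method and the key.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    # Heartbeat requires the file to be mode 600.
    chmod 600 "$target"
}
```

On a real node, you would run write_authkeys /etc/ha.d/authkeys as root on one machine and then copy the resulting file to the other cluster members, because every node needs the same key.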
The next file to configure is the ha.cf file—the main Heartbeat configuration file. The contents of this file should be the same on all nodes with a couple of exceptions.
Heartbeat ships with a detailed example file in the documentation directory that is well worth a look. Also, when creating your ha.cf file, the order in which things appear matters. Don't move them around! Two different example ha.cf files are shown in Listings 2 and 3.
Listing 2. The /etc/ha.d/ha.cf File on Briggs & Stratton
keepalive 2
deadtime 32
warntime 16
initdead 64
baud 19200
# On briggs the serial device is /dev/ttyS1
# On stratton the serial device is /dev/ttyS0
serial /dev/ttyS1
auto_failback on
node briggs
node stratton
use_logd yes
Listing 3. The /etc/ha.d/ha.cf File on Deimos & Phobos
keepalive 1
deadtime 10
warntime 5
udpport 694
# deimos' heartbeat ip address is 192.168.1.11
# phobos' heartbeat ip address is 192.168.1.21
ucast eth1 192.168.1.11
auto_failback off
stonith_host deimos wti_nps ares.example.com erisIsTheKey
stonith_host phobos wti_nps ares.example.com erisIsTheKey
node deimos
node phobos
use_logd yes
The first thing you need to specify is the keepalive—the time between heartbeats in seconds. I generally like to have this set to one or two, but servers under heavy loads might not be able to send heartbeats in a timely manner. So, if you're seeing a lot of warnings about late heartbeats, try increasing the keepalive.
The deadtime is next. This is the time to wait without hearing from a cluster member before the surviving members of the cluster declare the problem host dead.
Next comes the warntime. This setting determines how long to wait before issuing a “late heartbeat” warning.
Sometimes, when all members of a cluster are booted at the same time, there is a significant length of time between when Heartbeat is started and before the network or serial interfaces are ready to send and receive heartbeats. The optional initdead directive takes care of this issue by setting an initial deadtime that applies only when Heartbeat is first started.
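The four timing directives only make sense in a certain order: a warning that fires after the node has already been declared dead is useless. Here is a rough sanity check for a ha.cf file. The check_timing name and the rules it applies (keepalive below warntime, warntime below deadtime, initdead at least twice deadtime) are my own assumptions about sensible values, not something Heartbeat enforces, and the sketch assumes all values are plain seconds with no "ms" suffix.

```shell
#!/bin/sh
# Sketch: sanity-check the timing directives in a ha.cf file.
# check_timing is a hypothetical helper; the ordering rules are assumptions.
# keepalive, warntime and deadtime are all expected to be present.
check_timing() {
    awk '
        $1 == "keepalive" { k = $2 }
        $1 == "warntime"  { w = $2 }
        $1 == "deadtime"  { d = $2 }
        $1 == "initdead"  { i = $2; has_init = 1 }
        END {
            bad = 0
            if (!(k + 0 < w + 0)) { print "warntime should exceed keepalive"; bad = 1 }
            if (!(w + 0 < d + 0)) { print "deadtime should exceed warntime"; bad = 1 }
            # initdead is optional, so only check it when it is set.
            if (has_init && i + 0 < 2 * d) { print "initdead is less than twice deadtime"; bad = 1 }
            exit bad
        }
    ' "$1"
}
```

Running check_timing /etc/ha.d/ha.cf prints nothing and returns success when the values are ordered as assumed. Both Listing 2 and Listing 3 pass.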
You can send heartbeats over serial or Ethernet links; either works fine. I like serial for two-server clusters that are physically close together, but Ethernet works just as well. The configuration for serial ports is easy: simply specify the baud rate and then the serial device you are using. The serial device is one place where the ha.cf files on each node may differ, because the serial port can have a different name on each host. If you don't know the tty to which your serial port is assigned, run the following command:
setserial -g /dev/ttyS*
If anything in the output says “UART: unknown”, that device is not a real serial port. If you have several serial ports, experiment to find out which is the correct one.
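On a machine with many ttyS entries, it can help to filter the list down to candidates. This small sketch assumes that each line of setserial -g output starts with the device name followed by a comma, which is how it looks on my systems. The real_serial_ports name is made up for illustration.

```shell
#!/bin/sh
# Sketch: reduce `setserial -g` output to the devices that have a real UART.
# real_serial_ports is a hypothetical helper that reads setserial output on stdin.
real_serial_ports() {
    # Drop the "UART: unknown" lines, then keep only the device name,
    # which is assumed to be the first comma-separated field.
    grep -v 'UART: unknown' | cut -d, -f1
}
```

Typical use would be setserial -g /dev/ttyS* | real_serial_ports, which leaves a short list of devices to experiment with.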
If you decide to use Ethernet, you have several choices of how to configure things. For simple two-server clusters, ucast (uni-cast) or bcast (broadcast) work well.
The format of the ucast line is:
ucast <device> <peer-ip-address>
Here is an example:
ucast eth1 192.168.1.30
If I am using a crossover cable to connect two hosts together, I just broadcast the heartbeat out of the appropriate interface. The format of the bcast line is:
bcast <device>
There is also a more complicated method called mcast. This method uses multicast to send out heartbeat messages. Check the Heartbeat documentation for full details.
Now that we have Heartbeat transportation all sorted out, we can define auto_failback. You can set auto_failback either to on or off. If set to on and the primary node fails, the secondary node will “failback” to its secondary standby state when the primary node returns. If set to off, when the primary node comes back, it will be the secondary.
It's a toss-up as to which one to use. My thinking is that so long as the servers are identical, if my primary node fails, then the secondary node becomes the primary, and when the prior primary comes back, it becomes the secondary. However, if my secondary server is not as powerful a machine as the primary, similar to how the spare tire in my car is not a “real” tire, I like the primary to become the primary again as soon as it comes back.
Moving on, when Heartbeat thinks a node is dead, that is just a best guess. The “dead” server may still be up. In some cases, if the “dead” server is still partially functional, the consequences are disastrous to the other node members. Sometimes, there's only one way to be sure whether a node is dead, and that is to kill it. This is where STONITH comes in.
STONITH stands for Shoot The Other Node In The Head. STONITH devices are commonly some sort of network power-control device. To see the full list of supported STONITH device types, use the stonith -L command, and use stonith -h to see how to configure them.
Next, in the ha.cf file, you need to list your nodes. List each one on its own line, like so:
node deimos
node phobos
The name you use must match the output of uname -n.
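A mismatch between the node lines and uname -n is an easy mistake to make, so a quick check on each host can save some head-scratching. The following is a sketch only; node_listed is a hypothetical helper, not a Heartbeat command.

```shell
#!/bin/sh
# Sketch: confirm that a node name appears in a "node" line of ha.cf.
# node_listed is a hypothetical helper.
#   $1: path to ha.cf
#   $2: node name to look for (defaults to this host's `uname -n`)
node_listed() {
    name="${2:-$(uname -n)}"
    awk -v name="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example, node_listed /etc/ha.d/ha.cf || echo "this host is not listed in ha.cf" could be run on each cluster member before starting Heartbeat.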
The last entry in my example ha.cf files is to turn on logging:
use_logd yes
There are many other options that can't be touched on here. Check the documentation for details.
The third configuration file is the haresources file. Before configuring it, you need to do some housecleaning. Namely, all services that you want Heartbeat to manage must be removed from the system init for all init levels.
On Debian-style distributions, the command is:
/usr/sbin/update-rc.d -f <service_name> remove
Check your distribution's documentation for how to do the same on your nodes.
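When several services are involved, I find it safer to generate the commands first and look them over before running anything. This sketch just prints the Debian-style command shown above for each service name it is given. The init_removal_commands name is hypothetical, and the service names in the usage note are taken from Listing 5.

```shell
#!/bin/sh
# Sketch: print (not run) the update-rc.d command for each service
# that Heartbeat is going to manage. init_removal_commands is a
# hypothetical helper name.
init_removal_commands() {
    for svc in "$@"; do
        printf '/usr/sbin/update-rc.d -f %s remove\n' "$svc"
    done
}
```

For example, init_removal_commands nfs-common nfs-kernel-server prints two commands. Once they look right, the output can be piped to sudo sh on each node.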
Now, you can put the services into the haresources file. As with the other two configuration files for Heartbeat, this one probably won't be very large. Similar to the authkeys file, the haresources file must be exactly the same on every node. And, like the ha.cf file, position is very important in this file. When control is transferred to a node, the resources listed in the haresources file are started left to right, and when control is transferred to a different node, the resources are stopped right to left. Here's the basic format:
<node_name> <resource_1> <resource_2> <resource_3> . . .
The node_name is the node you want to be the primary on initial startup of the cluster, and if you turned on auto_failback, this server always will become the primary node whenever it is up. The node name must match the name of one of the nodes listed in the ha.cf file.
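Because haresources (like authkeys) must be identical on every node, comparing checksums is a cheap way to catch a copy that has drifted. This is a minimal sketch assuming sha1sum is installed. The config_digest name is made up for illustration.

```shell
#!/bin/sh
# Sketch: print a checksum of a configuration file so that copies
# on different nodes can be compared. config_digest is a hypothetical helper.
config_digest() {
    sha1sum "$1" | cut -d' ' -f1
}
```

Running config_digest /etc/ha.d/haresources on each node (over SSH, for instance) should print the same string everywhere. If it doesn't, the files differ.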
Resources are scripts located either in /etc/ha.d/resource.d/ or /etc/init.d/, and if you want to create your own resource scripts, they should conform to LSB-style init scripts like those found in /etc/init.d/. Some of the scripts in the resource.d folder can take arguments, which you can pass using a :: on the resource line. For example, the IPaddr script sets the cluster IP address, which you specify like so:
IPaddr::192.168.1.9/24/eth0
In the above example, the IPaddr resource is told to set up a cluster IP address of 192.168.1.9 with a 24-bit subnet mask (255.255.255.0) and to bind it to eth0. You can pass other options as well; check the example haresources file that ships with Heartbeat for more information.
Another common resource is Filesystem. This resource is for mounting shared filesystems. Here is an example:
Filesystem::/dev/etherd/e1.0::/opt/data::xfs
The arguments to the Filesystem resource in the example above are, left to right, the device node (an ATA-over-Ethernet drive in this case, the same device path that appears in Listing 5), a mountpoint (/opt/data) and the filesystem type (xfs).
For regular init scripts in /etc/init.d/, simply enter them by name. As long as they can be started with start and stopped with stop, there is a good chance that they will work.
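If you do end up writing your own resource script, the skeleton is short. The sketch below is illustrative only: "myservice" and the STATE marker file are hypothetical stand-ins, and the touch and rm lines are where the commands that really start and stop your service would go. The exit codes follow the LSB convention as I understand it (status returns 0 when running and 3 when stopped).

```shell
#!/bin/sh
# Sketch of an LSB-style resource script. "myservice" and STATE are
# hypothetical; replace the touch/rm lines with real start/stop commands.
STATE="${STATE:-/tmp/myservice.running}"

myservice() {
    case "$1" in
        start)
            touch "$STATE"      # start the service here
            ;;
        stop)
            rm -f "$STATE"      # stop the service here
            ;;
        status)
            # Exit 0 when running, 3 when stopped.
            if [ -e "$STATE" ]; then
                echo "running"
                return 0
            fi
            echo "stopped"
            return 3
            ;;
        *)
            echo "Usage: myservice {start|stop|status}" >&2
            return 2
            ;;
    esac
}

# When saved as a script, act on the first command-line argument.
if [ $# -gt 0 ]; then
    myservice "$1"
fi
```

Saved as an executable file in /etc/ha.d/resource.d/ or /etc/init.d/, a script along these lines could then be listed by name in haresources like any other resource.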
Listings 4 and 5 are haresources files for two of the clusters I run. They are paired with the ha.cf files in Listings 2 and 3, respectively.
Listing 5. A More Substantial haresources File
deimos \
    IPaddr::192.168.12.1 \
    Filesystem::/dev/etherd/e1.0::/opt/storage::xfs \
    killnfsd \
    nfs-common \
    nfs-kernel-server
The cluster defined in Listings 2 and 4 is very simple, and it has only two resources—a cluster IP address and the Apache 2 Web server. I use this for my personal home Web server cluster. The servers themselves are nothing special—an old PIII tower and a cast-off laptop. The content on the servers is static HTML, and the content is kept in sync with an hourly rsync cron job. I don't trust either “server” very much, but with Heartbeat, I have never had an outage longer than half a second—not bad for two old castaways.
The cluster defined in Listings 3 and 5 is a bit more complicated. This is the NFS cluster I administer at work. This cluster utilizes shared storage in the form of a pair of Coraid SR1521 ATA-over-Ethernet drive arrays, two NFS appliances (also from Coraid) and a STONITH device. STONITH is important for this cluster, because in the event of a failure, I need to be sure that the failed node is really dead before mounting the shared storage on the other node. There are five resources managed in this cluster, and to keep the line in haresources from getting too long to be readable, I break it up with line-continuation backslashes. If the primary cluster member is having trouble, the secondary cluster member kills the primary, takes over the IP address, mounts the shared storage and then starts up NFS. With this cluster, instead of having maintenance issues or other outages lasting several minutes to an hour (or more), outages now don't last beyond a second or two. I can live with that.
Now that your cluster is all configured, start it with:
sudo /etc/init.d/heartbeat start
Things might work perfectly or not at all. Fortunately, with logging enabled, troubleshooting is easy, because Heartbeat outputs informative log messages. Heartbeat even will let you know when a previous log message is not something you have to worry about. When bringing a new cluster on-line, I usually open an SSH terminal to each cluster member and tail the messages file like so:
tail -f /var/log/messages
Then, in separate terminals, I start up Heartbeat. If there are any problems, it is usually pretty easy to spot them.
Heartbeat also comes with very good documentation. Whenever I run into problems, this documentation has been invaluable. On my system, it is located under the /usr/share/doc/ directory.
I've barely scratched the surface of Heartbeat's capabilities here. Fortunately, a lot of resources exist to help you learn about Heartbeat's more-advanced features. These include active/passive and active/active clusters with N number of nodes, DRBD, the Cluster Resource Manager and more. Now that your feet are wet, hopefully you won't be quite as intimidated as I was when I first started learning about Heartbeat. Be careful though, or you might end up like me and want to cluster everything.
The High-Availability Linux Project: www.linux-ha.org
Heartbeat Home Page: www.linux-ha.org/Heartbeat
Getting Started with Heartbeat Version 2: www.linux-ha.org/GettingStartedV2
An Introductory Heartbeat Screencast: linux-ha.org/Education/Newbie/InstallHeartbeatScreencast
The Linux-HA Mailing List: lists.linux-ha.org/mailman/listinfo/linux-ha
Daniel Bartholomew has been using computers since the early 1980s when his parents purchased an Apple IIe. After stints on Mac and Windows machines, he discovered Linux in 1996 and has been using various distributions ever since. He lives with his wife and children in North Carolina.