High Availability Cluster Checklist

With so many clustering products on the market, you need a way to judge how well each option meets your specific business needs.

Planned Maintenance

One of the greatest benefits of a high-availability cluster, and ironically one of the most overlooked, is the ability to cleanly migrate services off a cluster member so you can perform routine maintenance without disrupting service to client systems. For example, this allows you to upgrade your software to the latest release or add memory to your system while keeping your site operational. Virtually all high-availability cluster offerings accommodate planned maintenance.

System Crash

If you believe that a particular operating system is crash-proof, give me a call and I'll sell you the Brooklyn Bridge to go along with that OS. Let's face it, system crashes are facts of life; it is merely a matter of minimizing their frequency. In response to a system crash, the other cluster members will conclude that a server has become nonresponsive and commence a takeover of the services formerly provided by the failed node.

In the event of a system crash, virtually all fail-over cluster implementations will correctly take over the services of a failed node. So far so good; it looks like just about any fail-over cluster product will suit you. Not so fast: the following points separate the credible offerings from the not-so-credible.

Communication Failure

Typical high-availability cluster implementations consist of a set of cluster members, each monitoring the others' health over a variety of “cluster interconnects”. Historically, many proprietary cluster vendors have depended on custom hardware for their cluster interconnects. While this provides a solid cluster implementation, by nature it tends to be very expensive and locks you into a single vendor. To provide a cost-effective alternative, other cluster implementations monitor system health over commonly available network interconnects (typically Ethernet) and serial port connections. In these configurations, the cluster members periodically exchange messages and, based on the response (or lack thereof), conclude whether the other members are up or down. This exchange of system health-monitoring messages is commonly referred to as a “heartbeat”.
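As a minimal sketch of the heartbeat idea, the following Python tracks when each peer was last heard from and treats a peer as down once it has been silent for longer than a timeout. The class name, method names, the 3-second timeout and the peer label `node_b` are all assumptions made for illustration; they do not come from any particular cluster product.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: remembers when each peer last sent a heartbeat."""

    def __init__(self, peers, timeout_s=3.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the example can run without sleeping
        now = clock()
        # Assume every peer was alive at start-up.
        self.last_seen = {peer: now for peer in peers}

    def record_heartbeat(self, peer):
        """Call whenever a heartbeat message arrives from `peer`."""
        self.last_seen[peer] = self.clock()

    def is_up(self, peer):
        """A peer counts as up if it was heard from within the timeout."""
        return self.clock() - self.last_seen[peer] <= self.timeout_s

    def down_peers(self):
        """Peers that have been silent for longer than the timeout."""
        return sorted(p for p in self.last_seen if not self.is_up(p))


# Demonstration with a fake clock (a one-element list holding "now").
fake_now = [0.0]
monitor = HeartbeatMonitor(["node_b"], timeout_s=3.0, clock=lambda: fake_now[0])

fake_now[0] = 2.0
monitor.record_heartbeat("node_b")  # heartbeat received at t=2

fake_now[0] = 4.0
print(monitor.down_peers())  # [] (only 2 seconds of silence)

fake_now[0] = 6.0
print(monitor.down_peers())  # ['node_b'] (4 seconds of silence)
```

Note that the monitor can only report silence. It cannot tell whether the peer has crashed or whether the link carrying the heartbeats has failed.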

A common problem with “heartbeat”-based clusters is the communication partition: cluster members (or sets of members) are up but are unable to communicate with one another. Take, for example, the diagram in Figure 2 depicting a two-node cluster with an Ethernet and a serial connection between the nodes over which heartbeat messages are exchanged.

Figure 2. Two-Node Cluster with Ethernet and Serial Connections

Let us suppose you had set up your high-availability cluster and gone off to Las Vegas for the weekend, lulled into complacency by your company's new on-line ordering system deployed in this configuration. Further imagine the cleaning person accidentally knocking out the Ethernet connection with a broom. Now the cluster software running on each of the two nodes must decide how to respond to this scenario in the interest of preserving high availability. Since the members can't communicate, they have to make the call in isolation. Here are some policy options commonly used by cluster products:

  • Pessimistic assumption: Node A knows that it is serving the database but is unaware of node B's state, so node A continues to serve the database. Node B can't communicate with node A and assumes that node A is down, so node B also starts serving the database. Two cluster members are now serving the same database, which results in database corruption and possibly a system crash. (As weak as this sounds, this policy is employed in some offerings!)

  • Optimistic assumption: After a site-wide power outage, node A and node B both boot up at the same time. Neither node can ascertain the state of the other node and, just to be safe, they each assume that the other node is up, so they do not start serving the database (to avoid data corruption). This results in a scenario where neither cluster member is serving the database. So much for spending money on a redundant cluster server! Actually, you're better off having your database unavailable than having it corrupted.

There are other failure scenarios that manifest themselves as a communication failure. For example:

  • An Ethernet adapter fails

  • The systems are connected to a common hub or switch that fails

  • The Ethernet cable fails
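Whatever the cause, each node sees only silence from its peer and has to decide alone. As a minimal sketch of the two policies described above, the following Python models each node's decision as a pure function and then shows what a two-node cluster ends up doing. The enum, the function names and the node labels `A` and `B` are hypothetical, chosen only for this illustration.

```python
from enum import Enum


class Policy(Enum):
    ASSUME_PEER_DOWN = "pessimistic"  # silence means the peer has failed
    ASSUME_PEER_UP = "optimistic"     # silence means the peer is still running


def should_serve(currently_serving, peer_reachable, policy):
    """Decide, from one node's isolated point of view, whether to serve."""
    if peer_reachable:
        return currently_serving      # no partition: nothing changes
    if policy is Policy.ASSUME_PEER_DOWN:
        return True                   # take over (or keep) the service
    return currently_serving          # ASSUME_PEER_UP: start nothing new


def servers_after_partition(initial, policy):
    """Names of the nodes serving once every node has lost contact with its peer."""
    return sorted(
        node
        for node, serving in initial.items()
        if should_serve(serving, peer_reachable=False, policy=policy)
    )


# Broom scenario: A was serving, B was standing by, the link is cut.
print(servers_after_partition({"A": True, "B": False}, Policy.ASSUME_PEER_DOWN))
# ['A', 'B'] (both nodes serve the same database)

# Power-outage scenario: both nodes boot, neither is serving yet.
print(servers_after_partition({"A": False, "B": False}, Policy.ASSUME_PEER_UP))
# [] (nobody serves the database)
```

Neither outcome is what you want, which is why the number and independence of the heartbeat paths matters so much.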

To avoid these forms of communication partition, a common clustering practice is to employ multiple communication interconnects. For example, you may have the systems monitor each other's health by heartbeating over multiple Ethernet links or a combination of Ethernet and serial connections. Similarly, you may have each of the network connections go through separate hubs/switches or be point-to-point links.

Most cluster implementations allow you to configure multiple communication interconnects to guard against a communication partition. (If they do not, you should probably quickly move on to another vendor.)
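A minimal sketch of how redundant interconnects change the decision: a peer is treated as unreachable only when heartbeats have stopped on every configured link, so a single failed cable, adapter or hub merely degrades monitoring. The function name and the link labels `eth0`, `eth1` and `ttyS0` are assumptions for illustration, not settings from any real product.

```python
from typing import Mapping


def peer_state(link_alive: Mapping[str, bool]) -> str:
    """Combine per-link heartbeat status into one view of the peer.

    `link_alive` maps a link label to True if a heartbeat was recently
    received from the peer over that link.
    """
    if not link_alive:
        raise ValueError("at least one interconnect must be configured")
    if all(link_alive.values()):
        return "up"
    if any(link_alive.values()):
        # The peer is provably alive; only some heartbeat paths have failed.
        return "up, degraded"
    # Silent on every path: a crash, or a failure of all links at once.
    return "unreachable"


print(peer_state({"eth0": True, "ttyS0": True}))                  # up
print(peer_state({"eth0": False, "ttyS0": True}))                 # up, degraded
print(peer_state({"eth0": False, "eth1": False, "ttyS0": False})) # unreachable
```

With the broom scenario above, the serial link would still carry heartbeats, so node B would see node A as "up, degraded" and would have no reason to start a second copy of the database.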




High-availability clusters

Emil Koutanov

The problem with using Linux-based (or any OS-specific) clustering software is that you'll always be tied to the operating system.

The folks at Obsidian Dynamics have built a Java-based application-level clustering solution that isn't tied to the operating system.

I think this is the way forward, particularly seeing that many organisations are running a mixed bag of Windows and Linux servers - being able to cluster Windows and Linux machines together can be a real advantage. It also makes installation and configuration easier, since you're not supporting a dozen different operating systems and hardware configurations.

The other neat thing about Gridlock is that it doesn't use quorum and doesn't rely on NIC bonding/teaming to achieve multipath configurations - instead it combines redundant networks at the application level, which means it works on any network card and doesn't require specialised switchgear.

Re: High Availability Cluster Checklist

Anonymous

http://www.gfs.lcse.umn.edu/ doesn't work.
Same goes for the tricord.com link in the comments.

Re: High Availability Cluster Checklist

Anonymous

For a real clustered file system with FT and failover, check out www.tricord.com and its Illumina and Lunar Flare family!

It's highly scalable and offers great performance.