High Availability Cluster Checklist

With a variety of clustering products on the market, you need to be able to determine how well each option meets your specific business needs.

In today's competitive environment, the adage “time is money” takes on literal meaning. Keeping your business's data on-line and accessible is the foundation of overall system uptime. Whether the data sits behind database back ends, web servers or network file systems (NFS) used as e-mail and user directories, outages in your data storage tier can be catastrophic.

The most cost-effective approach to increasing your site's overall reliability is to implement a fail-over cluster. Fail-over clusters involve pooling together multiple computers, each of which is a candidate server for your file systems, databases or applications. Each of these systems monitors the health of other systems in the cluster. In the event of failure in one of the cluster members, the others take over the services of the failed node. The takeover is typically performed in such a way as to make it transparent to the client systems that are accessing the data.
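The monitor-and-take-over cycle described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how any particular product works: the class name ClusterMember, the three-second timeout and the service names are all hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class ClusterMember:
    """One node's view of its peers, based on the last heartbeat it received."""

    def __init__(self, name, timeout=3.0):
        self.name = name
        self.timeout = timeout   # seconds of silence before a peer is presumed failed (assumed value)
        self.last_seen = {}      # peer name -> time of the last heartbeat received
        self.services = set()    # services this member currently serves

    def record_heartbeat(self, peer, now):
        """Note that a heartbeat from `peer` arrived at time `now`."""
        self.last_seen[peer] = now

    def failed_peers(self, now):
        """Return the peers that have been silent for longer than the timeout."""
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)

    def take_over(self, peer_services, now):
        """Adopt the services of every peer whose heartbeat has timed out."""
        adopted = []
        for peer in self.failed_peers(now):
            for service in sorted(peer_services.get(peer, ())):
                self.services.add(service)
                adopted.append(service)
            del self.last_seen[peer]   # stop tracking the peer we replaced
        return adopted


node_b = ClusterMember("B")
node_b.record_heartbeat("A", now=0.0)
print(node_b.failed_peers(now=2.0))                        # []
print(node_b.take_over({"A": {"nfs", "db"}}, now=5.0))     # ['db', 'nfs']
```

Note that this sketch treats silence as proof of failure. A member that has merely lost its network link, or is temporarily hung, looks exactly the same to its peers, which is why the failure scenarios discussed below matter.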

A typical fail-over cluster implementation consists of multiple systems attached to a set of shared storage units, such as disks, connected to a shared SCSI or FibreChannel bus. Each of the cluster members usually monitors the health of others via network (e.g., Ethernet) and/or point-to-point serial connections. Historically, enterprise-quality cluster offerings were the domain of proprietary vendors such as Digital, HP or IBM. Recently, viable Linux-based cluster offerings that run on commodity hardware have become available.

A quick perusal of the Web will uncover a range of Linux-based clustering alternatives. The majority of them look great on paper. They tout amazingly quick fail-over times for a large number of services on clusters consisting of any number of nodes. It is easy to fall into the trap of purchasing the wrong cluster product. The truth is that not all high-availability clustering alternatives safely increase the reliability and availability of your data. Rather, choosing the wrong type of product can leave your valuable file systems and databases vulnerable to corruption. Some products neglect to mention this fact; others state it only if you dig deep under the hood in related white papers.

Having been in the UNIX/Linux high-availability business for more than seven years, I have seen cluster products come and go. It's unnerving to see cluster products promoted for jobs they are ill-equipped to perform. Exposing end-user data to corruption gives the whole cluster scene a bad name. I have distilled years of investigation into a simple four-point checklist that serves as a guide for evaluating whether a high-availability cluster product matches your needs. In fact, these points are not particular to UNIX or Linux; they apply across any hardware and operating system implementation. So before committing any money (and your company's data) to a high-availability cluster solution, be sure you know how the solution protects you in the following four failure scenarios:

  1. Planned maintenance and shutdown

  2. System crash

  3. Communication failure

  4. System hang

We will be discussing each of these points in detail and pointing out typical pitfalls. But before getting into the analysis of these four points, it is crucial to have an understanding of what data integrity is all about. The fundamental point of data integrity is knowing that your data is accurate and up-to-date. Sounds simple enough. In a cluster environment, preserving the integrity of the data is of paramount importance and supersedes even data availability.

Using examples helps to illustrate the point. The diagram in Figure 1 depicts a two-node cluster with cluster members A and B connected to a shared SCSI bus with Disk 1. (I am using a two-node cluster for simplicity; the concepts apply to clusters composed of more than two nodes as well.)

Figure 1. Two Node Cluster with a Shared SCSI Bus

Typical operating systems provide access to disk-based storage via file systems. Commonly, the file system mounts the disk volume and then accommodates user access. In the interests of performance, file system implementations typically cache recent copies of file system data in memory. Consequently, the most up-to-date version of your data (being served by node A) is actually the combination of what is cached in node A's memory plus the on-disk data.

Now extend this example to the other cluster member (node B). If node B were to mount the same file system and access it, the true contents of your file system would now consist of the data being cached in node A's memory, plus the data being cached in node B's memory, plus the on-disk data. Making this work correctly requires implementing a file system that coordinates the in-memory cached data of multiple systems in addition to the on-disk data. Such a model, where all cluster members can concurrently mount the same file system, is referred to as a cluster file system. Few UNIX offerings implement a cluster file system, and no Linux variants implement a production-ready cluster file system today (although efforts are underway; see the GFS project, http://www.gfs.lcse.umn.edu/).

In the absence of a cluster file system, what happens if multiple cluster members concurrently access the same file system? Possible outcomes include:

  • Inaccurate data: suppose your trip to Las Vegas went particularly well, and you have $100 to deposit into your bank account. The deposit transaction is handled by node A, which adds the $100 to your prior balance of $25 for a grand total of $125; node A then keeps your most recent balance in its memory-resident cache. You then take a flight home and realize you need to withdraw $50 to get your car out of the parking garage. This transaction is handled by node B, which goes to the disk, retrieves your balance of $25 and bounces you out for insufficient funds! All this transpired because the true balance of $125 is cached in node A's memory. When it comes to a cluster implementation, you need to answer this question: How damaging would it be to your company if the wrong data were supplied?

  • System crash: in addition to storing user data, such as an account balance, file systems also store their own metadata on disk that describes how user data is organized (consider it an index or table of contents). For performance reasons, metadata is also cached in memory. File systems get particularly confused and upset if their metadata becomes scrambled and often resort to temper tantrums (better known as system crashes or panics). In the absence of a true cluster file system, if more than one cluster member ever mounts the same file system concurrently, each node ends up with a differing idea of what the metadata represents, which usually results in a system crash.
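The bank-account mishap above is easy to reproduce in a toy model. The following Python sketch is my own illustration, not code from any file system: the Node class, its write-back cache and the "checking" account name are hypothetical, and a plain dictionary stands in for Disk 1.

```python
class Node:
    """A cluster member with a private write-back cache in front of a shared disk."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk     # dict shared by all nodes; stands in for Disk 1
        self.cache = {}      # private to this node, invisible to the other node

    def read(self, key):
        # On a cache miss, fetch the value from disk; otherwise trust the cache.
        if key not in self.cache:
            self.cache[key] = self.disk[key]
        return self.cache[key]

    def write(self, key, value):
        # The update lands in memory only; nothing reaches the disk until flush().
        self.cache[key] = value

    def flush(self):
        self.disk.update(self.cache)

    def deposit(self, account, amount):
        self.write(account, self.read(account) + amount)

    def withdraw(self, account, amount):
        balance = self.read(account)
        if balance < amount:
            return False     # insufficient funds, as far as this node can tell
        self.write(account, balance - amount)
        return True


disk = {"checking": 25}
node_a = Node("A", disk)
node_b = Node("B", disk)

node_a.deposit("checking", 100)
print(node_a.read("checking"))           # 125: the true balance, held only in A's memory
print(disk["checking"])                  # 25: the disk has not seen the deposit
print(node_b.withdraw("checking", 50))   # False: node B works from the stale on-disk value
```

Had both transactions gone through node A, the withdrawal would have succeeded. Nothing in this model is broken on either node individually; the wrong answer comes purely from two members serving the same data without coordinating their caches.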

When a file system's data or metadata becomes scrambled, data corruption ensues. Correcting a data corruption problem typically means restoring from a tape backup (you do this regularly, right?). The problem here is that, since the backup frequency is low in relation to the transaction rate, the time it takes to recover from data corruption is often measured in days rather than the minutes or seconds you expected from deploying a high-availability cluster.

The above concepts about requiring cluster members to synchronize their access to file system data to protect against data corruption also apply to databases. Most database implementations do not allow multiple cluster members to concurrently serve the same underlying disk data. Notable exceptions to this include Oracle Parallel Server (currently being ported to Linux) and Informix Extended Parallel Server.

The upshot of all this is that the cluster implementation you choose must ensure that an individual file system or database can only be served by a single cluster member at any point in time—pretty simple, if you can find a cluster product that does this in all cases. Now, let us proceed to how this holds up under the four scenarios mentioned earlier.
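Stated as code, the rule is simply that a second member must never be able to start serving something that already has an owner. The sketch below is a minimal illustration under my own assumptions: ServiceRegistry and its method names are hypothetical, and it is plain bookkeeping inside one process. It shows the invariant, not how a cluster enforces it across machines that can crash, hang or lose contact with one another.

```python
class ServiceRegistry:
    """Tracks which single cluster member, if any, serves each file system or database."""

    def __init__(self):
        self.owner = {}   # service name -> name of the member currently serving it

    def acquire(self, service, member):
        """Let `member` serve `service`, refusing if another member already does."""
        current = self.owner.get(service)
        if current is not None and current != member:
            raise RuntimeError(f"{service} is already served by {current}")
        self.owner[service] = member

    def release(self, service, member):
        """Give up `service`; only the current owner's release has any effect."""
        if self.owner.get(service) == member:
            del self.owner[service]


registry = ServiceRegistry()
registry.acquire("fs1", "A")
try:
    registry.acquire("fs1", "B")   # node B may not mount what node A is serving
except RuntimeError as err:
    print(err)                     # fs1 is already served by A
```

The hard part, and the part worth questioning a vendor about, is keeping this invariant true when the owner cannot be asked to release the service.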




High-availability clusters

Emil Koutanov

The problem with using Linux-based (or any OS-specific) clustering software is that you'll always be tied to the operating system.

The folks at Obsidian Dynamics have built a Java-based application-level clustering solution that isn't tied to the operating system.

I think this is the way forward, particularly seeing that many organisations are running a mixed bag of Windows and Linux servers - being able to cluster Windows and Linux machines together can be a real advantage. It also makes installation and configuration easier, since you're not supporting a dozen different operating systems and hardware configurations.

The other neat thing about Gridlock is that it doesn't use quorum and doesn't rely on NIC bonding/teaming to achieve multipath configurations - instead it combines redundant networks at the application level, which means it works on any network card and doesn't require specialised switchgear.

Re: High Availability Cluster Checklist

Anonymous

http://www.gfs.lcse.umn.edu/ doesn't work.
Same goes for the tricord.com link in the comments.

Re: High Availability Cluster Checklist

Anonymous

For a real clustered file system with FT and failover, check out www.tricord.com! Look at the Illumina and Lunar Flare family.

It's highly scalable and offers great performance.