High Availability Cluster Checklist
In today's competitive environment, the adage “time is money” takes on literal meaning. Keeping your business's data on-line and accessible is the foundation of overall system uptime. Whether the data sits behind database back ends, web servers or network file systems (NFS) used for e-mail and user directories, outages in your data storage tier can be catastrophic.
The most cost-effective approach to increasing your site's overall reliability is to implement a fail-over cluster. Fail-over clusters involve pooling together multiple computers, each of which is a candidate server for your file systems, databases or applications. Each of these systems monitors the health of other systems in the cluster. In the event of failure in one of the cluster members, the others take over the services of the failed node. The takeover is typically performed in such a way as to make it transparent to the client systems that are accessing the data.
A typical fail-over cluster implementation consists of multiple systems attached to a set of shared storage units, such as disks, connected to a shared SCSI or FibreChannel bus. Each of the cluster members usually monitors the health of others via network (e.g., Ethernet) and/or point-to-point serial connections. Historically, enterprise-quality cluster offerings were the domain of proprietary vendors such as Digital, HP or IBM. Recently, viable Linux-based cluster offerings that run on commodity hardware have become available.
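The health monitoring described above can be sketched in a few lines. This is only an illustration, not how any particular cluster product works: the class name `HeartbeatMonitor`, the five-second timeout and the injected clock are all assumptions made for the example.

```python
import time


class HeartbeatMonitor:
    """Tracks when each peer was last heard from (hypothetical helper)."""

    def __init__(self, peers, timeout_s=5.0, clock=time.monotonic):
        self.timeout_s = timeout_s      # assumed value, not from the article
        self.clock = clock
        now = clock()
        # Assume every peer is alive at start-up.
        self.last_seen = {peer: now for peer in peers}

    def record_heartbeat(self, peer):
        """Call whenever a heartbeat arrives over Ethernet or the serial link."""
        self.last_seen[peer] = self.clock()

    def silent_peers(self):
        """Return peers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout_s)


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
monitor = HeartbeatMonitor(["A", "B"], timeout_s=5.0, clock=lambda: fake_now[0])
fake_now[0] = 4.0
monitor.record_heartbeat("A")      # A checks in at t=4
fake_now[0] = 7.0                  # B has now been silent for 7 s
print(monitor.silent_peers())      # ['B']
```

Note that a silent peer is not necessarily a dead one: the link may have failed, or the peer may be hung and still holding its disks. A timeout alone therefore cannot tell a survivor that it is safe to take over.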
A quick perusal of the Web will uncover a range of Linux-based clustering alternatives. The majority of them look great on paper. They tout amazingly quick fail-over times for large numbers of services on clusters consisting of any number of nodes. It is easy to fall into the trap of purchasing the wrong cluster product. The truth is that not all high-availability clustering alternatives safely increase the reliability and availability of your data. Rather, choosing the wrong type of product can leave your valuable file systems and databases vulnerable to corruption. Some products neglect to mention this fact; others state it only if you dig deep under the hood in related white papers.
Having been in the UNIX/Linux high-availability business for more than seven years, I have seen cluster products come and go. It's unnerving to see cluster products promoted for jobs they are ill-equipped to perform. Exposing end-user data to corruption gives the whole cluster scene a bad name. I have distilled years of investigation into a simple four-point checklist that serves as a guide for evaluating whether a high-availability cluster product matches your needs. In fact, these points are not particular to UNIX or Linux; they apply across any hardware and operating system implementation. So before committing any money (and your company's data) to a high-availability cluster solution, be sure you know how the solution protects you in the following four failure scenarios:
Planned maintenance and shutdown
System crash
Communication failure
System hang
We will be discussing each of these points in detail and pointing out typical pitfalls. But before getting into the analysis of these four points, it is crucial to have an understanding of what data integrity is all about. The fundamental point of data integrity is knowing that your data is accurate and up-to-date. Sounds simple enough. In a cluster environment, preserving the integrity of the data is of paramount importance and supersedes even data availability.
An example helps to illustrate the point. The diagram in Figure 1 depicts a two-node cluster (I am using a two-node cluster for simplicity; the concepts apply to clusters composed of more than two nodes as well) with cluster members A and B connected to a shared SCSI bus with Disk 1.

Figure 1. Two Node Cluster with a Shared SCSI Bus
Typical operating systems provide access to disk-based storage via file systems. Commonly, the file system mounts the disk volume and then accommodates user access. In the interests of performance, file system implementations typically cache recent copies of file system data in memory. Consequently, the most up-to-date version of your data (being served by node A) is actually the combination of what is cached in node A's memory plus the on-disk data.
Now extend this example to the other cluster member (node B). If node B were to mount the same file system and access it, the true contents of your file system would now consist of the data cached in node A's memory, plus the data cached in node B's memory, plus the on-disk data. Making this work correctly requires a file system that coordinates the in-memory cached data of multiple systems in addition to the on-disk data. Such a model, where all cluster members can concurrently mount the same file system, is referred to as a cluster file system. Few UNIX offerings implement a cluster file system, and no Linux variants implement a production-ready cluster file system today (although efforts are underway; see the GFS project, http://www.gfs.lcse.umn.edu/).
In the absence of a cluster file system, what happens if multiple cluster members concurrently access the same file system? Possible outcomes include:
Inaccurate data: suppose your trip to Las Vegas went particularly well, and you have $100 to deposit into your bank account. Suppose the deposit transaction was handled by node A, which added the $100 to your prior balance of $25 for a grand total of $125; node A then keeps your most recent balance in its memory-resident cache. You then take a flight home and realize you need to withdraw $50 to get your car out of the parking garage. This transaction is handled by node B, which goes to the disk, retrieves your balance of $25 and rejects the withdrawal for insufficient funds! All this transpired because the true balance of $125 was cached only in node A's memory and had not yet been written to disk. When it comes to a cluster implementation you need to answer this question: How damaging would it be to your company if the wrong data were supplied?
System crash: in addition to storing user data, such as an account balance, file systems also store their own metadata on disk that describes how user data is organized (consider it an index or table of contents). For performance reasons, metadata is also cached in memory. File systems get particularly confused and upset if their metadata becomes scrambled and often resort to temper tantrums (better known as system crashes or panics). In the absence of a true cluster file system, if more than one cluster member ever mounts the same file system concurrently, each node ends up with a different idea of what the metadata represents, which usually results in a system crash.
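The inaccurate-data outcome can be reproduced with a toy model. The sketch below assumes each node keeps a private write-back cache and never talks to the other node; the `SharedDisk` and `Node` classes are hypothetical and stand in for a real file system only loosely.

```python
class SharedDisk:
    """The on-disk copy that both nodes can reach over the shared bus."""

    def __init__(self, balance):
        self.balance = balance


class Node:
    """A cluster member with a private cache and no coordination (toy model)."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cached_balance = None   # nothing cached yet

    def _balance(self):
        # Read from disk only on a cache miss.
        if self.cached_balance is None:
            self.cached_balance = self.disk.balance
        return self.cached_balance

    def deposit(self, amount):
        # The new balance stays in memory; it is not flushed to disk yet.
        self.cached_balance = self._balance() + amount
        return self.cached_balance

    def withdraw(self, amount):
        balance = self._balance()
        if amount > balance:
            return f"node {self.name}: insufficient funds (sees ${balance})"
        self.cached_balance = balance - amount
        return f"node {self.name}: ok, new balance ${self.cached_balance}"


disk = SharedDisk(balance=25)
node_a, node_b = Node("A", disk), Node("B", disk)

print(node_a.deposit(100))    # 125, held only in node A's cache
print(node_b.withdraw(50))    # node B: insufficient funds (sees $25)
```

Node B is not buggy in isolation; it simply has no way to learn about the $125 that exists only in node A's memory. A cluster file system adds exactly that missing coordination.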
When a file system's data or metadata becomes scrambled, data corruption ensues. Correcting a data corruption problem typically means restoring from a tape backup (you do this regularly, right?). The problem here is that because backups are taken far less often than transactions occur, the time it takes to recover from data corruption is often measured in days rather than the small number of minutes or seconds you expected from deploying a high-availability cluster.
The above concepts about requiring cluster members to synchronize their access to file system data to protect against data corruption also apply to databases. Most database implementations do not allow multiple cluster members to concurrently serve the same underlying disk data. Notable exceptions to this include Oracle Parallel Server (currently being ported to Linux) and Informix Extended Parallel Server.
The upshot of all this is that the cluster implementation you choose must ensure that an individual file system or database can only be served by a single cluster member at any point in time—pretty simple, if you can find a cluster product that does this in all cases. Now, let us proceed to how this holds up under the four scenarios mentioned earlier.
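The single-owner rule can be stated as an invariant in code. The sketch below is an in-process illustration only: `OwnershipTable` and `ServiceAlreadyOwned` are hypothetical names, and a real cluster would additionally need every member to agree on this table even when nodes crash, hang or lose contact.

```python
class ServiceAlreadyOwned(Exception):
    """Raised when a second member tries to serve an already-owned service."""


class OwnershipTable:
    """Maps each service to at most one owning member (hypothetical helper)."""

    def __init__(self):
        self._owner = {}

    def acquire(self, service, member):
        current = self._owner.get(service)
        if current is not None and current != member:
            raise ServiceAlreadyOwned(f"{service} is already served by {current}")
        self._owner[service] = member

    def release(self, service, member):
        # Only the current owner may give the service up.
        if self._owner.get(service) == member:
            del self._owner[service]

    def owner(self, service):
        return self._owner.get(service)


table = OwnershipTable()
table.acquire("disk1-fs", "A")          # node A mounts and serves the file system

try:
    table.acquire("disk1-fs", "B")      # node B must be refused while A owns it
except ServiceAlreadyOwned as err:
    print(err)                          # disk1-fs is already served by A

table.release("disk1-fs", "A")          # planned shutdown: A unmounts cleanly
table.acquire("disk1-fs", "B")          # only now may B take over
print(table.owner("disk1-fs"))          # B
```

The clean hand-over at the end corresponds to planned maintenance. The other three scenarios are harder precisely because the current owner may be unable to call `release` itself.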
Comments
High-availability clusters
The problem with using Linux-based (or an OS-specific) clustering software is that you'll always be tied to the operating system.
The folks at Obsidian Dynamics have built a Java-based application-level clustering solution that isn't tied to the operating system.
(www.obsidiandynamics.com/gridlock)
I think this is the way forward, particularly seeing that many organisations are running a mixed bag of Windows and Linux servers - being able to cluster Windows and Linux machines together can be a real advantage. It also makes installation and configuration easier, since you're not supporting a dozen different operating systems and hardware configurations.
The other neat thing about Gridlock is that it doesn't use quorum and doesn't rely on NIC bonding/teaming to achieve multipath configurations - instead it combines redundant networks at the application level, which means it works on any network card and doesn't require specialised switchgear.
Re: High Availability Cluster Checklist
http://www.gfs.lcse.umn.edu/ doesn't work.
Same as that tricord.com in comments
Re: High Availability Cluster Checklist
For a real clustered file system with FT and Failover
check out www.tricord.com! Illumina and Lunar Flare family.
It's highly scalable and offers great performance.
elhaddi@cs.umn.edu