High Availability Cluster Checklist
In today's competitive environment, the adage “time is money” takes on literal meaning. Keeping your business' data on-line and accessible is the foundation of overall system uptime. Whether it be database back ends, web servers or network file systems (NFS) used as e-mail and user directories, outages in your data storage tier can be catastrophic.
The most cost-effective approach to increasing your site's overall reliability is to implement a fail-over cluster. Fail-over clusters involve pooling together multiple computers, each of which is a candidate server for your file systems, databases or applications. Each of these systems monitors the health of other systems in the cluster. In the event of failure in one of the cluster members, the others take over the services of the failed node. The takeover is typically performed in such a way as to make it transparent to the client systems that are accessing the data.
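The monitor-and-take-over behavior described above can be sketched in a few lines. This is a minimal illustration, not any product's implementation: the names `Member`, `ClusterView`, `record_heartbeat` and `fail_over`, as well as the three-second timeout, are assumptions made for the example. Each member records when it last heard from each peer, and services owned by a peer that has been silent too long are moved to the least-loaded surviving member.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    services: set = field(default_factory=set)
    last_heartbeat: float = 0.0  # time the last heartbeat was seen


class ClusterView:
    """One member's view of the cluster (hypothetical, for illustration)."""

    def __init__(self, members, timeout=3.0):
        self.members = {m.name: m for m in members}
        self.timeout = timeout  # assumed value, in seconds

    def record_heartbeat(self, name, now):
        self.members[name].last_heartbeat = now

    def failed_members(self, now):
        # A member is presumed failed once it has been silent too long.
        return [m for m in self.members.values()
                if now - m.last_heartbeat > self.timeout]

    def fail_over(self, now):
        """Move services of silent members onto surviving members."""
        failed = self.failed_members(now)
        failed_names = {m.name for m in failed}
        alive = [m for m in self.members.values()
                 if m.name not in failed_names]
        moved = {}
        if not alive:
            return moved  # nobody left to take over
        for dead in failed:
            for svc in sorted(dead.services):
                # Pick the least-loaded survivor; ties broken by name.
                target = min(alive, key=lambda m: (len(m.services), m.name))
                target.services.add(svc)
                moved[svc] = target.name
            dead.services.clear()
        return moved


a = Member("A", {"nfs", "db"})
b = Member("B", {"web"})
view = ClusterView([a, b], timeout=3.0)
view.record_heartbeat("B", now=10.0)   # B is still talking; A has gone quiet
moved = view.fail_over(now=11.0)       # A's services move to B
```

Note that a missed heartbeat only suggests that a peer is down. The sketch takes the silence at face value, which is a simplification.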
A typical fail-over cluster implementation consists of multiple systems attached to a set of shared storage units, such as disks, connected to a shared SCSI or FibreChannel bus. Each of the cluster members usually monitors the health of others via network (e.g., Ethernet) and/or point-to-point serial connections. Historically, enterprise-quality cluster offerings were the domain of proprietary vendors such as Digital, HP or IBM. Recently, viable Linux-based cluster offerings that run on commodity hardware have become available.
A quick perusal of the Web will uncover a range of Linux-based clustering alternatives. The majority of them look great on paper. They tout amazingly quick fail-over times for a large number of services on clusters consisting of any number of nodes. It is easy to fall into the trap of purchasing the wrong cluster product. The truth is that not all high-availability clustering alternatives safely increase the reliability and availability of your data. Rather, choosing the wrong type of product can leave your valuable file systems and databases vulnerable to corruption. Some products neglect to mention this fact; others state it only if you dig deep under the hood in related white papers.
Having been in the UNIX/Linux high-availability business for more than seven years, I have seen cluster products come and go. It's unnerving to see cluster products promoted for jobs they are ill-equipped to perform. Exposing end-user data to corruption gives the whole cluster scene a bad name. I have distilled years of investigation into a simple four-point checklist that serves as a guide for evaluating whether a high-availability cluster product matches your needs. In fact, these points are not particular to UNIX or Linux; they apply across any hardware and operating system implementation. So before committing any money (and your company's data) to a high-availability cluster solution, be sure you know how the solution protects you from the following four failure scenarios:
Planned maintenance and shutdown
We will be discussing each of these points in detail and pointing out typical pitfalls. But before getting into the analysis of these four points, it is crucial to have an understanding of what data integrity is all about. The fundamental point of data integrity is knowing that your data is accurate and up-to-date. Sounds simple enough. In a cluster environment, preserving the integrity of the data is of paramount importance and supersedes even data availability.
An example helps to illustrate the point. The diagram in Figure 1 depicts a two-node cluster (I am using a two-node cluster for simplicity; the concepts apply to clusters of more than two nodes as well) with cluster members A and B connected to a shared SCSI bus with Disk 1.
Operating systems typically provide access to disk-based storage through file systems. Commonly, the file system mounts the disk volume and then accommodates user access. In the interests of performance, file system implementations typically cache recent copies of file system data in memory. Consequently, the most up-to-date version of your data (being served by node A) is actually the combination of what is cached in system A's memory plus the on-disk data.
Now extend this example to the other cluster member (node B). If node B were to mount the same file system and access it, the true contents of your file system would now consist of the data cached in node A's memory, plus the data cached in node B's memory, plus the on-disk data. Making this work correctly requires a file system that coordinates the in-memory cached data of multiple systems in addition to the on-disk data. Such a model, where all cluster members can concurrently mount the same file system, is referred to as a cluster file system. Few UNIX offerings implement a cluster file system, and no Linux variants implement a production-ready cluster file system today (although efforts are underway; see the GFS project, http://www.gfs.lcse.umn.edu/).
In the absence of a cluster file system, what happens if multiple cluster members concurrently access the same file system? Possible outcomes include:
Inaccurate data—suppose your trip to Las Vegas went particularly well, and you have $100 to deposit into your bank account. Consider that the deposit transaction was handled by node A, and it added the $100 to your prior balance of $25 resulting in a grand total of $125; node A then keeps your most recent balance in its memory resident cache. You then take a flight home and realize you need to withdraw $50 to get your car out of the parking garage. This transaction is now being handled by node B, which goes to the disk and retrieves your balance of $25 and bounces you out for insufficient funds! All this transpired because the true balance of $125 is cached in node A's memory. When it comes to a cluster implementation you need to answer this question: How damaging would it be to your company if the wrong data were supplied?
System crash—in addition to storing user data, such as an account balance, file systems also store their own metadata on disk that describes how user data is organized (consider it an index or table of contents). For performance reasons, metadata is also cached in memory. File systems get particularly confused and upset if their metadata becomes scrambled and often resort to temper tantrums (better known as system crashes or panics). In the absence of a true cluster file system, if you ever have more than one cluster member concurrently mounting the same file system, it will result in each node having a differing idea of what the metadata represents, usually resulting in a system crash.
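The inaccurate-data outcome from the bank example can be reproduced with a toy model. This is only a sketch: `SharedDisk`, `Node` and `withdraw` are hypothetical names, and the cache here is a plain dictionary rather than anything resembling a real file system buffer cache. Two nodes share one disk, each keeps its own write-back cache, and nothing coordinates the two caches.

```python
class SharedDisk:
    """Stands in for Disk 1 on the shared bus (illustrative only)."""

    def __init__(self):
        self.blocks = {}


class Node:
    """A cluster member with a private write-back cache."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cache = {}

    def read(self, key):
        # Serve from memory if cached; otherwise go to the disk.
        if key not in self.cache:
            self.cache[key] = self.disk.blocks[key]
        return self.cache[key]

    def write(self, key, value):
        # The update stays in memory until flush() is called.
        self.cache[key] = value

    def flush(self):
        self.disk.blocks.update(self.cache)


def withdraw(node, account, amount):
    """Return True if the withdrawal is allowed by the balance the node sees."""
    balance = node.read(account)
    if balance < amount:
        return False
    node.write(account, balance - amount)
    return True


disk = SharedDisk()
disk.blocks["balance"] = 25            # prior balance on disk

node_a = Node("A", disk)
node_b = Node("B", disk)

# Node A handles the $100 deposit; the new balance lives only in A's cache.
node_a.write("balance", node_a.read("balance") + 100)

# Node B handles the $50 withdrawal and consults the disk, which still says 25.
ok = withdraw(node_b, "balance", 50)

print(node_a.read("balance"), disk.blocks["balance"], ok)
```

Node A believes the balance is 125, the disk still holds 25, and node B refuses the withdrawal. Neither node has done anything wrong by its own logic; the error comes entirely from the uncoordinated caches.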
When a file system's data or metadata becomes scrambled, data corruption ensues. Correcting a data corruption problem typically means restoring from a tape backup (you do this regularly, right?). The problem is that backups are infrequent relative to the transaction rate, so every transaction made since the last backup must also be recovered. As a result, the time it takes to recover from data corruption is often measured in days rather than the few minutes or seconds you expected from deploying a high-availability cluster.
The above concepts about requiring cluster members to synchronize their access to file system data to protect against data corruption also apply to databases. Most database implementations do not allow multiple cluster members to concurrently serve the same underlying disk data. Notable exceptions to this include Oracle Parallel Server (currently being ported to Linux) and Informix Extended Parallel Server.
The upshot of all this is that the cluster implementation you choose must ensure that an individual file system or database can only be served by a single cluster member at any point in time—pretty simple, if you can find a cluster product that does this in all cases. Now, let us proceed to how this holds up under the four scenarios mentioned earlier.
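The single-server rule stated above can be written down as an invariant. The sketch below is hypothetical (`ServiceRegistry` and `OwnershipError` are names invented for this example) and runs inside a single process, so it sidesteps the hard part, which is getting separate cluster members to agree on ownership when something has failed. It shows only the rule itself: a service is granted to a second member only after the first has released it.

```python
class OwnershipError(Exception):
    """Raised when a request would break the single-owner rule."""


class ServiceRegistry:
    """Tracks which member serves each file system or database."""

    def __init__(self):
        self._owner = {}

    def owner(self, service):
        return self._owner.get(service)

    def acquire(self, service, node):
        current = self._owner.get(service)
        if current is not None and current != node:
            # Another member is already serving this data: refuse.
            raise OwnershipError(
                f"{service} is already served by {current}")
        self._owner[service] = node

    def release(self, service, node):
        if self._owner.get(service) != node:
            raise OwnershipError(f"{node} does not serve {service}")
        del self._owner[service]


registry = ServiceRegistry()
registry.acquire("nfs", "A")

try:
    registry.acquire("nfs", "B")       # refused while A still owns it
except OwnershipError as err:
    print("refused:", err)

registry.release("nfs", "A")
registry.acquire("nfs", "B")           # allowed once A has let go
```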