A High-Availability Cluster for Linux
Although Linux is known to be an extremely stable operating system, the fact that the standard PC hardware is not quite so reliable must not be overlooked. I have been maintaining Linux servers for a long time, and in most cases when a system has failed, it has been due to server hardware failure. UNIX in the commercial world is known for having good clustering and high-availability (HA) technologies.
In my present company, Electec, we rely heavily upon e-mail (Sendmail and IMAP4), Windows file sharing (Samba), FTP and dial-up authentication (radius) services on a 24-hour basis for communication with our suppliers, staff and customers who are located in different time zones. Until recently, all of these services were consolidated on one Linux server. This system had served us very well. However, it was just a matter of time before a hardware failure occurred, which would cause us loss of productivity and revenue.
High availability is becoming increasingly important as we depend more and more on computers in business. I decided to design and implement an inexpensive high-availability solution for our business-critical needs without requiring the use of expensive additional hardware or software. This article covers the design aspects, pitfalls and implementation experiences of the solution.
Quite a few different approaches and combinations of approaches exist for high availability on servers. One way is to use a single, fault-tolerant server with redundant power supplies, RAID, environmental monitoring, fans, network interface cards and so on. The other way involves the use of several units of non-redundant hardware arranged in a cluster, so that each node (or server) in the cluster is able to take over from any failures of partner nodes. The fault-tolerant server approach has the advantage that operating system, application configurations and operations are the same as if you were using a simple, inexpensive server. With a cluster, the application and OS configurations can become very complex and much advanced planning is needed.
With a fault-tolerant server, failures are taken care of in such a way that clients do not notice any downtime—recovery is seamless. Ideally, this should also be the case with node failures in a cluster. In many cases, the hardware cost of a cluster is far less than that of a single fault-tolerant server, especially when you do not have to spend a great deal of money for one of the commercial cluster software offerings.
A trade-off is present between cost and client disruption/downtime. You must ask yourself how much downtime you and your users can tolerate. Shorter downtimes usually require a much more complex or costly solution. In our case, I decided we could live with approximately five minutes downtime in the event of a failure; therefore, I chose to use a cluster as opposed to a single fault-tolerant server.
Many clustering solutions are available to the UNIX market, which can provide almost zero downtime in the event of a node takeover by means of session and network connection takeover. These solutions are mostly expensive and normally require the use of external shared storage hardware. In our case, we can allow for sessions and connections to be lost. This simplifies the task of implementing a high-availability cluster.
When implementing HA clustering, some recommend that a shared storage device, such as a dual-port RAID box, be used. However, it is possible to approach the problem by using a separate storage device for each node in the cluster and mirroring the storage devices when necessary. Each avenue has its own merits. The shared storage approach has the benefit of never requiring any software mirroring of data between cluster nodes, thus saving precious CPU, I/O and network resources. Sharing is also beneficial because data accessible from another cluster node is always up to date. The mirroring approach, which uses separate storage devices, has the advantage that the cluster nodes do not have to be in the same geographical location and are therefore more useful in a disaster recovery scenario. In fact, if the mirror data was compressed, it could be sent over a WAN connection to a remote node.
RAID systems are available, which allow the disks to be geographically distributed and which require interconnectivity by optical fiber; however, these are rather expensive. Two sets of simple storage devices are less expensive than a dual-port RAID box of similar capacity. The dual-port RAID box can, in some cases, introduce a single point of failure in the cluster. If the RAID file system is somehow corrupted beyond recovery, it would cause serious cluster downtime. Most RAID systems mirror the data at the device level and have no regard for which file system is in use. A software-based system can mirror files in user space, so if a file becomes unreadable on one node it will not necessarily copy the same file system corruption to the other node. Due to this advantage and the cost factor, I decided to use separate storage devices on each node in the cluster. It should be noted that even if a dual-ported storage device is used, both nodes in the cluster should never mount the same partition read/write simultaneously.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
- Back to Backups
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- A New Version of Rust Hits the Streets
- Google's Abacus Project: It's All about Trust
- Secure Desktops with Qubes: Introduction
- Seeing Red and Getting Sleep
- Fancy Tricks for Changing Numeric Base
- Secure Desktops with Qubes: Installation
- Working with Command Arguments
- Linux Mint 18
Until recently, IBM’s Power Platform was looked upon as being the system that hosted IBM’s flavor of UNIX and proprietary operating system called IBM i. These servers often are found in medium-size businesses running ERP, CRM and financials for on-premise customers. By enabling the Power platform to run the Linux OS, IBM now has positioned Power to be the platform of choice for those already running Linux that are facing scalability issues, especially customers looking at analytics, big data or cloud computing.
￼Running Linux on IBM’s Power hardware offers some obvious benefits, including improved processing speed and memory bandwidth, inherent security, and simpler deployment and management. But if you look beyond the impressive architecture, you’ll also find an open ecosystem that has given rise to a strong, innovative community, as well as an inventory of system and network management applications that really help leverage the benefits offered by running Linux on Power.Get the Guide