A High-Availability Cluster for Linux
Although Linux is known to be an extremely stable operating system, the fact that the standard PC hardware is not quite so reliable must not be overlooked. I have been maintaining Linux servers for a long time, and in most cases when a system has failed, it has been due to server hardware failure. UNIX in the commercial world is known for having good clustering and high-availability (HA) technologies.
In my present company, Electec, we rely heavily upon e-mail (Sendmail and IMAP4), Windows file sharing (Samba), FTP and dial-up authentication (radius) services on a 24-hour basis for communication with our suppliers, staff and customers who are located in different time zones. Until recently, all of these services were consolidated on one Linux server. This system had served us very well. However, it was just a matter of time before a hardware failure occurred, which would cause us loss of productivity and revenue.
High availability is becoming increasingly important as we depend more and more on computers in business. I decided to design and implement an inexpensive high-availability solution for our business-critical needs without requiring the use of expensive additional hardware or software. This article covers the design aspects, pitfalls and implementation experiences of the solution.
Quite a few different approaches and combinations of approaches exist for high availability on servers. One way is to use a single, fault-tolerant server with redundant power supplies, RAID, environmental monitoring, fans, network interface cards and so on. The other way involves the use of several units of non-redundant hardware arranged in a cluster, so that each node (or server) in the cluster is able to take over from any failures of partner nodes. The fault-tolerant server approach has the advantage that operating system, application configurations and operations are the same as if you were using a simple, inexpensive server. With a cluster, the application and OS configurations can become very complex and much advanced planning is needed.
With a fault-tolerant server, failures are taken care of in such a way that clients do not notice any downtime—recovery is seamless. Ideally, this should also be the case with node failures in a cluster. In many cases, the hardware cost of a cluster is far less than that of a single fault-tolerant server, especially when you do not have to spend a great deal of money for one of the commercial cluster software offerings.
A trade-off is present between cost and client disruption/downtime. You must ask yourself how much downtime you and your users can tolerate. Shorter downtimes usually require a much more complex or costly solution. In our case, I decided we could live with approximately five minutes downtime in the event of a failure; therefore, I chose to use a cluster as opposed to a single fault-tolerant server.
Many clustering solutions are available to the UNIX market, which can provide almost zero downtime in the event of a node takeover by means of session and network connection takeover. These solutions are mostly expensive and normally require the use of external shared storage hardware. In our case, we can allow for sessions and connections to be lost. This simplifies the task of implementing a high-availability cluster.
When implementing HA clustering, some recommend that a shared storage device, such as a dual-port RAID box, be used. However, it is possible to approach the problem by using a separate storage device for each node in the cluster and mirroring the storage devices when necessary. Each avenue has its own merits. The shared storage approach has the benefit of never requiring any software mirroring of data between cluster nodes, thus saving precious CPU, I/O and network resources. Sharing is also beneficial because data accessible from another cluster node is always up to date. The mirroring approach, which uses separate storage devices, has the advantage that the cluster nodes do not have to be in the same geographical location and are therefore more useful in a disaster recovery scenario. In fact, if the mirror data was compressed, it could be sent over a WAN connection to a remote node.
RAID systems are available, which allow the disks to be geographically distributed and which require interconnectivity by optical fiber; however, these are rather expensive. Two sets of simple storage devices are less expensive than a dual-port RAID box of similar capacity. The dual-port RAID box can, in some cases, introduce a single point of failure in the cluster. If the RAID file system is somehow corrupted beyond recovery, it would cause serious cluster downtime. Most RAID systems mirror the data at the device level and have no regard for which file system is in use. A software-based system can mirror files in user space, so if a file becomes unreadable on one node it will not necessarily copy the same file system corruption to the other node. Due to this advantage and the cost factor, I decided to use separate storage devices on each node in the cluster. It should be noted that even if a dual-ported storage device is used, both nodes in the cluster should never mount the same partition read/write simultaneously.
|Free Today: September Issue of Linux Journal (Retail value: $5.99)||Sep 27, 2016|
|nginx||Sep 27, 2016|
|Epiq Solutions' Sidekiq M.2||Sep 26, 2016|
|Nativ Disc||Sep 23, 2016|
|Android Browser Security--What You Haven't Been Told||Sep 22, 2016|
|The Many Paths to a Solution||Sep 21, 2016|
- Free Today: September Issue of Linux Journal (Retail value: $5.99)
- Android Browser Security--What You Haven't Been Told
- Readers' Choice Awards 2013
- Epiq Solutions' Sidekiq M.2
- The Many Paths to a Solution
- Nativ Disc
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Synopsys' Coverity
- Tech Tip: Really Simple HTTP Server with Python
Pick up any e-commerce web or mobile app today, and you’ll be holding a mashup of interconnected applications and services from a variety of different providers. For instance, when you connect to Amazon’s e-commerce app, cookies, tags and pixels that are monitored by solutions like Exact Target, BazaarVoice, Bing, Shopzilla, Liveramp and Google Tag Manager track every action you take. You’re presented with special offers and coupons based on your viewing and buying patterns. If you find something you want for your birthday, a third party manages your wish list, which you can share through multiple social- media outlets or email to a friend. When you select something to buy, you find yourself presented with similar items as kind suggestions. And when you finally check out, you’re offered the ability to pay with promo codes, gifts cards, PayPal or a variety of credit cards.Get the Guide