A High-Availability Cluster for Linux
The resynchronization (mirroring back) procedure was implemented using rsync, which uses a lock file to disallow any mirroring to another node when a node failure is sensed. The lock file is checked for existence by sync-app before any files are mirrored. This prevents node A mirroring to node B, while node B is mirroring the same files to node A.
If preferred, clusterd could be used with a shared and/or distributed storage device by removing the resynchronization function and by not using sync-app, although I have not tried this.
To test server failure, I had to simulate the failure of every interface on the cluster. In each case, the cluster took the expected action and shut down the correct server. In the case of the inter-node/heartbeat network failing, the nodes simply carried on normal operation and notified the administrator of the failure. On a point-to-point network of this nature, it is almost impossible to determine which NIC is at fault. I simulated various network switch failures and power supply failures. The results were all as expected. After a node was put into standby (single-user) mode, I had to manually remove a standby lock file in order to fully bring up the node again. If a node recovered and entered a network runlevel while the standby lock file still existed, the remote node immediately put the node back into standby mode to prevent an IP and MAC address clash on the LAN.
Mirroring was tested over a period of several months, and I found that the nodes could typically compare 6GB of unchanged data in approximately 50,000 files in under 45 seconds.
After catastrophic node failure (I pulled the power plug from the UPS), recovery time for the node was around 10 to 15 minutes for fsck disk checking, and a disk resynchronization time of around three minutes (9GB of data). This represented a cluster services downtime of around three minutes to the LAN clients.
Failover delay from when a node failed until the remote node fully took over was typically 60 to 80 seconds. The effect on users depended on the service: Sendmail, IMAP4, http and FTP simply refused connection for users for the duration, whereas Samba sometimes momentarily locked up a Windows PC application when files were open at the point of failure. radius and dhcpd caused no client lock-outs, probably because of their UDP implementation.
On the whole, the cluster provides us with much better system availability. It is a vast improvement over the single server, as we can now afford to do server maintenance and upgrades during working hours. We have not yet had any catastrophic failures with the new Dell servers, but the test results show a minimal downtime of less than two minutes while a node takes over. We have saved large amounts of capital by implementing a simple high-availability cluster without the need for expensive specialist hardware such as dual ported RAID.
This clustering solution is certainly not as advanced as some of the commercial clusters or as thorough as some of the upcoming open source Linux-HA project proposals; however, it does sufficiently meet our needs.
The system has been in full-time production operation since September 1998. We have over 30 LAN clients using the cluster as their primary “server”. The system has proven to be reliable. The company sees the server as a business-critical system, and we have achieved the objectives of high availability.
|Free Today: September Issue of Linux Journal (Retail value: $5.99)||Sep 27, 2016|
|nginx||Sep 27, 2016|
|Epiq Solutions' Sidekiq M.2||Sep 26, 2016|
|Nativ Disc||Sep 23, 2016|
|Android Browser Security--What You Haven't Been Told||Sep 22, 2016|
|The Many Paths to a Solution||Sep 21, 2016|
- Free Today: September Issue of Linux Journal (Retail value: $5.99)
- Android Browser Security--What You Haven't Been Told
- Readers' Choice Awards 2013
- Epiq Solutions' Sidekiq M.2
- Downloading an Entire Web Site with wget
- The Many Paths to a Solution
- Securing the Programmer
- Nativ Disc
- Tech Tip: Really Simple HTTP Server with Python
Pick up any e-commerce web or mobile app today, and you’ll be holding a mashup of interconnected applications and services from a variety of different providers. For instance, when you connect to Amazon’s e-commerce app, cookies, tags and pixels that are monitored by solutions like Exact Target, BazaarVoice, Bing, Shopzilla, Liveramp and Google Tag Manager track every action you take. You’re presented with special offers and coupons based on your viewing and buying patterns. If you find something you want for your birthday, a third party manages your wish list, which you can share through multiple social- media outlets or email to a friend. When you select something to buy, you find yourself presented with similar items as kind suggestions. And when you finally check out, you’re offered the ability to pay with promo codes, gifts cards, PayPal or a variety of credit cards.Get the Guide