A High-Availability Cluster for Linux
If a node fails in some way, it is vital that only one of the nodes performs the IP and MAC address takeover. Determining which node has failed in a cluster is easier said than done. If the heartbeat network failed while using a simplistic takeover algorithm, both of the nodes would wrongly perform MAC, IP and application takeover and the cluster would become partitioned. This would cause major problems on any LAN and would probably result in some kind of network and server deadlock. One way to prevent this scenario from taking place is to make the node which first detects a remote node failure, remote login to each of that remote node's interfaces and put it into a standby run level (e.g., single-user mode). This run level would prevent the failed node from attempting to restart itself and thus stop an endless failure-recovery loop. There are problems with this method. What if node A (which has a failed NIC) thinks node B is not responding, then remotely puts node B into single-user mode? You would end up with no servers available to the LAN. There must be a mechanism to decide which node has actually failed. One of the few ways to do this on a two-node cluster is to rely on a third party. My method of implementing this is to use a list of locally accessible devices which can be pinged on the LAN. By a process of arbitration, the node which detects the highest number of unreachable devices will gracefully surrender and go into the standby runlevel. This is shown in Figure 2.
To implement this solution with minimal risk of data loss, the data on the two servers must be constantly mirrored. It would be ideal if the data written to serv1 was simultaneously written to serv2 and vice versa. In practice, a near-perfect mirror would require a substantial kernel implementation with many hurdles along the way, such as file system performance and distributed lock management. One method would be to implement a RAID mirror which used disks from different nodes: a cluster file system. This is supposed to be possible in later incarnations of the 2.1 and probably the 2.2 kernel by using md, NFS and network block devices. Another solution, which also remains to be evaluated, is the use of the CODA distributed file system.
A practical way to have a mirror of data on each node is to allow the frequency of the file mirroring to be predefined by the administrator, not only for nodes but rather on a per file or directory basis. With this fine-grained level of control, the data volatility characteristics of a particular file, directory or application can be reflected in frequency of mirroring to the other node in the cluster. For example, fast-changing data such as an IMAP4 e-mail spool, where users are constantly moving, reading and deleting e-mail, could be mirrored every minute, whereas slow-changing data such as the company's mostly static web pages could be mirrored hourly.
Trade-offs must be considered when mirroring data in this way. One major trade-off is mirror integrity with CPU and I/O resource consumption. It would be nice if I could have my IMAP4 mail spools mirrored each second. In practice, this would not work because the server takes 15 seconds to synchronize this spool each time. The CPU and disk I/O usage could be so high that the services would be noticeably slowed down. This would seem to defeat the objective of high availability. Even if the CPU had the resources to read the disks in less than one second, there might still be problems transferring the data changes between the nodes due to a network throughput bottleneck.
This mirroring approach does have flaws. If a file is saved to a Samba file share on serv1, and serv1 fails before they are mirrored, the file will remain unavailable until serv1 fully recovers. In a worst-case scenario, the serv1 file system will have been corrupted and the file lost forever. However, compared to a single server with a backup tape, this scenario is less risky because traditional backups are made far less frequently than the mirroring in the cluster. Of course, a cluster is no replacement for traditional backups which are still vital for many other reasons.
|Designing Electronics with Linux||May 22, 2013|
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
- Designing Electronics with Linux
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Dynamic DNS—an Object Lesson in Problem Solving
- Using Salt Stack and Vagrant for Drupal Development
- Validate an E-Mail Address with PHP, the Right Way
- Tech Tip: Really Simple HTTP Server with Python
- Why Python?
- Build a Skype Server for Your Home Phone System
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Reply to comment | Linux Journal
27 min 1 sec ago
- Reply to comment | Linux Journal
1 hour 17 min ago
- Not free anymore
5 hours 19 min ago
9 hours 6 min ago
- Reply to comment | Linux Journal
9 hours 14 min ago
- Understanding the Linux Kernel
11 hours 29 min ago
13 hours 58 min ago
- Kernel Problem
1 day 1 min ago
- BASH script to log IPs on public web server
1 day 4 hours ago
1 day 8 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?