Economical Fault-Tolerant Networks
The newly elected master server quietly takes over the virtual server address. However, the clients already have an address resolution protocol (ARP) cache entry mapping the virtual server IP address to the hardware (MAC) address of the failed master. This stale entry would prevent a client from communicating with the new master, because the client would still direct packets to the old MAC address. One solution to this problem is to select an arbitrary MAC address and have each elected master take it over along with the IP address; unfortunately, not all network adapters support overriding their hardware address. Another solution would be to bluntly flush the ARP cache of every client, which is also inefficient.
The method we devised is to delete the virtual server IP address entry from the ARP cache of the newly elected master. The master then automatically repopulates its cache by contacting the machines on the network, clients as well as slave servers. This not only updates the master's ARP cache, but also that of the clients. The advantage of this technique is that our software does not have to send special update packets to each computer; the already-working ARP mechanism does that for us.
The ARP cache update, together with the IP address takeover, transparently causes clients to request services from the newly elected master. Clients may experience some delay while the election itself takes place, but otherwise they continue uninterrupted.
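As an illustration, here is a minimal sketch of that takeover sequence. The virtual IP address, the eth0 interface name and the use of the stock arp and ifconfig utilities are assumptions for the example, not a listing of our actual script:

#!/usr/bin/perl
# Sketch of the takeover sequence on a newly elected master.
use strict;

my $vip = '192.168.1.100';    # assumed virtual server IP address

# Delete the stale entry for the virtual IP from our own ARP cache.
system('arp', '-d', $vip);

# Take over the virtual IP as an interface alias. The ARP traffic this
# triggers refreshes the caches of the clients and slave servers too.
system('ifconfig', 'eth0:0', $vip, 'up') == 0
    or die "IP takeover failed: $?\n";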
A critical requirement of the whole switchover scenario is that the machines be maintained identically; only then is the switchover truly transparent. Steps must be taken to ensure that in the event of a failure, the likely new masters hold data that is as up to date as possible. Two important aspects of maintaining the peers are time and file synchronization.
Important and critical files need to be distributed to all the servers. Any server could have been a master and might hold newer versions of files. It is therefore imperative that the servers be time-synchronized, so that their file timestamps are comparable. This ensures that only the updated versions are distributed at the time of file synchronization. Note that the clocks do not have to match real time; the only requirement is that all the servers agree on the same time. We relied on simply setting each clock to the time of the master server, using a remote shell procedure. No special time servers were used, although running ntpd or timed would have been a better technique.
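A minimal sketch of such a clock push follows. The slave hostnames are hypothetical, and we assume rsh access from the master and a date utility that accepts the MMDDhhmmYYYY.ss setting format:

#!/usr/bin/perl
# Push the master's clock to each slave over a remote shell.
use strict;
use POSIX qw(strftime);

my @slaves = qw(slave1 slave2 slave3);   # example hostnames

# Format the master's current time as MMDDhhmmYYYY.ss for date(1).
my $now = strftime('%m%d%H%M%Y.%S', localtime);

for my $host (@slaves) {
    system('rsh', $host, 'date', $now) == 0
        or warn "time sync to $host failed: $?\n";
}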
Another important task in maintaining peers is synchronizing and replicating data over the entire array of servers to keep them identical. The replication process is time-consuming and can congest the network. The frequency of replication should be high enough to keep replacement transparent and to minimize data loss during switchovers, yet low enough to allow proper network operation without undue congestion.
In a very dynamic scenario, it may not be possible to distribute updates continuously to all machines taking part in an election. In such a situation, a switchover may cause a regression to the last synchronized version of the files. Typically, synchronization is scheduled during low-workload hours. Additionally, distributing data across the server array takes the place of conventional backups and serves the purpose better.
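A replication pass of this kind might look like the following sketch; the hostnames, the data directory and the choice of rsync are illustrative assumptions. The -u option skips files that are already newer on the receiver, which is exactly why comparable timestamps matter:

#!/usr/bin/perl
# Mirror the shared data directory to every slave server.
use strict;

my @slaves  = qw(slave1 slave2 slave3);   # example hostnames
my $datadir = '/var/server-data/';        # example data directory

for my $host (@slaves) {
    # -a preserves permissions and timestamps; -u leaves files alone
    # when the receiver's copy is already newer.
    system('rsync', '-au', $datadir, "$host:$datadir") == 0
        or warn "replication to $host failed: $?\n";
}

Run from cron during low-load hours, a pass like this keeps the array consistent without congesting the network at peak times.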
Having discussed the necessary aspects of the software solution, we move on to its description. We implemented this solution in Perl 5.0 running on Red Hat Linux 6.0. Owing to the portability of Perl, the software runs on any version of Linux/UNIX with little or no change. The program is implemented as a daemon that is started when the servers boot. It moves to the background after spawning four processes (a minimal skeleton of this structure follows the list):
Heartbeat listener process for processing heartbeat signals generated by the master server.
Listener process for receiving and parsing various signals generated from other servers.
Doctor process to interpret the heartbeat signal and decide whether the master server has failed.
Elector process to actually implement the election algorithm and decide which actions need to be taken. It also generates a heartbeat signal if running on a master server.
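The process structure can be pictured with this sketch; the subroutine bodies are placeholders rather than our actual implementation:

#!/usr/bin/perl
# Skeleton of the daemon: detach, then fork one child per task.
use strict;
use POSIX qw(setsid);

# Move to the background: the parent exits, the child starts a new session.
exit if fork;
die "setsid failed: $!\n" if setsid() < 0;

sub heartbeat_listener { sleep 1 while 1 }   # placeholder bodies
sub signal_listener    { sleep 1 while 1 }
sub doctor             { sleep 1 while 1 }
sub elector            { sleep 1 while 1 }

for my $task (\&heartbeat_listener, \&signal_listener, \&doctor, \&elector) {
    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) { $task->(); exit }       # each child runs its task
}

1 while wait != -1;   # parent reaps the children if they ever exit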
The main daemon is supplemented with special scripts that handle startup and synchronization. Separate scripts are provided for master and slave servers, which makes configuring startup services on each very easy.
In addition to these, a set of scripts is provided for file synchronization and distribution; they can be run on a schedule for synchronization and backup.
We employed UDP for heartbeat signals in order to minimize network load. For election calls and other signals, we used TCP to ensure reliability.
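For illustration, a heartbeat sender along these lines could look like the sketch below; the broadcast address, port number and two-second interval are assumed values, not our exact parameters:

#!/usr/bin/perl
# Broadcast a UDP heartbeat from the master at a fixed interval.
use strict;
use IO::Socket::INET;

my $sock = IO::Socket::INET->new(
    PeerAddr  => '192.168.1.255',   # assumed subnet broadcast address
    PeerPort  => 9999,              # assumed heartbeat port
    Proto     => 'udp',
    Broadcast => 1,
) or die "cannot create socket: $!\n";

while (1) {
    $sock->send('HEARTBEAT ' . time);   # cheap, connectionless datagram
    sleep 2;                            # assumed heartbeat interval
}

Slaves read these datagrams with a matching UDP listener, while election calls travel over ordinary TCP connections.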