Economical Fault-Tolerant Networks
The newly elected master server quietly takes over the virtual server address. However, the clients already have an address resolution protocol (ARP) cache entry connecting the virtual server IP address to the machine address (MAC) of the failed master. This cache would inhibit a client from communicating with the newer master, because the client would still try to communicate with the old MAC address. One solution which overcomes this problem is having an arbitrary MAC address be selected and taken over by elected masters. The problem with this approach is that not all network adapters support this function. Another solution would be to bluntly dump the ARP cache of all the clients and then reset the cache, which is also not an efficient technique.
The method we devised is to delete the virtual server IP address entry in the ARP cache of the newly elected master. Now the master automatically tries to update its ARP cache. In this process, it contacts the machines on the network, clients as well as slave servers. This not only updates the master's ARP cache, but also that of the clients. The advantage of this technique is that our software does not have to send special update packets to each computer—the already-working ARP mechanism does that for us.
Updating the ARP cache, followed by the IP address takeover, transparently causes the client to request services from the newly elected master. The clients may experience some delay while the actual election takes place, but other than that, they continue uninterrupted.
A very critical perspective of the whole switchover scenario is that the machines should be maintained identically. Only then will the switchover become truly transparent. Steps must be taken to ensure that in the event of a failure, the likely new masters would have as much updated data as possible. Two important aspects of maintaining the peers are time and file synchronization.
Important and critical files need to be circulated on all the servers. Any server could have been a master, and might have newer versions of files. Thus, it becomes imperative that the servers be time synchronized, so that their file timestamps are comparable. This ensures that only the updated versions are distributed at the time of file synchronization. An important thing to note is that the time does not have to be matched with the real time. The only requirement is that all the servers have the same time. We relied on simply setting the clock to the time of the master server, using a remote shell procedure. No special time servers were used, although running NTPD or TIMED would have been a better technique.
Another important task in maintaining peers is performing synchronization and replication of data over the entire array of servers in order to keep them consistently identical. The replication process is time consuming and often congests the network. The frequency of replication should be high enough to accommodate replacement transparency and minimize data losses during switchovers, while simultaneously low enough to allow proper network operation without undue congestion.
In a very dynamic scenario, it may not be possible to continuously distribute the updates on all machines taking part in an election. In such a situation, a switchover may cause a retrograde to the last synchronized version of files. Typically, the synchronization is scheduled during low workload hours. Additionally, instead of making backups, data is now distributed to the server array which better serves the purpose.
Having discussed various necessary aspects of the software solution, we move on to its description. We implemented this solution using Perl 5.0 running on Red Hat Linux 6.0. Owing to the portability of Perl, the software runs on any version of Linux/UNIX with minor or no changes. The program is implemented as a daemon that is initiated at startup of the servers. It moves to the background after spawning four processes:
Heartbeat listener process for processing heartbeat signals generated by the master server.
Listener process for receiving and parsing various signals generated from other servers.
Doctor process to interpret the heartbeat signal and decide whether the master server has failed.
Elector process to actually implement the election algorithm and decide which actions need to be taken. It also generates a heartbeat signal if running on a master server.
The main daemon is supplemented with special scripts to handle startup and synchronization. Separate scripts are created for master and slave servers. This makes the configuration of startup services on both master and slave servers very easy.
Besides subordinate scripts, a host of scripts is provided for file synchronization and distribution. They may be initiated for scheduled synchronization and backup.
We employed UDP communication for heartbeat signals in order to minimize network load. For election calls and other signals, TCP is used to ensure reliability.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- SUSE LLC's SUSE Manager
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- Managing Linux Using Puppet
- My +1 Sword of Productivity
- Non-Linux FOSS: Caffeine!
- Doing for User Space What We Did for Kernel Space
- SuperTuxKart 0.9.2 Released
- Google's SwiftShader Released
- Parsing an RSS News Feed with a Bash Script
- Rogue Wave Software's Zend Server
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide