A High-Availability Cluster for Linux
A major design factor is resynchronization (mirroring back) of the files once a failed node has recovered. A reliable procedure must be employed so that data which has changed on the failover node during the failure period is mirrored back to the original node and not lost, because the original node overwrites or deletes it in the restoration procedure. The resynchronization procedure should be implemented so that a node cannot perform any mirroring while another node has taken over its services. Also, before the services can be restarted on the original node, all files associated with it must be completely mirrored back to this original node. This must be done while the services are off-line on both nodes to prevent the services from writing to the files being restored. Failure to prevent this could result in data corruption and loss.
The main problem when using this solution was with IMAP4 and pop3 mail spools. If an e-mail message is received and delivered on serv2, and serv2 fails before mirroring can take place, serv1 will take over the mail services. Subsequent mail messages would arrive in serv1's mail spool. When serv2 recovers, any e-mail received just before failure will be overwritten by the new mail received on serv1. The best way to solve this is to configure Sendmail to queue a copy of its mail for delivery to the takeover node. In the event that the takeover node is off-line, mail would remain in the Sendmail queue. Once the failed node recovered, e-mail messages would be successfully delivered. This method requires no mirroring of the mail spools and queues. However, it would be necessary to have two Sendmail configurations available on both nodes: one configuration for normal operation and the other for node takeover operation. This will prevent mail from bouncing between the two servers.
I am not a Sendmail expert. If you know how to configure dual-queuing Sendmail delivery, please let me know. This part is still a work in progress. As a temporary measure, I create backup files on resynchronization of the mail spool with manual checking on node recovery, which is quite time consuming. I also prevent such difficulties by mirroring the mail spool as frequently as possible. This has an unfortunate temporary side effect of making my hard disks work overtime. Similar problems would be encountered when clustering a database service. However, a few large UNIX database vendors are now providing parallel versions of their products, which enable concurrent operation across several nodes in a cluster.
A node could fail for various reasons ranging from an operating system crash, which would result in a hang or reboot, to a hardware failure, which could result in the node going into standby mode. If the system is in standby mode, it will not automatically recover. The administrator must manually remove a standby lock file and start run-level 5 on the failed node to confirm to the rest of the cluster that the problem has been resolved. If the OS hangs, this would have the same effect as a standby run level; however, if the reset button is pressed or the system reboots, the node will try to rejoin the cluster, as no standby lock file will exist. When a node attempts to rejoin the cluster, the other node will detect the recovery and stop all cluster services while the resynchronization of the disks takes place. Once this has completed, the cluster services will be restarted and the cluster will once again be in full operation.
My choice of Linux distribution is Red Hat 5.1 on the Intel platform. There are, however, no reasons why this could not be adapted for another Linux distribution. The implementation is purely in user space. No special drivers are required. Some basic prerequisites are necessary in order to effectively deploy this system:
Two similarly equipped servers, especially in terms of data storage space, are needed.
Three network interface cards per server are recommended, although two might work at the expense of some modifications and extra LAN traffic.
Sufficient network bandwidth is needed between the cluster nodes.
My system consists of two Dell PowerEdge 2300 Servers, each complete with:
three 3C905B 100BaseTX Ethernet cards
two 9GB Ultra SCSI 2 hard disks
one Pentium II 350 MHz CPU
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- Stunnel Security for Oracle
- SourceClear Open
- SUSE LLC's SUSE Manager
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- My +1 Sword of Productivity
- Managing Linux Using Puppet
- Tech Tip: Really Simple HTTP Server with Python
- Non-Linux FOSS: Caffeine!
- Google's SwiftShader Released
- Doing for User Space What We Did for Kernel Space
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide