Hack and / - Bond, Ethernet Bond
As a sysadmin, one of the most important virtues you should foster is tolerance. Now, although it's important for vi and Emacs users and GNOME and KDE users to live in harmony, what I want to focus on here is fault tolerance. Everybody has faults, but when your server's uptime is on the line, you should do everything you can to make sure it can survive power faults, network faults, hardware faults and even your faults. In this column, I describe a basic fault-tolerance procedure that's so simple to implement, all sysadmins should add it to their servers: Ethernet bonding.
The basic idea behind Ethernet bonding is to combine two or more Ethernet ports on your machine, such that if one Ethernet port loses its connection, another bonded port can take over the traffic with zero or minimal downtime. The fact is that these days, the bulk of the services on a server require a network connection to be useful, so if you set up multiple Ethernet ports that are connected to redundant switches, you can conceivably survive a NIC failure, a cable failure, a bad switch port or even a switch failure, and your server will stay up.
On top of basic fault tolerance, you also can use certain Ethernet bonding modes to provide load balancing as well and get more bandwidth than a single interface could provide. Ethernet bonding is a feature that is part of the Linux kernel, and it can provide a number of different behaviors based on which bonding mode you choose. All of the Ethernet bonding information can be found in the Documentation/networking/bonding.txt file included with the Linux kernel source. Below, I provide an excerpt from that documentation that lists the 007 bonding modes:
balance-rr or 0 — round-robin policy: transmit packets in sequential order from the first available slave through the last. This mode provides load balancing and fault tolerance.
active-backup or 1 — active-backup policy: only one slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The bond's MAC address is externally visible on only one port (network adapter) to avoid confusing the switch.
balance-xor or 2 — XOR policy: transmit based on the selected transmit hash policy. The default policy is a simple [(source MAC address XOR'd with destination MAC address) modulo slave count]. Alternate transmit policies may be selected via the xmit_hash_policy option, described below. This mode provides load balancing and fault tolerance.
broadcast or 3 — broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.
802.3ad or 4 — IEEE 802.3ad dynamic link aggregation: creates aggregation groups that share the same speed and duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification.
balance-tlb or 5 — adaptive transmit load balancing: channel bonding that does not require any special switch support. The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave.
balance-alb or 6 — adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPv4 traffic and does not require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server.
Now that you've seen all the choices, the real question is which bonding mode should you choose? To be honest, that's a difficult question to answer, because it depends so much on your network and what you want to accomplish. What I recommend is to set up a test network and simulate a failure by unplugging a cable while you ping the server from another host. What I've found is that different modes handle failure differently, especially in the case of a switch that takes some time to re-enable a port that has been unplugged or a switch that has been rebooted. Depending on the bonding mode you choose, those situations might result in no downtime or a 30-second outage. For my examples in this column, I chose bonding mode 1 because although it provides only fault tolerance, it also has only one port enabled at a time, so it should be relatively safe to experiment with on most switches.
Note: the bonding mode is set at the time the bonding module is loaded, so if you want to experiment with different bonding modes, you will at least have to unload and reload the module or at most reboot the server.
Although Ethernet bonding is accomplished through a kernel module and a utility called ifenslave, the method you use to configure kernel module settings and the way networks are configured can differ between Linux distributions. For this column, I talk about how to set this up for both Red Hat- and Debian-based distributions, as that should cover the broadest range of servers. The first step is to make sure that the ifenslave program is installed. Red Hat servers should have this program installed already, but on Debian-based systems, you might need to install the ifenslave package.
Kyle Rankin is a systems architect; and the author of DevOps Troubleshooting, The Official Ubuntu Server Book, Knoppix Hacks, Knoppix Pocket Reference, Linux Multimedia Hacks, and Ubuntu Hacks.
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?
|Designing Electronics with Linux||May 22, 2013|
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
- Designing Electronics with Linux
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Dynamic DNS—an Object Lesson in Problem Solving
- Using Salt Stack and Vagrant for Drupal Development
- Validate an E-Mail Address with PHP, the Right Way
- Tech Tip: Really Simple HTTP Server with Python
- Why Python?
- Build a Skype Server for Your Home Phone System
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Reply to comment | Linux Journal
52 min 35 sec ago
- Reply to comment | Linux Journal
1 hour 42 min ago
- Not free anymore
5 hours 44 min ago
9 hours 31 min ago
- Reply to comment | Linux Journal
9 hours 39 min ago
- Understanding the Linux Kernel
11 hours 54 min ago
14 hours 24 min ago
- Kernel Problem
1 day 27 min ago
- BASH script to log IPs on public web server
1 day 4 hours ago
1 day 8 hours ago