Overcoming Asymmetric Routing on Multi-Homed Servers
A good way to address this imbalance is with the iproute2 package. iproute2 allows the administrator to throw away traditional network configuration tools, such as ifconfig, and tackle more complicated situations using a program named ip. ip is part of the iproute2 package written by kernel developer Alexey Kuznetsov. iproute2 comes installed with most distributions. If the ip command is not available on your system, the package may be downloaded at ftp.inr.ac.ru/ip-routing. Like many packages that tightly integrate with Linux internals, iproute2 needs a copy of the kernel sources to be available during its own compilation.
General iproute2 functionality also requires netlink socket support to be compiled into the running kernel. Additionally, the particular strategy outlined in this article requires the IP: advanced router and IP: policy routing options to be configured in the running kernel. These features have been available for the entire 2.4 kernel series and are included in 2.6 as well. The kernel configuration scripts still label policy routing as NEW, but that is more a factor of the kernel help screens being updated slowly than a reflection on the maturity level of advanced and policy routing.
The complete story of iproute2 is too involved for this article. In addition to controlling routing behavior, iproute2 can be used to set up interfaces, control arp behavior, do NAT and establish tunnels.
The main idea of iproute2's routing control is to separate routing decisions into two steps. The second step is a traditional destination-based routing table. The key difference in an iproute2 world is the system may contain many different destination-based routing tables instead of a single global system table. The first iproute2 step is used to identify which of those many tables should be used during the second step. This table identification step is known as rule selection or policy selection. Rule selection is considered more flexible than traditional routing, because it uses factors broader than only the destination address of the packet when making a policy selection.
This two-phase infrastructure lays the groundwork for solving the bottleneck problem on the multi-homed Web server described above. First, we need to create two routing tables; each table routes out through a different interface. Second, we need to create the decision step in such a manner that it selects the routing table that sends the server's response traffic out the same interface on which the request arrived. The source address of the outgoing packets can be used to correlate the packets with the interface on which the session originated. In networking parlance, this technique is known as source-based routing.
Begin by creating the two routing tables. The tables need only default routes to our main gateway, but each one uses a different interface to reach that gateway. Different tables are represented in iproute2 configurations by unique integers. Table numbers can be given string aliases through the /etc/iproute2/rt_tables file, but simple numbers are sufficient for this example. The numbers are simply identifiers; their magnitude carries no meaning. The default system routing table (the normal table seen when using the traditional route command) is number 254. Numbers 1 through 252 are available for local use. We call our example tables here table 1 and table 2:
#ip route add default via 192.168.16.1 dev eth0 tab 1
#ip route add default via 192.168.16.1 dev eth1 tab 2
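As a small sketch of the alias mechanism mentioned above: each line of /etc/iproute2/rt_tables pairs a table number with a name. The names via_eth0 and via_eth1 are hypothetical, chosen only for illustration, and the helper prints the lines rather than editing the file.

```shell
# Print rt_tables entries that give tables 1 and 2 readable names.
# The names via_eth0 and via_eth1 are assumptions, not from the article.
rt_table_aliases() {
    # printf reuses the format for each number/name pair.
    printf '%s\t%s\n' 1 via_eth0 2 via_eth1
}

rt_table_aliases
# To apply (as root): rt_table_aliases >> /etc/iproute2/rt_tables
```

With such entries in place, a name should be usable wherever a table number is expected, for example ip route show table via_eth0.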
Displaying the contents of any table is done using the ip route show command:
#ip route show table 1
default via 192.168.16.1 dev eth0
#ip route show table 2
default via 192.168.16.1 dev eth1
Our simple tables look fine; their only difference is the interface on which they transmit. Let's move on to creating the policies that dynamically select among the two tables at runtime. On the example server, interface eth0 is bound to address 192.168.16.20 and interface eth1 is bound to 192.168.16.21. A selection policy that matches the source address of an outgoing packet with the table that uses an interface that is in turn bound to that source address accomplishes our goal. That sounds more complicated than the process really is. What it really means is we need a policy that says packets with a source address of 192.168.16.20 should use table 1 because table 1 uses eth0 and eth0 is bound to 192.168.16.20. Similar logic applies to a policy that ties eth1 and 192.168.16.21 together.
Each routing policy has an associated priority. Policies with a lower priority number take precedence over policies that also may match the candidate packet but have a higher priority value. The priority is an unsigned 32-bit number, so there is never a problem finding enough priority levels to express any algorithm in great detail. Our example algorithm requires only two policies.
At start-up time, the kernel creates several default rules to control the normal routing for the server. These rules have priorities 0, 32766 and 32767. The rule at priority 0 is a special rule for controlling intra-box traffic and does not affect us. However, we do want our new rules to take precedence over the other default rules, so they should use priority numbers less than 32766. These two default rules also may be deleted if you are sure your replacement routes never need to fall back on the default behavior of the server.
The new policy rules are added using the ip rule add command. The from attribute is used to generate source address-based routing policies.
#ip rule add from 192.168.16.20/32 tab 1 priority 500
#ip rule add from 192.168.16.21/32 tab 2 priority 600
Under this setup, outgoing packets first are checked for source address 192.168.16.20. If that matches they use routing table 1, which sends all traffic out eth0. Otherwise the packets are checked for source addresses that match 192.168.16.21. Matches to that rule would use table 2, which sends all traffic out eth1. Any other packets would use the default system rules detailed by rules 32766 and 32767.
#ip rule show
0: from all lookup local
500: from 192.168.16.20 lookup 1
600: from 192.168.16.21 lookup 2
32766: from all lookup main
32767: from all lookup 253
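The evaluation order described above can be mimicked with a toy model in plain shell. This is only an illustration of "lowest priority number first, first match wins"; it is not how the kernel implements the lookup. It leaves out the priority 0 local rule and the case where a matching table has no usable route and the search continues. The RULES variable and the select_table function are hypothetical names.

```shell
# Toy model of rule selection: each rule is "priority source table".
# Rules are checked in ascending priority order; the first match wins.
RULES='500 192.168.16.20 1
600 192.168.16.21 2
32766 all main'

select_table() {
    # $1 = source address of the outgoing packet
    printf '%s\n' "$RULES" | sort -n | while read -r prio src table; do
        if [ "$src" = "all" ] || [ "$src" = "$1" ]; then
            echo "$table"
            break
        fi
    done
}

select_table 192.168.16.20   # prints 1 (table 1 routes out eth0)
select_table 192.168.16.21   # prints 2 (table 2 routes out eth1)
select_table 10.0.0.5        # prints main (no source rule matched)
```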
Changes made to the policy database do not take effect dynamically. To tell the kernel that it needs to re-parse the policy database, issue the ip route flush cache command:
#ip route flush cache
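Putting the steps together, the whole configuration might be collected into one small script. This is a sketch under a few assumptions: the run helper, the DRY_RUN variable and the function names are hypothetical, and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: one routing table and one source rule per interface.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
GATEWAY=192.168.16.1

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = interface, $2 = address bound to it, $3 = table number, $4 = priority
add_source_route() {
    run ip route add default via "$GATEWAY" dev "$1" table "$3"
    run ip rule add from "$2/32" table "$3" priority "$4"
}

setup_all() {
    add_source_route eth0 192.168.16.20 1 500
    add_source_route eth1 192.168.16.21 2 600
    # Tell the kernel to re-read the policy database.
    run ip route flush cache
}

setup_all
```

Running it as root with DRY_RUN=0 would apply the commands. Afterward, asking the kernel which route it would pick, with something like ip route get 192.168.16.1 from 192.168.16.21, should report eth1 if the rules are in effect.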
iproute2 allows you to use factors other than the source address when performing policy selection. The candidate packet's type of service bits, the destination address and any diffserv markings also are available, along with some other attributes. See www.compendium.com.ar/policy-routing.txt and www.linuxgrill.com/iproute2.doc.html for a good description of all the iproute2 parameters and capabilities.
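For a rough idea of what those other selectors look like, the sketch below prints three rules keyed on destination address, type of service and firewall mark. The priorities, the 10.0.0.0/8 prefix, the TOS value and the mark value are all hypothetical; tables 1 and 2 are the ones built earlier. As before, the run helper is an assumption and only prints commands unless DRY_RUN is set to 0.

```shell
# Sketch: policy rules that select on something other than the source address.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

other_selector_rules() {
    # Destination address: traffic to 10.0.0.0/8 uses table 1.
    run ip rule add to 10.0.0.0/8 table 1 priority 700
    # Type of service: packets with TOS 0x10 (low delay) use table 2.
    run ip rule add tos 0x10 table 2 priority 710
    # Firewall mark, set elsewhere (for example by an iptables MARK rule).
    run ip rule add fwmark 1 table 2 priority 720
}

other_selector_rules
```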
Comments
Local Traffic
I just ran into this today when fixing some routes. If you want those two interfaces to send traffic directly on their local network (192.168.16.0/24), without going through the gateway and forming an asymmetric route with hosts on that network, you'll need to add:
#ip route add 192.168.16.0/24 dev eth0 tab 1
#ip route add 192.168.16.0/24 dev eth1 tab 2
to use link routing on the local subnet.
Ubuntu ip route commands - what file do I put them in?
So, I tried /etc/network/if-up.d/ip and /etc/rc.local, but all routing breaks when the box reboots. Where should I put these? Currently, I let the box boot up, then run the commands manually and everything works great. Any suggestions?
1. vi
1. Create /etc/init.d/iproutes-asym with the commands you need and make it executable:
vi /etc/init.d/iproutes-asym
chmod 755 /etc/init.d/iproutes-asym
2. Link it into the runlevel directory:
cd /etc/rc3.d
ln -s ../init.d/iproutes-asym S99z-iproutes-asym
This is what my iproutes-asym file looks like:
ip route add default via 10.53.1.252 dev eth0 tab 1
ip route add default via 10.53.1.252 dev eth1 tab 2
ip rule add from 10.53.1.55/32 tab 1 priority 500
ip rule add from 10.53.1.54/32 tab 2 priority 600
ip route flush cache
Muchas gracias
Thanks for putting this together. Proper routing on a multi-homed server is poorly documented by my Linux distro vendor. Your article was a great help in understanding iproute2 (in this context) and getting things working properly.
solutions
A problem at the network interface level can also be solved with bonding, which is easier to manage. iproute2 can be used to have multiple load balancers and/or gateways, though.
Need some HELP for linux asymmetric routing
Hello friends! I have two ISP links from the same service provider. Each link gives me an IP address on a /30 subnet: eth0 is x.x.24.66 and eth1 is x.x.24.234.
The default route is set to x.x.24.233 dev eth1. When an ICMP ping arrives for x.x.24.234 on eth1, it is answered. When a ping arrives for x.x.24.66 on eth0, nothing happens.
The ICMP echo request comes in through eth0, but no reply goes out via eth1 (the default route). When I listen on eth1 with tcpdump, there are no outgoing ICMP reply packets.
What's the problem?
Thanks, Mike.
http://www.michaelrack.de
Thank you! Also..
Patrick,
Thank you! I have been struggling with this for weeks. I wish I had found this article first. This is the first time I have found a good explanation of rules and tables and their relationship in the same place.
Regarding SNAT: I listed two source addresses in my iptables firewall, and it mostly works well. However, some outbound connections fail. Most notably, SSH, Yahoo IM and IRC all reset after a short time (though web traffic seems OK). I can SNAT to one of my outbound addresses and use an ip rule to designate a single gateway. This works, but I am no longer NAT load balancing over my two WAN links. Anyone know a solution?
-Nathan
Thank you
Thank you very much for this excellent article
Best wishes
Super
Very nice and educative article. Good reading.
Re: Overcoming Asymmetric Routing on Multi-Homed Servers
Minimalist load balancer. From lartc.org section 4.2.2
# ip route add default nexthop via gw_1 nexthop via gw_2
Mohammad Bahathir Hashim
Malaysia.
rules vs. nat
What about the SNAT target in iptables? It modifies the source IP address of the packet, but applies only in the POSTROUTING chain. Are the rules (the policy) evaluated *after* that again? The name POSTROUTING makes me think the routing part is already over...
Re: rules vs. nat
If it's anything like a Cisco router, outbound NAT happens after policy routing, and doesn't get another chance at the policy engine.
Sean
Re: rules vs. nat
The SNAT target allows you to specify multiple source ip's and they will be used one after the other. That would probably give you simple outbound load-balancing.
From the iptables man page:
You can add several --to-source options. If you specify more than one source address, either via an address range or multiple --to-source options, a simple round-robin (one after another in cycle) takes place between these addresses.
L2 vs L3...
Great article on policy routing, but isn't this problem what bonding was designed to solve?
http://linux-ip.net/html/ether-bonding.html
/usr/src/linux-2.4/Documentation/networking/bonding.txt
Sean
Re: L2 vs L3...
From the link you posted:
" Bonding for link aggregation must be supported by both endpoints."
"Bonding for link aggregation
"Bonding for link aggregation must be supported by both endpoints."
Sounds like something our marriage therapist once told my (now ex-) wife and I... ;) Needless to say, it was NOT supported by *both* endpoints!