Overcoming Asymmetric Routing on Multi-Homed Servers
Asymmetric TCP/IP routing causes a common performance problem on servers that have more than one network connection. The atypical network flows created by asymmetric routes occur most often in server environments where a different interface is used for sending traffic than is used to receive traffic. The flows are considered unusual because traffic from one end of the connection (A→B) travels over a different set of links than does traffic moving in the opposite direction (B→A). Asymmetric routes have legitimate uses, such as taking advantage of high-bandwidth but unidirectional satellite links, but more often they are a source of performance problems.
These abnormal packet flows interact poorly with TCP's congestion control algorithm. TCP sends packets in both directions even when the data flow, or goodput, is unidirectional. TCP's congestion control algorithm assumes that data packets experience delay and loss characteristics similar to those of the corresponding acknowledgment and control packets traveling in the reverse direction. When the two types of packets travel across physically different paths, this assumption is unlikely to hold. The mismatch generally leads to suboptimal TCP performance (see Resources).
A more serious problem occurs when the asymmetric routing introduces artificial bandwidth bottlenecks. A server with two interfaces of equal capacity can develop a bottleneck if it receives traffic on both interfaces but always responds through only one. Servers commonly add multiple interfaces, even multiple interfaces connected to the same switch, in order to increase the aggregate transmission capacity of the server. Asymmetric routing is a common but often unanticipated outcome of this configuration, and it comes about because traditional routing is wholly destination-based.
Destination-based routing uses only a leading prefix of the packet's destination IP address when selecting the interface on which to send the packet. Each entry in the routing table contains the IP address of the next-hop router (if a router is necessary) and the interface through which the packet should be sent. The entry also contains a variable-length IP address prefix to match candidate packets against. That prefix can be as long as 32 bits for an IPv4 host route or as short as 0 bits for a default route that matches everything. If more than one routing table entry matches, the entry with the longest prefix is used.
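The longest-prefix rule described above can be sketched in a few lines of POSIX shell. This is only an illustration of the lookup logic, not how the kernel implements it. The table is modeled on the routing table shown later in Listing 1, and the function names ip2int and lookup are made up for this example. Ties between equal-length prefixes go to the earlier entry.

```shell
#!/bin/sh
# Sketch of destination-based lookup: the longest matching prefix wins,
# and on a tie the earlier table entry is kept.

# Routing table modeled on Listing 1, one "prefix/length interface" per line.
ROUTES="192.168.16.0/24 eth0
192.168.16.0/24 eth1
127.0.0.0/8 lo
0.0.0.0/0 eth0"

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the interface chosen for destination address $1.
lookup() {
    dst=$(ip2int "$1")
    best_len=-1
    best_if=
    while read -r cidr ifname; do
        net=$(ip2int "${cidr%/*}")
        len=${cidr#*/}
        if [ "$len" -eq 0 ]; then
            mask=0
        else
            mask=$(( (4294967295 << (32 - $len)) & 4294967295 ))
        fi
        # Only the source-independent destination prefix is compared.
        if [ $(( dst & mask )) -eq $(( net & mask )) ] &&
           [ "$len" -gt "$best_len" ]; then
            best_len=$len
            best_if=$ifname
        fi
    done <<EOF
$ROUTES
EOF
    echo "$best_if"
}

lookup 192.168.16.50   # local subnet: first of the two equal /24 routes
lookup 10.1.2.3        # anything else: falls through to the default route
```

Both calls print eth0, which is the concentration of outgoing traffic the article goes on to describe: nothing in the lookup considers which interface the request arrived on.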
A typical server not participating in a dynamic routing protocol, such as OSPF or BGP, has a simple routing table. It contains one entry for each interface on the server and one default route for reaching all the hosts not directly connected to its interfaces. This simple approach, which relies heavily on a single default route, results in a concentration of outgoing traffic through a single interface without regard to the interface through which the request originally was received.
A good illustration of this situation is a Web server equipped with two 100Mb full-duplex interfaces. Both of the interfaces are configured on the same subnet. This setup should provide 200Mb/sec of bandwidth for both incoming and outgoing traffic if it is attached to a full-duplex switch with a multi-gigabit backplane. This arrangement is an attractive server design because it allows the server to exceed 100Mb of capacity without having to upgrade to gigabit network infrastructure. It is a cost-effective approach: even though copper-based gigabit NICs are becoming inexpensive, the switch port costs to utilize them are still significantly higher than what would be incurred for even several 100Mb ports.
Typically, clients connecting to this Web server first would encounter some kind of load balancer, either DNS-based or perhaps a Layer-4 switching appliance, that would direct half of the requests to one interface and half to the other. Listing 1 shows what the default routing table might look like on that Web server if it had two interfaces, both configured on the 192.168.16.0/24 subnet.
Listing 1. Typical Routing Table
Destination     Gateway         Genmask         Flags   Iface
192.168.16.0    *               255.255.255.0   U       eth0
192.168.16.0    *               255.255.255.0   U       eth1
127.0.0.0       *               255.0.0.0       U       lo
default         192.168.16.1    0.0.0.0         UG      eth0
In this circumstance incoming load is distributed evenly, thanks to the load balancer. However, the response traffic all goes out through eth0 because, by default, the server uses destination-based routing.

Figure 1. An Imbalanced Server

Figure 2. To use both interfaces effectively, we need to use policy-based routing.
Most of the traffic volume on a Web server is outgoing because HTTP responses tend to be much larger than requests. Therefore, the effective bandwidth of this server still is limited to 100Mb/sec, even though it has two load-balanced interfaces. Load balancing the requests alone does not help, because the bottleneck is on the response side. Packets either use the default route through eth0 or, if they are destined for the local subnet, must choose between two equally weighted routes, in which case the first route (again through eth0) is selected. The end result is that Web requests are balanced evenly across eth0 and eth1, but the larger and more important responses all are funneled through a bottleneck on eth0.
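As a rough sketch of what policy-based routing could look like for this example server, the script below builds one routing table per interface and selects the table by the packet's source address, following the pattern of the iproute2 commands quoted in the reader comments. The addresses 192.168.16.20 (eth0) and 192.168.16.21 (eth1) are assumptions made up for illustration; only the subnet and the gateway 192.168.16.1 come from Listing 1. The helper names run, add_policy and setup are likewise hypothetical. By default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Source-based (policy) routing sketch for the two-interface Web server.
# ASSUMPTIONS: eth0 = 192.168.16.20 and eth1 = 192.168.16.21 are made-up
# addresses; subnet and gateway are taken from Listing 1.
# Commands are printed, not executed, unless APPLY=1 is set (needs root).

run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# add_policy TABLE IFACE ADDR PRIORITY
add_policy() {
    # Local subnet is reached directly through this interface.
    run ip route add 192.168.16.0/24 dev "$2" table "$1"
    # Everything else leaves through the same interface via the gateway.
    run ip route add default via 192.168.16.1 dev "$2" table "$1"
    # Packets sourced from this interface's address consult this table.
    run ip rule add from "$3/32" table "$1" priority "$4"
}

setup() {
    add_policy 1 eth0 192.168.16.20 500
    add_policy 2 eth1 192.168.16.21 600
    run ip route flush cache
}

setup
```

Because a response carries the address the request was sent to as its source, a request that arrived on eth1 would be answered through eth1 under these rules, so the outgoing load follows the incoming balance.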
Comments
Local Traffic
I just ran into this today when fixing some routes. If you want those two interfaces to send traffic normally on their local network ( 192.168.16.0/24 ) without going through the gateway and forming an asymmetric route with hosts on that network you'll need to add:
#ip route add 192.168.16.0/24 dev eth0 tab 1
#ip route add 192.168.16.0/24 dev eth1 tab 2
to use link routing on the local subnet.
Ubuntu ip route commands - what file do I put them in?
So, I tried /etc/network/if-up.d/ip and /etc/rc.local, but all routing breaks when the box reboots. Where should I put these? Currently, I let the box boot up, then run the commands manually and everything works great. Any suggestions?
1. vi
1. Create the script with vi, add the commands you need, and make it executable:
vi /etc/init.d/iproutes-asym
chmod 755 /etc/init.d/iproutes-asym
2. Link it into the runlevel 3 start directory:
cd /etc/rc3.d
ln -s ../init.d/iproutes-asym S99z-iproutes-asym
This is what my iproutes-asym file looks like:
ip route add default via 10.53.1.252 dev eth0 tab 1
ip route add default via 10.53.1.252 dev eth1 tab 2
ip rule add from 10.53.1.55/32 tab 1 priority 500
ip rule add from 10.53.1.54/32 tab 2 priority 600
ip route flush cache
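For the Ubuntu question above, one alternative to an init script is a per-interface hook in /etc/network/if-up.d. ifupdown runs the executable scripts in that directory once for each interface it brings up and passes the interface name in the IFACE environment variable. A hook that adds every route on every invocation (including the one for lo, or before eth1 exists) may fail, which is one possible explanation for the breakage reported, though the commenter did not confirm the cause. The sketch below reuses the gateway and addresses from the iproutes-asym file above and assumes that 10.53.1.55 belongs to eth0 and 10.53.1.54 to eth1, which that file implies but does not state. The file name policy-routing and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical hook: /etc/network/if-up.d/policy-routing (must be executable).
# ifupdown sets $IFACE to the interface that has just come up.
# ASSUMPTION: 10.53.1.55 is eth0's address and 10.53.1.54 is eth1's.
# Commands are printed, not executed, unless APPLY=1 is set.

run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# configure_iface IFACE: add only the routes that belong to this interface.
configure_iface() {
    case "$1" in
        eth0) table=1; src=10.53.1.55; prio=500 ;;
        eth1) table=2; src=10.53.1.54; prio=600 ;;
        *)    return 0 ;;   # lo and any other interface: nothing to do
    esac
    run ip route add default via 10.53.1.252 dev "$1" table "$table"
    run ip rule add from "$src/32" table "$table" priority "$prio"
    run ip route flush cache
}

configure_iface "${IFACE:-}"
```

Running it by hand with IFACE=eth1 prints the three commands for eth1 only. Once the output looks right, the APPLY default at the top can be changed to 1 so that the hook actually applies them.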
Muchas gracias
Thanks for putting this together. Proper routing on a multi-homed server is poorly documented by my Linux distro vendor. Your article was a great help in understanding iproute2 (in this context) and getting things working properly.
solutions
A problem at the network interface level can also be solved with bonding, which is easier to manage. That said, iproute2 lets you work with multiple load balancers and/or gateways.
Need some HELP for linux asymmetric routing
Hello friends! I have two ISP links from the same service provider, each with an IP address on a /30 subnet. eth0 uses x.x.24.66 and eth1 uses x.x.24.234.
The default route is set to x.x.24.233 dev eth1. When an ICMP ping arrives for x.x.24.234 on eth1, it is answered. When a ping arrives for x.x.24.66 on eth0, nothing happens.
The ICMP echo request comes in through eth0, but no reply goes out via eth1 (the default route). When I listen on eth1 with tcpdump, there are no outgoing ICMP replies.
What's the problem?
Thanks, Mike.
http://www.michaelrack.de
Thank you! Also..
Patrick,
Thank you! I have been struggling with this for weeks. I wish I had found this article first. This is the first time I have found a good explanation of rules and tables and their relationship in the same place.
Regarding SNAT: I listed two source addresses in my iptables firewall, and it mostly works well. However, some outbound connections fail. Most notably, SSH, Yahoo IM and IRC all reset after a short time (though Web traffic seems OK). I can SNAT to one of my outbound addresses and use an ip rule to designate a single gateway. This works, but then I am no longer NAT load balancing over my two WAN links. Does anyone know a solution?
-Nathan
Thank you
Thank you very much for this excellent article
Best wishes
Super
Very nice and educative article. Good reading.
Re: Overcoming Asymmetric Routing on Multi-Homed Servers
Minimalist load balancer. From lartc.org section 4.2.2
# ip route add default nexthop via gw_1 nexthop via gw_2
Mohammad Bahathir Hashim
Malaysia.
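A slightly fuller form of the multipath default route mentioned above, in the style of the LARTC HOWTO's load-balancing section, names the device and a weight for each next hop. The sketch below only prints the command unless APPLY=1 is set. The gateway addresses are placeholders from documentation ranges, and the function names run and multipath_default are made up for this example. It assumes a kernel built with multipath routing support.

```shell
#!/bin/sh
# Multipath default route sketch.
# ASSUMPTIONS: 192.0.2.1 and 198.51.100.1 are placeholder gateways.
# The command is printed, not executed, unless APPLY=1 is set (needs root).

run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# multipath_default GW1 IF1 GW2 IF2
multipath_default() {
    # Equal weights ask the kernel to spread routes evenly over both hops.
    run ip route add default scope global \
        nexthop via "$1" dev "$2" weight 1 \
        nexthop via "$3" dev "$4" weight 1
}

multipath_default 192.0.2.1 eth0 198.51.100.1 eth1
```

This spreads outgoing routes across the two next hops rather than individual packets, so the balance is approximate, and unlike the source-based rules it does not tie a response to the interface its request arrived on.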
rules vs. nat
What about the SNAT target in iptables? It modifies the source IP address of the packet, but applies only in the POSTROUTING chain. Are the rules (the policy) evaluated *after* that again? The name POSTROUTING makes me think the routing part is already over...
Re: rules vs. nat
If it's anything like a Cisco router, outbound NAT happens after policy routing, and doesn't get another chance at the policy engine.
Sean
Re: rules vs. nat
The SNAT target allows you to specify multiple source ip's and they will be used one after the other. That would probably give you simple outbound load-balancing.
From the iptables man page:
You can add several --to-source options. If you specify more than one source address, either via an address range or multiple --to-source options, a simple round-robin (one after another in cycle) takes place between these addresses.
L2 vs L3...
Great article on policy routing, but isn't this problem what bonding was designed to solve?
http://linux-ip.net/html/ether-bonding.html
/usr/src/linux-2.4/Documentation/networking/bonding.txt
Sean
Re: L2 vs L3...
From the link you posted:
" Bonding for link aggregation must be supported by both endpoints."
"Bonding for link aggregation
"Bonding for link aggregation must be supported by both endpoints."
Sounds like something our marriage therapist once told my (now ex-) wife and me... ;) Needless to say, it was NOT supported by *both* endpoints!