OK, so here's a fun one.
I've been trying to troubleshoot a very odd traceroute/udp issue I'm seeing that I can't figure out.
I have two gateway/firewall boxes running NAT for internal hosts on two different networks (office1 and office2).
one is running 2.6.22-gentoo-r9, lets call this one gw1
the other 2.6.20-gentoo-r8, lets call this one gw2
Theres two other systems here:
a system inside either of these two lans, we'll call it lan_host1
a system out on the internet not behind nat, we'll call it outside1
From lan_host1 which is behind either gw1 or gw2 I can traceroute out fine. I can hit gw1 or gw2 from inside the opposites network, i can hit any other host on the internet ( including outside1 ).
From outside1 I can traceroute to both gw1 and gw2 no problem.
Whenever I try to traceroute from either gw1 or gw2 to any host on the internet, including each other and/or outside1, it eventually times out and fails. Running tcpdump on both hosts at the same time while doing the traceroute shows the same UDP packets at both ends, and I know that both hosts respond to traceroute otherwise. What is interesting is what tcpdump shows ( on both hosts ) of the packets in question:
$ tcpdump -nnvvS
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 68 bytes
14:31:29.322429 IP (tos 0x0, ttl 1, id 53536, offset 0, flags [none], proto: UDP (17), length: 46) gw1.53493 > gw2.33477: [bad udp cksum a015!] UDP, length 18
Both sides will see this exact same packet, so I know no other modifications to the packet are happening. Both sides show the bad udp cksum error. In IPTABLES I have the following rules:
$IPTABLES -A INPUT -m state --state INVALID -j DROP
$IPTABLES -A FORWARD -m state --state INVALID -j DROP
$IPTABLES -A OUTPUT -m state --state INVALID -j REJECT
If I do the following I can watch live the iptables counters:
$ watch 'iptables -nv -L | grep INVALID' (shows the following)
18 780 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 state INVALID
0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 state INVALID
0 0 REJECT all -- * * 0.0.0.0/0 0.0.0.0/0 state INVALID reject-with icmp-port-unreachable
Now with every attempt at the traceroute to the destined host the INVALID counters on the INPUT table increase. So I try removing that line for INPUT and not DROP'ing any of my INVALID data, but still no reply goes back to the original host. I am guessing that the kernel still drops the packet because the UDP check sum is incorrect. tcpdump will show zero reply to these traceroute attempts. I have even tried to traceroute to other networking gear including my ISPs upstream Cisco cores, and they too drop these UDP packets ( although I have not contacted them to confirm/ask their thoughts, I just know traceroute fails).
Things I can rule out:
1. non-NAT related iptables rules. outside1 runs almost the exact same rule set minus NAT related statements and forwarding.
2. ISPs blocking traceroute udp packets. I can clearly traceroute from other hosts, or from inside those networks to the opposite one.
3. hardware. all my servers use Intel cards, if it was hardware I believe I'd see a lot more issues with udp packets, etc.
Oh, a traceroute -I (for icmp) works fine.
Googling this issue in every way that I can think of returns nothing. Both boxes, with different Kernels, creating the same packets that have bad check sums. Could be something with my NAT setup, but I don't do anything that crazy with it.
Thoughts? Any help, greatly greatly appreciated. I'd also be happy to provide more info if I can.
- a stressed sys-admin.
I cross posted this in the Gentoo forums as well: http://forums.gentoo.org/viewtopic-t-732185.html
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
|Introduction to MapReduce with Hadoop on Linux||Jun 05, 2013|
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Linux Systems Administrator
- Introduction to MapReduce with Hadoop on Linux
- RSS Feeds
- New Products
- Weechat, Irssi's Little Brother
- Validate an E-Mail Address with PHP, the Right Way
- Tech Tip: Really Simple HTTP Server with Python
- Poul-Henning Kamp: welcome to
13 min 2 sec ago
- This has already been done
14 min 2 sec ago
- Reply to comment | Linux Journal
59 min 16 sec ago
- Welcome to 1998
1 hour 47 min ago
- notifier shortcomings
2 hours 11 min ago
3 hours 48 min ago
- Android User
3 hours 49 min ago
- Reply to comment | Linux Journal
5 hours 43 min ago
8 hours 32 min ago
- This is a good post. This
13 hours 45 min ago
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?