IP Masquerading with Linux
Not all protocols work with IP masquerading. ICMP messages (such as those used by ping) will not be passed through the masquerading host. Also, application protocols that pass their address to the receiving host will not work. The talk program is an example of this.
A major exception to the applications that don't work is ftp. The IP masquerading software has been written to handle file transfers as of kernel version 1.3.37. FTP clients, under normal operation, will send the server the address and port number to which the server should connect for a transfer. This shouldn't work with masquerading for the same reasons that talk fails. However, the IP masquerading software will intercept the FTP PORT command and masquerade as the client host awaiting for the server to connect to it.
The biggest problem is the most subtle one: IP fragmentation. Fragmentation occurs automatically within the Internet Protocol. IP always wants to fit a datagram in the frame size of the network link it is transmitting over. Most data links define a Maximum Transmission Unit (MTU) of information that will fit within one frame. If the IP datagram to be sent out can't all fit into the MTU size of the frame, it will be fragmented.
An IP datagram carrying a TCP segment is structured like the “Original Datagram” illustration in Figure 3. After fragmentation, the new datagrams appear (also shown in Figure 3). The most important aspect to notice is the placement of the TCP header. With fragmentation, it only appears in the first fragment and not in succeeding ones. Without the header, the host doing the masquerading has no way of determining whether the fragment should be forwarded. The same applies for fragmented UDP packets.
With TCP, this problem is mostly avoided because of TCP's MSS (Maximum Segment Size) negotiation. That's not to say it won't happen, but it doesn't occur most of the time. UDP, however, is much more susceptible to this type of behavior. Your only solution as an administrator is to be careful about controlling MTU sizes on SLIP or PPP networks.
Other problems also exist for X applications (connections back to the X server); RealAudio (patches available, however); and rlogin (rlogind requires a privileged port).
Actual troubleshooting of masquerading problems is not always as easy as getting the rules straight. One subscriber to the IP masquerading mailing list (see Sidebar) presented an interesting problem. It was solved with simple analysis, code knowledge, and a good hex editor.
Greg Priem sent a message to the IP masquerading mailing list describing a problem in which his telnet sessions would freeze. He isolated a sequence of events that reproduced the problem—he would log into his service provider's main host from a machine behind his Linux box and type in ls -l.
Greg did some initial analysis and posted what he found. The network he was using is illustrated in Figure 4. The telnets were from the Mac to the ISP and other hosts on the Internet. He noticed telnets from the Mac to the Linux Box worked fine, as well as telnets from the Linux Box to the ISP.
Output from tcpdump revealed fragmentation was taking place. I followed up with a message indicating a possible problem and asked Greg to check the MTU sizes on each interface of the Linux Box.
I thought it strange that fragmentation was occurring on a telnet session since telnet uses TCP. As mentioned before, when TCP opens a connection, the MSS negotiation is supposed to eliminate fragmentation.
Further debugging with tcpdump (a handy program) showed the MTU assigned by the ISP was 212. To try to eliminate fragmentation, the SLIP link was also assigned an MTU of 212 by Greg. When looking at the MSS negotiation of the connections, Greg found that from the Linux box to the ISP, the MSS was set to 172, and from the Mac to the Linux box it was the same. However, a connection from the Mac to the ISP showed an MSS of 536.
Given that information, I was able to deduce the problem and respond with an appropriate solution.
The connection scenarios are given in Figure 5.
One thing to note was the MSS advertisement of 536 from the Mac when it had an immediate link with an MTU smaller than that. BSD-experienced people will remember this number from the networking code that chose an MSS value for TCP's negotiation by seeing if the destination was on the local LAN or a remote LAN. The code roughly looked like:
if dest_net == local_net then mss = (link MTU) - 40 else mss = 536 /* determined by 576 - 40 */ fi
If the destination was on a remote network, it would set the MSS automatically to 536. This was a good number because the RFC for IP stated that the default datagram size for internetworking is 576, meaning every device should be able to handle it without further fragmentation. Forty is subtracted to allow for IP and TCP headers.
A second thing to notice was the Linux box forwarding the MSS advertisement. One might think that since a connection is being made from the Linux box as a consequence of masquerading, the MSS value would be based on the network link from the Linux box and not the original value from the sending host.
As an aside, there was the one unexplainable instance of connections made to the ISP host and the ISP sending back an MSS of 1460, as shown at the bottom of Figure 5. It's strange because it was also connected to the PPP link with an MTU of 212. This may be attributed to a lack of knowledge on the ISP's side of the network.
Since both sides were using an MSS value greater than the MTU of either link, there was bound to be fragmentation, even for a TCP connection. Under normal circumstances, this wouldn't matter, but it does confuse masquerading.
The simple solution was to have the ISP support an MTU of at least 576 and for Greg to set the SLIP link with an MTU of 576 or greater. Therefore, no fragmentation would occur.
Greg e-mailed his ISP and waited for an answer. When none arrived he became impatient. Since he didn't have the source to the TCP code on the Mac, the only way to look at it was with a hex editor. He started poking around to see if he could find the BSD-like code where it made the decision for the MSS, and sure enough, he found it. He changed the hard coded values of 536 to 172 (i.e. 212-40), restarted his Mac, and lo and behold, it worked—no more fragmentation! (By the way, the ISP did change the MTU size later.) His approach was a little more daring than what I would have done, but it seems to be the nature of Linux users to patch an existing binary if they can't recompile something.