Hack and / - Linux Troubleshooting, Part II: Local Network

Last month, I discussed localhost troubleshooting, and this month, I extend troubleshooting to your local network. Find out why shawn can't talk to bill.
Test Local IP Settings

After we have confirmed that shawn is plugged in, the next step is to confirm that eth0 on shawn is configured correctly. To do that, I would use the ifconfig command with eth0 as an argument. I should get back all of the network information I need to determine whether eth0 is set up correctly on shawn:

$ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:17:42:1f:18:be
          inet addr:10.1.1.9  Bcast:10.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::217:42ff:fe1f:18be/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:229 (229.0 B)  TX bytes:2178 (2.1 KB)

That command produces a lot of output, but the line I would look at first is the second one. There I can see that eth0's IP address is 10.1.1.9 and that its subnet mask is 255.255.255.0. If the machine were supposed to have a different IP or subnet mask from what I see here, that could be the cause of the problem. If eth0 didn't have an IP or subnet mask configured at all, I might run ifup eth0 to bring up the interface, or I might look at the local network settings (/etc/network/interfaces on a Debian or Ubuntu machine, /etc/sysconfig/network-scripts/ifcfg-eth0 on a Red Hat-based machine) to see if anything is set incorrectly. If I can't get the interface to come up, and this host gets its IP from DHCP, I might have to shift my troubleshooting focus to the DHCP server.
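The arithmetic behind that second line is easy to double-check. The following Python sketch parses an "inet addr" line in the older net-tools format shown above and confirms that the Bcast value is the broadcast address implied by the address and mask. The names check_inet_line and INET_LINE are hypothetical, and newer ifconfig releases print "inet ... netmask ... broadcast" instead, which this regex would not match.

```python
import ipaddress
import re

# Second line of the "ifconfig eth0" output above (older net-tools format).
INET_LINE = "inet addr:10.1.1.9  Bcast:10.1.1.255  Mask:255.255.255.0"


def check_inet_line(line):
    """Parse an ifconfig "inet addr" line and check that Bcast agrees
    with the broadcast address implied by the address and netmask."""
    m = re.search(r"inet addr:(\S+)\s+Bcast:(\S+)\s+Mask:(\S+)", line)
    if m is None:
        # No IPv4 address on this line: time to try ifup or check the config files.
        raise ValueError("no IPv4 address/mask found on this line")
    addr, bcast, mask = m.groups()
    # ipaddress accepts "address/dotted-netmask" as well as "address/prefix".
    net = ipaddress.IPv4Interface(f"{addr}/{mask}").network
    return {
        "address": addr,
        "network": str(net),
        "broadcast_ok": str(net.broadcast_address) == bcast,
    }


print(check_inet_line(INET_LINE))
# {'address': '10.1.1.9', 'network': '10.1.1.0/24', 'broadcast_ok': True}
```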

Test the Local Subnet

After you have confirmed that the interface is on the network and should be able to communicate, the next step is to test whether you can access another host on the same subnet—specifically the gateway if you have one configured. Why? Well, if you can't talk to a host on the same subnet, especially if you can't talk to the gateway, there's no point in testing communications with hosts outside of your local subnet. First, I will use the route command to see what gateway is configured, and then I will use ping to see whether I can access the gateway:

$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0

In this example, I have a very basic routing table, and the line whose destination and genmask are both 0.0.0.0 (flagged UG: the route is up and goes through a gateway) defines my default gateway: 10.1.1.1. Be sure to use the -n option with route in this step. Without -n, route tries to resolve every IP address it lists into a hostname, and it also prints that destination as "default" and shows a * in the Gateway column for the directly connected network. Besides making route run faster, -n avoids DNS entirely, which matters because if you have network problems, you might not even be able to reach your DNS server (and DNS troubleshooting is a topic for another column).
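Picking the default gateway out of that table can be automated. Here is a minimal Python sketch, assuming the eight-column net-tools layout shown above. default_gateway and ROUTE_OUTPUT are hypothetical names. Because route prints the default destination as 0.0.0.0 with -n and as "default" without it, the parser accepts both spellings.

```python
# Sample "route -n" output in the layout shown above.
ROUTE_OUTPUT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0
"""


def default_gateway(route_text):
    """Return (gateway, interface) for the default route, or None if none is set."""
    for line in route_text.splitlines():
        fields = line.split()
        # Routing entries have 8 columns; skip the title line and the header row.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        dest, gateway, genmask, flags, _metric, _ref, _use, iface = fields
        # The default route matches everything (genmask 0.0.0.0) and uses a gateway (G).
        if dest in ("0.0.0.0", "default") and genmask == "0.0.0.0" and "G" in flags:
            return gateway, iface
    return None


print(default_gateway(ROUTE_OUTPUT))  # ('10.1.1.1', 'eth0')
```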

Because I see that the gateway is 10.1.1.1, I would use the ping command to confirm that I can communicate with that gateway:

$ ping -c 5 10.1.1.1
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=64 time=3.13 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=64 time=1.43 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=64 time=1.79 ms
64 bytes from 10.1.1.1: icmp_seq=5 ttl=64 time=1.50 ms

--- 10.1.1.1 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 4020ms
rtt min/avg/max/mdev = 1.436/1.966/3.132/0.686 ms
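Reading a ping transcript by eye is easy with five packets but tedious with hundreds. As a hedged sketch, this Python function (a hypothetical helper, not part of ping) pulls the sent and received counts from the statistics line and lists which sequence numbers never got a reply. It assumes iputils-style output where icmp_seq starts at 1 and no duplicate replies arrive.

```python
import re

# The "ping -c 5 10.1.1.1" transcript from above.
PING_OUTPUT = """\
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=64 time=3.13 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=64 time=1.43 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=64 time=1.79 ms
64 bytes from 10.1.1.1: icmp_seq=5 ttl=64 time=1.50 ms

--- 10.1.1.1 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 4020ms
rtt min/avg/max/mdev = 1.436/1.966/3.132/0.686 ms
"""


def summarize_ping(text):
    """Return sent/received counts, loss percentage and unanswered sequence numbers."""
    stats = re.search(r"(\d+) packets transmitted, (\d+) received", text)
    if stats is None:
        raise ValueError("no ping statistics line found")
    sent, received = int(stats.group(1)), int(stats.group(2))
    # Sequence numbers of the echo replies that actually came back.
    answered = {int(seq) for seq in re.findall(r"icmp_seq=(\d+)", text)}
    return {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent if sent else 0.0,
        # Assumes sequence numbers run from 1 to sent, as in the output above.
        "missing_seq": sorted(set(range(1, sent + 1)) - answered),
    }


print(summarize_ping(PING_OUTPUT))
# {'sent': 5, 'received': 4, 'loss_pct': 20.0, 'missing_seq': [4]}
```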

This output tells me that my machine can at least talk with the gateway and presumably with the rest of the 10.1.1.x network. One of the five replies (icmp_seq=4) never came back, so there is some packet loss worth keeping an eye on, but the gateway clearly responds. Now, if I couldn't talk to the gateway at all, that could mean my network administrator is being annoying and blocking ICMP packets. If that's the case, I would just choose another machine on the same subnet (10.1.1.2-10.1.1.254) and try to ping it instead. If I know ICMP isn't being blocked (for instance, because I am the network administrator), the problem at this phase could be some sort of VLAN issue that I would have to resolve on the network switch itself.
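The fallback range of 10.1.1.2-10.1.1.254 follows directly from shawn's address and mask. This minimal Python sketch derives it and wraps a single ping. It assumes Linux iputils ping, where -c sets the packet count and -W is the reply timeout in seconds. candidate_hosts and answers_ping are hypothetical helpers.

```python
import ipaddress
import subprocess


def candidate_hosts(my_addr, netmask, skip=()):
    """Yield the other usable host addresses on my subnet, skipping my own
    address and anything listed in skip (for example, a silent gateway)."""
    iface = ipaddress.IPv4Interface(f"{my_addr}/{netmask}")
    skipped = {iface.ip, *(ipaddress.IPv4Address(a) for a in skip)}
    # hosts() excludes the network and broadcast addresses.
    for host in iface.network.hosts():
        if host not in skipped:
            yield str(host)


def answers_ping(host):
    """Send one echo request with a 1-second timeout; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For shawn, candidate_hosts("10.1.1.9", "255.255.255.0", skip=["10.1.1.1"]) yields the 252 addresses from 10.1.1.2 to 10.1.1.254 other than shawn's own. In practice, you would call answers_ping on a few machines you know should be up rather than sweep the whole range one address at a time.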

If you run the route command and don't find a default gateway set, you might be tempted to conclude that's the source of the problem. Be careful! That conclusion might be premature. See, if shawn and bill are on the same subnet, I don't need a default gateway configured for those servers to communicate. I'm not going to get into how to calculate subnets in this column, but suffice it to say in my example, if shawn has an IP of 10.1.1.9 and a subnet mask of 255.255.255.0, bill could have any other IP from 10.1.1.1 through 10.1.1.254 and be on the same subnet. In that case, I might just ping bill directly. Ideally, I would also have a third host on the same subnet that I could ping. That way, if bill doesn't respond but another host on the same subnet does, I can narrow in on bill as the likely source of the problem.
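The same-subnet test comes down to one rule: two addresses share a subnet when ANDing each with the netmask gives the same network address. Here is a minimal Python sketch of that rule. shawn's address and mask come from the ifconfig output above, while the two addresses tried for bill are made up for illustration.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """Return True if both IPv4 addresses fall in the same subnet."""
    mask = int(ipaddress.IPv4Address(netmask))
    # AND each address with the mask to get its network address, then compare.
    net_a = int(ipaddress.IPv4Address(addr_a)) & mask
    net_b = int(ipaddress.IPv4Address(addr_b)) & mask
    return net_a == net_b


# shawn is 10.1.1.9/255.255.255.0; both addresses for bill are hypothetical.
print(same_subnet("10.1.1.9", "10.1.1.20", "255.255.255.0"))  # True: no gateway needed
print(same_subnet("10.1.1.9", "10.1.2.20", "255.255.255.0"))  # False: traffic needs a gateway
```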

______________________

Kyle Rankin is a director of engineering operations in the San Francisco Bay Area, the author of a number of books including DevOps Troubleshooting and The Official Ubuntu Server Book, and is a columnist for Linux Journal.
