Packet capture is one of the most fundamental and powerful ways to do network analysis. You can learn virtually anything about what is going on within a network by intercepting and examining the raw data that crosses it. Modern network analysis tools are able to capture, interpret and describe this network traffic in a human-friendly manner.
tcpdump is one of the original packet capture (or "sniffing") tools that provide these analysis capabilities, and even though it now shares the field with many other utilities, it remains one of the most powerful and flexible.
If you think that tcpdump has been made obsolete by GUI tools like Wireshark, think again. Wireshark is a great application; it's just not the right tool for the job in every situation. As a refined, universal, lightweight command-line utility—much like cat, less and hexdump—tcpdump satisfies a different type of need.
One of tcpdump's greatest strengths is its convenience. It uses a "one-off-command" approach that lends itself to quick, on-the-spot answers. It works through an SSH session, doesn't need X and is more likely to be there when you need it. And, because it uses standard command-line conventions (such as writing to STDOUT, which can be redirected), tcpdump can be used in all sorts of creative, interesting and extremely useful ways.
In this article, I introduce some of the basics of packet capture and provide a breakdown of tcpdump syntax and usage. I show how to use tcpdump to zero in on specific packets and reveal the useful information they contain. I provide some real-world examples of how tcpdump can help put the details of what's happening on your network at your fingertips, and why tcpdump is still a must-have in any admin's toolbox.
Before you can begin to master tcpdump, you should understand some of the fundamentals that apply to using all packet sniffers:
Packet capturing is passive—it doesn't transmit or alter network traffic.
You can capture only the packets that your system receives. On a typical switched network, that excludes unicast traffic between other hosts (packets not sent to or from your machine).
You can capture only packets addressed to your system, unless the network interface is in promiscuous mode.
It is assumed that you're interested in seeing more than just your local traffic, so tcpdump turns on promiscuous mode automatically (which requires root privileges). But, in order for your network card to receive the packets in the first place, you still have to be where the traffic is, so to speak.
Anatomy of a tcpdump Command
A tcpdump command consists of two parts: a set of options followed by a filter expression (Figure 1).
Figure 1. Example tcpdump Command
The expression identifies which packets to capture, and the options define, in part, how those packets are displayed as well as other aspects of program behavior.
tcpdump options follow the standard command-line flag/switch syntax conventions. Some flags accept a parameter, such as -i to specify the capture interface, while others are standalone switches and can be clustered, such as -v to increase verbosity and -n to turn off name lookups.
The man page for tcpdump lists all available options, but here are a few of the noteworthy ones:
-i interface: interface to listen on.
-vvv: more verbose.
-q: less verbose.
-e: print link-level (Ethernet) headers.
-N: display relative hostnames.
-t: don't print timestamps.
-n: disable name lookups.
-s 0: use the max "snaplen" and capture full packets (default in recent versions of tcpdump).
None of these are required. User-supplied options simply modify the default program behavior, which is to capture from the first interface, and then print descriptions of matching packets on the screen in a single-line format.
The filter expression is the Boolean (true or false) criteria for "matching" packets. All packets that do not match the expression are ignored.
The filter expression syntax is robust and flexible. It consists primarily of keywords called primitives, which represent various packet-matching qualifiers, such as protocol, address, port and direction. These can be chained together with and and or, grouped and nested with parentheses, and negated with not to achieve virtually any matching criteria.
Because the primitives have friendly names and do a lot of the heavy lifting, filter expressions are generally self-explanatory and easy to read and construct. The syntax is fully described in the pcap-filter man page, but here are a few example filter expressions:
port 25 and not host 10.0.0.3
icmp or arp or udp
vlan 3 and ether src host aa:bb:cc:dd:ee:ff
arp or udp port 53
icmp and \(dst host mrorange or dst host mrbrown\)
Like the options, filter expressions are not required. An empty filter expression simply matches all packets.
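Parentheses mean something to the shell, which is why they are escaped in the last example. tcpdump also accepts the whole expression as a single quoted argument, so a filter can be assembled in a shell variable first. The following is a minimal sketch of that idea; dst_any is a hypothetical helper name, not part of tcpdump:

```shell
#!/bin/sh
# Hypothetical helper: turn a list of hosts into "(dst host A or dst host B ...)".
dst_any() {
    group=""
    for h in "$@"; do
        if [ -z "$group" ]; then
            group="dst host $h"
        else
            group="$group or dst host $h"
        fi
    done
    printf '(%s)\n' "$group"
}

filter="icmp and $(dst_any mrorange mrbrown)"
echo "$filter"

# Quoting the variable hands the expression to tcpdump as one argument,
# so no backslashes are needed (run as root):
#   tcpdump -n "$filter"
```

Building the expression separately makes it easy to print and check before starting a capture.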
Understanding tcpdump Output
How much sense the output makes depends on how well you understand the protocols in question. tcpdump tailors its output to match the protocol(s) of the given packet.
For example, ARP packets are displayed like this when tcpdump is called with -t and -n (timestamps and name lookups turned off):
arp who-has 10.0.0.1 tell 10.0.0.2
arp reply 10.0.0.1 is-at 00:01:02:03:04:05
ARP is a simple protocol used to resolve IPs into MAC addresses. As you can see above, tcpdump describes these packets in a correspondingly simple format. DNS packets, on the other hand, are displayed completely differently:
IP 10.0.0.2.50435 > 10.0.0.1.53: 19+ A? linuxjournal.com. (34)
IP 10.0.0.1.53 > 10.0.0.2.50435: 19 1/0/0 A 184.108.40.206 (50)
This may seem cryptic at first, but it makes more sense when you understand how protocol layers work. DNS is a more complicated protocol than ARP to begin with, but it also operates on a higher layer. This means it runs over top of other lower-level protocols, which also are displayed in the output.
Unlike ARP, which is non-routable and never leaves the local network segment, DNS is an Internet-wide protocol. It relies on UDP and IP to carry and route it across the Internet, which makes it a layer-5 (application-layer) protocol in the five-layer TCP/IP model (UDP is layer 4, and IP is layer 3).
The underlying UDP/IP information, consisting of the source and destination IP/port, is displayed on the left side of the colon, followed by the remaining DNS-specific information on the right.
Even though this DNS information still is displayed in a highly condensed format, you should be able to recognize the essential elements if you know the basics of DNS. The first packet is a query for linuxjournal.com, and the second packet is an answer, giving the address 184.108.40.206. These are the kind of packets that are generated from simple DNS lookups.
See the "OUTPUT FORMAT" section of the tcpdump man page for complete descriptions of all the supported protocol-specific output formats. Some protocols are better served than others by their output format, but I've found that tcpdump does a pretty good job in general of showing the most useful information about a given protocol.
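Because the single-line format is regular, it is easy to post-process. As a rough sketch, the following pulls the client address, query type and queried name out of DNS query lines. It assumes the -t and -n output format shown above; dns_queries is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "client type name" for each DNS query line.
# Assumes the single-line format produced with -t and -n.
dns_queries() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i ~ /^[A-Z]+\?$/) {         # query type followed by "?", e.g. "A?"
                src = $2
                sub(/\.[0-9]+$/, "", src)    # strip the trailing ".port"
                print src, $i, $(i + 1)
                break
            }
        }
    }'
}

# Try it on the two sample packets; only the query line matches.
printf '%s\n' \
    'IP 10.0.0.2.50435 > 10.0.0.1.53: 19+ A? linuxjournal.com. (34)' \
    'IP 10.0.0.1.53 > 10.0.0.2.50435: 19 1/0/0 A 184.108.40.206 (50)' |
    dns_queries

# Live use (run as root):
#   tcpdump -l -tn udp port 53 | dns_queries
```

The answer line is skipped because its record type ("A") carries no question mark.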
In addition to its normal behavior of printing packet descriptions to the screen, tcpdump also supports a mode of operation where it writes packets to a file instead. This mode is activated when the -w option is used to specify an output capture file.
When writing to a file, tcpdump uses a completely different format from when it writes to the screen. When writing to the screen, formatted text descriptions of packets are printed. When writing to a file, the raw packets are recorded as is, without analysis.
Instead of doing a live capture, tcpdump also can read from an existing capture file as input with the -r option. Because tcpdump capture files use the universally supported "pcap" format, they also can be opened by other applications, including Wireshark.
This gives you the option to capture packets with tcpdump on one host, but perform analysis on a different host by transferring and loading the capture file. This lets you use Wireshark on your local workstation without having to attach it to the network and location you need to capture from.
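When moving capture files between hosts, a quick sanity check can confirm that a file really is a capture before opening it elsewhere. The sketch below looks for the classic pcap magic number (0xa1b2c3d4) in the first four bytes. It is an illustration only: is_pcap is a hypothetical helper, and captures with nanosecond timestamps or in the newer pcapng format use different magic numbers that it does not recognize.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE starts with the classic pcap magic number.
is_pcap() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$magic" in
        d4c3b2a1|a1b2c3d4) return 0 ;;    # 0xa1b2c3d4 in either byte order
        *) return 1 ;;
    esac
}

# Demonstration with stand-in files, so no capture is needed to try it.
tmp=$(mktemp)
printf '\324\303\262\241' > "$tmp"         # the bytes d4 c3 b2 a1
is_pcap "$tmp" && echo "looks like a pcap file"
printf 'just some text\n' > "$tmp"
is_pcap "$tmp" || echo "not a pcap file"
rm -f "$tmp"
```

A typical round trip would be tcpdump -i eth1 -w capture.pcap port 80 on the remote host, then tcpdump -tn -r capture.pcap (or Wireshark) on the workstation; capture.pcap is a placeholder name.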
Analyzing TCP-Based Application Protocols
tcpdump is a packet-based analyzer, and it works great for connectionless, packet-based protocols like IP, UDP, DHCP, DNS and ICMP. However, it cannot directly analyze "connection-oriented" protocols, such as HTTP, SMTP and IMAP, because they work completely differently.
They do not have the concept of "packets". Instead, they operate over the stream-based connections of TCP, which provide an abstracted communications layer. These application protocols are really more like interactive console programs than packet-based network protocols.
TCP transparently handles all of the underlying details required to provide these reliable, end-to-end, session-style connections. This includes encapsulating the stream-based data into packets (called segments) that can be sent across the network. All of these details are hidden below the application layer.
In order to capture TCP-based application protocols, an extra step is needed beyond capturing packets. Because each TCP segment is only a slice of application data, it can't be used individually to obtain any meaningful information. You first must reassemble TCP sessions (or flows) from the combined sets of individual segments/packets. The application protocol data is contained directly within the sessions.
tcpdump doesn't have an option to assemble TCP sessions from packets directly, but you can "fake" it by using what I call "the tcpdump strings trick".
The tcpdump Strings Trick
Usually when I'm capturing traffic, it's just for the purpose of casual analysis. The data doesn't need to be perfect if it shows me what I'm looking for and helps me gain some insight.
In these cases, speed and convenience reign supreme. The following trick is along these lines and is one of my favorite tcpdump techniques. It works because:
TCP segments usually are sent in chronological order.
Text-based application protocols produce TCP segments with text payloads.
The data surrounding the text payloads, such as packet headers, is usually not text.
The UNIX command strings filters out binary data from streams, preserving only text (printable characters).
When tcpdump is called with -w -, it prints raw packets to STDOUT.
Put it all together, and you get a command that dumps real-time HTTP session data:
tcpdump -l -s0 -w - tcp dst port 80 | strings
The -l option above turns on line buffering, which makes sure data gets printed to the screen right away.
What is happening here is tcpdump is printing the raw, binary data to the screen. This uses a twist on the -w option where the special filename - writes to STDOUT instead of a file. Normally, doing this would display all kinds of gibberish, but that's where the strings command comes in: it allows only data recognized as text through to the screen.
There are a few caveats to be aware of. First, data from multiple sessions received simultaneously is displayed simultaneously, clobbering your output. The more refined you make the filter expression, the less of a problem this will be. You also should run separate commands (in separate shells) for the client and server side of a session:
tcpdump -l -s0 -w - tcp dst port 80 | strings
tcpdump -l -s0 -w - tcp src port 80 | strings
Also, you should expect to see a few gibberish characters here and there whenever a sequence of binary data also happens to look like text characters. You can cut down on this by increasing min-len (see the strings man page).
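The effect of min-len can be seen without capturing anything. This small sketch feeds strings a stand-in for captured data (the sample function and its contents are made up for illustration) and compares the default minimum length of 4 with a minimum of 8 set through the -n option:

```shell
#!/bin/sh
# Stand-in for captured data: binary bytes, a short accidental run of
# letters ("abcd"), more binary bytes, then a real HTTP request line.
sample() {
    printf '\001\002abcd\003\004GET /index.html HTTP/1.1\r\n\000'
}

sample | strings          # default minimum of 4 characters: "abcd" gets through
echo "---"
sample | strings -n 8     # minimum of 8 characters: only the request line is left
```

The same -n 8 can be added to the strings end of any of the tcpdump pipelines above, at the cost of also dropping genuine text shorter than eight characters.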
This trick works just as well for other text-based protocols.
HTTP and SMTP Analysis
Using the strings trick in the previous section, you can capture HTTP data even though tcpdump doesn't actually understand anything about it. You then can "analyze" it further in any number of ways.
If you wanted to see all the Web sites being accessed by "davepc" in real time, for example, you could run this command on the firewall (assume the internal interface is eth1):
tcpdump -i eth1 -l -s0 -w - host davepc and port 80 \
| strings | grep 'GET\|Host'
In this example, I'm using a simple grep command to display only lines with GET or Host. These strings show up in HTTP requests and together show the URLs being accessed.
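Going one small step beyond grep, the two lines can be stitched into a single URL per request. This is only a sketch: http_urls is a hypothetical helper, the host names in the demonstration are made up, and it assumes each Host line directly follows its GET line, which will not hold when several sessions are interleaved.

```shell
#!/bin/sh
# Hypothetical helper: combine each GET line with the Host line that follows it.
http_urls() {
    awk '
        $1 == "GET"   { path = $2 }
        $1 == "Host:" { if (path != "") print "http://" $2 path; path = "" }
    '
}

# Demonstration on text shaped like the output of the strings trick.
printf '%s\n' \
    'GET /index.html HTTP/1.1' \
    'Host: www.example.com' \
    'GET /logo.png HTTP/1.1' \
    'Host: img.example.com' |
    http_urls

# Live use on the firewall (run as root):
#   tcpdump -i eth1 -l -s0 -w - host davepc and port 80 | strings | http_urls
```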
This works just as well for SMTP. You could run this on your mail server to watch e-mail senders and recipients:
tcpdump -l -s0 -w - tcp dst port 25 | strings \
| grep -i 'MAIL FROM\|RCPT TO'
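If only the addresses matter, the grep stage can be swapped for a short awk filter. The following is a sketch under the assumption that each envelope command arrives on its own line with the address in angle brackets; smtp_envelope is a hypothetical helper and the addresses are invented:

```shell
#!/bin/sh
# Hypothetical helper: print "from ADDRESS" or "to ADDRESS" for envelope commands.
smtp_envelope() {
    awk '{
        line = toupper($0)
        if (line ~ /^MAIL FROM:/)    kind = "from"
        else if (line ~ /^RCPT TO:/) kind = "to"
        else next
        if (match($0, /<[^>]*>/))    # address between angle brackets
            print kind, substr($0, RSTART + 1, RLENGTH - 2)
    }'
}

# Demonstration on text shaped like the output of the strings trick.
printf '%s\n' \
    'EHLO client.example.com' \
    'MAIL FROM:<alice@example.com> SIZE=1024' \
    'rcpt to:<bob@example.org>' |
    smtp_envelope

# Live use on the mail server (run as root):
#   tcpdump -l -s0 -w - tcp dst port 25 | strings | smtp_envelope
```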
These are just a few silly examples to illustrate what's possible. You obviously could take it beyond grep. You could go as far as to write a Perl script to do all sorts of sophisticated things. You probably wouldn't take that too far, however, because at that point, there are better tools that actually are designed to do that sort of thing.
The real value of tcpdump is the ability to do these kinds of things interactively and on a whim. It's the power to look inside any aspect of your network whenever you want without a lot of effort.
Debugging Routes and VPN Links
tcpdump is really handy for debugging VPNs and other network connections because it shows where packets are showing up and where they aren't. Let's say you've set up a standard routable network-to-network VPN between 10.0.50.0/24 and 192.168.5.0/24 (Figure 2).
Figure 2. Example VPN Topology
If it's operating properly, hosts from either network should be able to ping one another. However, if you are not getting replies when pinging host D from host A, for instance, you can use tcpdump to zero in on exactly where the breakdown is occurring:
tcpdump -tn icmp and host 10.0.50.2
In this example, during a ping from 10.0.50.2 to 192.168.5.38, each round trip should show up as a pair of packets like the following, regardless of from which of the four systems the tcpdump command is run:
IP 10.0.50.2 > 192.168.5.38: ICMP echo request, id 46687, seq 1, length 64
IP 192.168.5.38 > 10.0.50.2: ICMP echo reply, id 46687, seq 1, length 64
If the request packets make it to host C (the remote gateway) but not to D, this indicates that the VPN itself is working, but there could be a routing problem. If host D receives the request but doesn't generate a reply, it probably has ICMP blocked. If it does generate a reply but it doesn't make it back to C, then D might not have the right gateway configured to get back to 10.0.50.0/24.
Using tcpdump, you can follow the ping through all eight possible points of failure as it makes its way across the network and back again.
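When a ping runs for a while, spotting the missing half of a pair by eye gets tedious. As a rough sketch, the filter below reads finished tcpdump output in the -tn format shown above and lists echo requests that never got a matching reply. unanswered_pings is a hypothetical helper, and the sample lines reuse the addresses from this example.

```shell
#!/bin/sh
# Hypothetical helper: list echo requests that never got a matching reply.
# Assumes the -tn output format, read from a capture that has ended.
unanswered_pings() {
    awk '
        $5 == "ICMP" && $6 == "echo" {
            id = $9;   sub(/,$/, "", id)
            seq = $11; sub(/,$/, "", seq)
            key = id " seq " seq
            if ($7 == "request,") {
                if (!(key in pending)) order[++n] = key
                pending[key] = 1
            } else if ($7 == "reply,") {
                delete pending[key]
            }
        }
        END {
            for (i = 1; i <= n; i++)
                if (order[i] in pending) {
                    print "no reply seen for id " order[i]
                    delete pending[order[i]]
                }
        }
    '
}

# Demonstration: the second request has no reply.
printf '%s\n' \
    'IP 10.0.50.2 > 192.168.5.38: ICMP echo request, id 46687, seq 1, length 64' \
    'IP 192.168.5.38 > 10.0.50.2: ICMP echo reply, id 46687, seq 1, length 64' \
    'IP 10.0.50.2 > 192.168.5.38: ICMP echo request, id 46687, seq 2, length 64' |
    unanswered_pings

# On each host in turn (run as root), stopping after 20 packets:
#   tcpdump -tn -c 20 icmp and host 10.0.50.2 | unanswered_pings
```

Running it on each of the four systems shows the last host where requests arrive and the first where replies go missing.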
I hope this article has piqued your interest in tcpdump and given you some new ideas. Hopefully, you also enjoyed the examples that barely scratch the surface of what's possible.
Besides its many built-in features and options, as I showed in several examples, tcpdump can be used as a packet-data conduit by piping into other commands to expand the possibilities further—even if you do manage to exhaust its extensive "advertised" capabilities. The ways to use tcpdump are limited only by your imagination.
tcpdump also is an incredible learning tool. There is no better way to learn how networks and protocols work than from watching their actual packets.
Don't forget to check out the tcpdump and pcap-filter man pages for additional details and information.
The tcpdump/libpcap Legacy
tcpdump has been the de facto packet capture tool for the past 25 years. It really did spawn the whole genre of network utilities based on sniffing and analyzing packets. Prior to tcpdump, packet capture had such high processing demands that it was largely impractical. tcpdump introduced some key innovations and optimizations that helped make packet capture more viable, both for regular systems and for networks with a lot of traffic.
The utilities that came along afterward not only followed tcpdump's lead, but also directly incorporated its packet capture functionality. This was possible because very early on, tcpdump's authors decided to move the packet capture code into a separate portable library called libpcap.
Wireshark, ntop, snort, iftop, ngrep and hundreds of other applications and utilities available today are all based on libpcap. Even most packet capture applications for Windows are based on a port of libpcap called WinPcap.
tcpdump and libpcap: http://www.tcpdump.org
TCP/IP Model: http://en.wikipedia.org/wiki/TCP/IP_model