Inside the Linux Packet Filter

In Part I of this two-part series on the Linux Packet Filter, Gianluca describes a packet's journey through the kernel.

Network geeks among you may remember my article, “Linux Socket Filter: Sniffing Bytes over the Network”, in the June 2001 issue of LJ, about using the packet filter built into the Linux kernel. In that article I provided an overview of the functionality of the packet filter itself; this time, I delve into the kernel mechanisms that allow the filter to work and share some insights on Linux packet processing internals.

Last Article's Points

In the previous article, I made several points about kernel packet processing. It is worth briefly recalling the most important ones:

  • Packet reception is first dealt with at the network card's driver level, more precisely in the interrupt service routine. The service routine looks up the protocol type inside the received frame and queues it appropriately for later processing.

  • During reception and protocol processing, packets might be discarded if the machine is congested. Furthermore, as they travel upward toward user land, packets lose lower-level network information.

  • At the socket level, just before reaching user land, the kernel checks whether an open socket for the given packet exists. If it does not, the packet is discarded.

  • The Linux kernel also implements a general-purpose protocol family, called PF_PACKET, which allows you to create a socket that receives packets directly from the network card driver. Hence, the handling of any other protocol is skipped, and any packet can be received.

  • An Ethernet card usually passes only the packets destined to itself to the kernel, discarding all the others. Nevertheless, it is possible to configure the card in such a way that all the packets flowing through the network are captured, independent of their MAC address (promiscuous mode).

  • Finally, you can attach a filter to a socket, so that only packets matching your filter's rules are accepted and passed to the socket. Combined with PF_PACKET sockets, this mechanism allows you to sniff selected packets efficiently from your LAN.

Even though we built our sniffer using PF_PACKET sockets, the Linux socket filter (LSF) is not limited to those. In fact, the filter also can be used on plain TCP and UDP sockets to filter out unwanted packets, although this use of the filter is much less common.

In the following, I sometimes refer either to a socket or to a sock structure. As far as this article is concerned, both forms indicate the same object, and the latter corresponds to the kernel's internal representation of the former. Actually, the kernel holds both a socket structure and a sock structure, but the difference between the two is not relevant here.

Another data structure that will recur quite often is the sk_buff (short for socket buffer), which represents a packet inside the kernel. The structure is arranged in such a way that addition and removal of header and trailer information to the packet data can be done in a relatively inexpensive way: no data actually needs to be copied since everything is done by just shifting pointers.
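The pointer-shifting idea can be sketched in user space with a heavily simplified stand-in for the sk_buff. The struct mini_skb and the mini_skb_*() helpers below are hypothetical names of my own; they only mimic the general principle (four pointers into one buffer) and are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MINI_SKB_SIZE 128

/* Simplified, hypothetical stand-in for the kernel's sk_buff. */
struct mini_skb {
    unsigned char buf[MINI_SKB_SIZE];
    unsigned char *head;   /* start of the allocated buffer   */
    unsigned char *data;   /* start of the valid packet data  */
    unsigned char *tail;   /* end of the valid packet data    */
    unsigned char *end;    /* end of the allocated buffer     */
};

/* Set up an empty buffer, leaving headroom bytes free for future headers. */
void mini_skb_init(struct mini_skb *skb, size_t headroom)
{
    assert(headroom <= MINI_SKB_SIZE);
    skb->head = skb->buf;
    skb->end = skb->buf + MINI_SKB_SIZE;
    skb->data = skb->tail = skb->head + headroom;
}

/* Number of valid bytes currently in the packet. */
size_t mini_skb_len(const struct mini_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}

/* Extend the data area at the tail; returns where the new bytes go. */
unsigned char *mini_skb_put(struct mini_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;

    assert((size_t)(skb->end - skb->tail) >= len);
    skb->tail += len;
    return old_tail;
}

/* Prepend a header: only the data pointer moves backward. */
unsigned char *mini_skb_push(struct mini_skb *skb, size_t len)
{
    assert((size_t)(skb->data - skb->head) >= len);
    skb->data -= len;
    return skb->data;
}

/* Strip a header: only the data pointer moves forward. */
unsigned char *mini_skb_pull(struct mini_skb *skb, size_t len)
{
    assert(mini_skb_len(skb) >= len);
    skb->data += len;
    return skb->data;
}
```

In this sketch, stripping a 14-byte Ethernet header on the way up the stack is a single mini_skb_pull(&skb, 14); the payload bytes never move in memory.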

Before going on, it may be useful to clear up possible ambiguities. Despite having a similar name, the Linux socket filter serves a completely different purpose from the Netfilter framework introduced into the kernel in early 2.3 versions. Although Netfilter allows you to bring packets up to user space and feed them to your programs, its focus is on network address translation (NAT), packet mangling, connection tracking, packet filtering for security purposes and so on. If you just need to sniff packets and filter them according to certain rules, the most straightforward tool is LSF.

Now we are going to follow the trip of a packet from its very ingress into the computer to its delivery to user land at the socket level. We first consider the general case of a plain (i.e., not PF_PACKET) socket. Our analysis at link layer level is based on Ethernet, since this is the most widespread and representative LAN technology. Cases of other link layer technologies do not present significant differences.

Ethernet Card and Lower-Kernel Reception

As we mentioned in the previous article, the Ethernet card is hard-wired with a particular link layer (or MAC) address and is always listening for packets on its interface. When it sees a packet whose destination MAC address matches either its own address or the link layer broadcast address (i.e., FF:FF:FF:FF:FF:FF for Ethernet), it starts reading the packet into memory.
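The acceptance rule applied by the card can be sketched as a small C function. In reality this check is performed by the card's hardware, not by software; the function name nic_accepts_frame() and the promiscuous flag are illustrative assumptions of mine, and multicast addresses are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ADDR_LEN 6

/*
 * Decide whether a card with address own_addr would accept a frame.
 * The destination MAC address occupies the first six bytes of an
 * Ethernet frame. In promiscuous mode every frame is accepted.
 * Hypothetical helper: real cards do this in hardware.
 */
bool nic_accepts_frame(const unsigned char *frame,
                       const unsigned char *own_addr,
                       bool promiscuous)
{
    static const unsigned char broadcast[ETH_ADDR_LEN] = {
        0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF
    };

    assert(frame != NULL && own_addr != NULL);

    if (promiscuous)
        return true;
    if (memcmp(frame, own_addr, ETH_ADDR_LEN) == 0)
        return true;
    return memcmp(frame, broadcast, ETH_ADDR_LEN) == 0;
}
```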

Upon completion of packet reception, the network card generates an interrupt request. The interrupt service routine that handles the request is part of the card driver. It runs with interrupts disabled and typically performs the following operations:

  • Allocates a new sk_buff structure, defined in include/linux/skbuff.h, which represents the kernel's view of a packet.

  • Fetches packet data from the card buffer into the freshly allocated sk_buff, possibly using DMA.

  • Invokes netif_rx(), the generic network reception handler.

  • When netif_rx() returns, re-enables interrupts and terminates the service routine.

The netif_rx() function prepares the kernel for the next reception step; it puts the sk_buff into the incoming packets queue for the current CPU and marks the NET_RX softirq (softirq is explained below) for execution via the __cpu_raise_softirq() call. Two points are worth noticing at this stage. First, if the queue is full the packet is discarded and lost forever. Second, we have one queue for each CPU; together with the new deferred kernel processing model (softirqs instead of bottom halves), this allows for concurrent packet reception in SMP machines.
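The two points above (drop when the queue is full, one queue per CPU) can be illustrated with a user-space model. Everything in this sketch is hypothetical: the sketch_*() names, the two-CPU array and the tiny queue limit are chosen only to keep the example small and do not reflect the kernel's real data structures or limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 2
#define SKETCH_BACKLOG_MAX 4   /* hypothetical limit, far smaller than a real one */

/* One incoming-packet queue per CPU, plus a "softirq pending" mark. */
struct sketch_backlog {
    const void *pkts[SKETCH_BACKLOG_MAX];
    size_t len;
    bool rx_softirq_pending;
    unsigned long dropped;
};

struct sketch_backlog sketch_queues[SKETCH_NR_CPUS];

/*
 * Model of the enqueue step: put the packet on the current CPU's
 * queue and mark the receive softirq for later execution.
 * Returns true if the packet was queued, false if it was dropped.
 */
bool sketch_netif_rx(unsigned int cpu, const void *pkt)
{
    struct sketch_backlog *q;

    assert(cpu < SKETCH_NR_CPUS);
    q = &sketch_queues[cpu];

    if (q->len == SKETCH_BACKLOG_MAX) {
        q->dropped++;              /* queue full: the packet is lost */
        return false;
    }
    q->pkts[q->len++] = pkt;
    q->rx_softirq_pending = true;  /* deferred processing requested */
    return true;
}

/* Report whether deferred processing has been requested on this CPU. */
bool sketch_softirq_pending(unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    return sketch_queues[cpu].rx_softirq_pending;
}

/*
 * Model of the deferred step: drain this CPU's queue.
 * Returns the number of packets that were waiting.
 */
size_t sketch_net_rx_action(unsigned int cpu)
{
    struct sketch_backlog *q;
    size_t n;

    assert(cpu < SKETCH_NR_CPUS);
    q = &sketch_queues[cpu];
    n = q->len;
    q->len = 0;
    q->rx_softirq_pending = false;
    return n;
}
```

Because each CPU touches only its own queue in this model, two CPUs can enqueue packets at the same time without contending for a shared structure, which is the property the per-CPU design is after.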

If you want to see a real-world Ethernet driver in action, you can refer to the simple NE2000 PCI card driver, located in drivers/net/8390.c. Its interrupt service routine, ei_interrupt(), calls ei_receive(), which in turn performs the following procedure:

  • Allocates a new sk_buff structure via the dev_alloc_skb() call.

  • Reads the packet from the card buffer (ei_block_input() call) and sets skb->protocol accordingly.

  • Calls netif_rx().

  • Repeats the procedure for a maximum of ten consecutive packets.
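The overall shape of such a receive routine, a bounded loop that tags each frame with its protocol, can be sketched as follows. This is not the driver's code: struct sketch_frame, sketch_receive() and frame_ethertype() are hypothetical names, the pending frames are passed in as a plain array instead of being read from card memory, and the protocol is taken directly from the EtherType field without the extra cases a real driver handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HDR_LEN 14
#define SKETCH_MAX_RX_PER_IRQ 10   /* bound on frames handled per interrupt */

/* A frame waiting in the (simulated) card buffer. */
struct sketch_frame {
    const unsigned char *bytes;
    size_t len;
};

/* Read the EtherType: bytes 12 and 13 of the frame, in network byte order. */
uint16_t frame_ethertype(const unsigned char *frame)
{
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/*
 * Handle at most SKETCH_MAX_RX_PER_IRQ pending frames.
 * For each handled frame, protocols[i] receives its EtherType
 * (or 0 if the frame is too short to hold an Ethernet header).
 * protocols must have room for SKETCH_MAX_RX_PER_IRQ entries.
 * Returns the number of frames handled; any remaining frames
 * would wait for the next invocation.
 */
size_t sketch_receive(const struct sketch_frame *pending, size_t npending,
                      uint16_t *protocols)
{
    size_t done = 0;

    assert(pending != NULL && protocols != NULL);

    while (done < npending && done < SKETCH_MAX_RX_PER_IRQ) {
        const struct sketch_frame *f = &pending[done];

        if (f->len >= SKETCH_ETH_HDR_LEN)
            protocols[done] = frame_ethertype(f->bytes);
        else
            protocols[done] = 0;   /* runt frame: no protocol to report */
        done++;
    }
    return done;
}
```

Bounding the loop keeps the time spent with interrupts disabled under control even when frames arrive faster than they can be handled.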

A slightly more complex example is provided by the 3Com driver, located in 3c59x.c, which uses DMA to transfer the packet from the card memory to the sk_buff.




