Inside the Linux Packet Filter

In Part I of this two-part series on the Linux Packet Filter, Gianluca describes a packet's journey through the kernel.
The NET_RX softirq

We have seen the packet arrive through the network interface and get queued for later processing, and we have seen how that processing is resumed by a call to the net_rx_action() function. It is now time to see what this function does. Its operation is simple: it dequeues the first packet (an sk_buff) from the current CPU's queue and runs through the two lists of packet handlers, calling the relevant processing functions.

These lists, and how they are built, deserve a closer look. They are called ptype_all and ptype_base, and they contain protocol handlers for generic packets and for specific packet types, respectively. Protocol handlers register themselves, either at kernel startup or when a particular socket type is created, by declaring which protocol type they can handle. The function involved is dev_add_pack() in net/core/dev.c, which adds a packet type structure (struct packet_type, see include/linux/netdevice.h) containing a pointer to the function to be called when a packet of that type is received. On registration, each handler's structure is either put in the ptype_all list (for the ETH_P_ALL type) or hashed into the ptype_base table (for the other ETH_P_* types).
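The registration rule can be sketched in user space as follows. This is a simplified model, not kernel code: dev_add_pack_model() and ptype_hash() are hypothetical names, the 16-bucket table hashed on the low four bits of the type is an assumption loosely modeled on 2.4-era kernels, and byte-order conversion (the kernel stores the type in network order) is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values of the real ETH_P_* constants; everything else is a model. */
#define ETH_P_ALL 0x0003
#define ETH_P_IP  0x0800
#define PTYPE_HASH_SIZE 16              /* assumed bucket count */

struct packet_type {
    uint16_t type;                      /* ETH_P_* value, host order here */
    int (*func)(const void *pkt);       /* handler called on reception */
    struct packet_type *next;           /* list / bucket chaining */
};

static struct packet_type *ptype_all;                   /* generic handlers */
static struct packet_type *ptype_base[PTYPE_HASH_SIZE]; /* per-type buckets */

/* Hypothetical hash: low bits of the protocol type select a bucket. */
static unsigned ptype_hash(uint16_t type)
{
    return type & (PTYPE_HASH_SIZE - 1);
}

/* Model of dev_add_pack(): ETH_P_ALL goes to ptype_all, the rest are hashed. */
static void dev_add_pack_model(struct packet_type *pt)
{
    assert(pt != NULL);
    if (pt->type == ETH_P_ALL) {
        pt->next = ptype_all;
        ptype_all = pt;
    } else {
        unsigned h = ptype_hash(pt->type);
        pt->next = ptype_base[h];
        ptype_base[h] = pt;
    }
}
```

Registering an ETH_P_ALL handler and an ETH_P_IP handler places the first at the head of ptype_all and the second in bucket 0 of ptype_base (0x0800 has its low four bits clear).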

What the NET_RX softirq does, then, is call in sequence each protocol handler function registered for the packet's protocol type. Generic handlers (that is, ptype_all protocols) are called first, regardless of the packet's protocol; specific handlers follow. As we will see, the PF_PACKET protocol is registered in one of the two lists, depending on the socket type chosen by the application. The normal IP handler, by contrast, is registered in ptype_base, hashed with the key ETH_P_IP.
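The calling order described above can be illustrated with a small user-space model. It is a hedged sketch: struct handler, dispatch() and the call log are hypothetical, and flat arrays stand in for the real list and hash table. Only the ordering rule (every generic handler, then the specific handlers matching the packet type) is taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_ALL 0x0003
#define ETH_P_IP  0x0800
#define ETH_P_ARP 0x0806

/* Hypothetical handler descriptor: a type and a name for logging. */
struct handler {
    uint16_t type;          /* ETH_P_ALL or a specific ETH_P_* value */
    const char *name;
};

/* Records the order in which handlers were invoked. */
static const char *call_log[8];
static size_t call_count;

static void deliver(const struct handler *h)
{
    assert(call_count < sizeof call_log / sizeof call_log[0]);
    call_log[call_count++] = h->name;
}

/* Generic handlers run first, unconditionally; then only the specific
 * handlers whose type matches the packet's protocol. */
static void dispatch(uint16_t pkt_type,
                     const struct handler *generic, size_t n_generic,
                     const struct handler *specific, size_t n_specific)
{
    for (size_t i = 0; i < n_generic; i++)
        deliver(&generic[i]);
    for (size_t i = 0; i < n_specific; i++)
        if (specific[i].type == pkt_type)
            deliver(&specific[i]);
}
```

For an IP packet, a sniffer registered for ETH_P_ALL is logged first and the ETH_P_IP handler second, while an ETH_P_ARP handler is skipped.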

If the queue contains more than one packet, net_rx_action() loops over the packets until either a maximum number of packets has been processed (netdev_max_backlog) or too much time has been spent (the limit is 1 jiffy, i.e., 10ms with HZ=100, as on most kernels of the time). If net_rx_action() breaks out of the loop leaving a non-empty queue, NET_RX_SOFTIRQ is raised again so that processing can resume later.
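The bounded loop can be modeled as below. This is an illustrative sketch under stated assumptions: rx_action_model(), struct rx_queue_model and the softirq_reraised flag are hypothetical, `budget` plays the role of netdev_max_backlog, and the clock is passed in as a function so the behavior can be tested without real jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU input queue. */
struct rx_queue_model {
    size_t pending;              /* packets still waiting to be processed */
};

/* Set when the loop gives up early, modeling a re-raised NET_RX softirq. */
static bool softirq_reraised;

/* A clock that never advances, so only the packet budget can stop the loop. */
static unsigned long frozen_clock(void) { return 0; }

/* Process packets until the queue is empty, `budget` packets have been
 * handled, or `time_limit` jiffies have elapsed since the start. */
static size_t rx_action_model(struct rx_queue_model *q, size_t budget,
                              unsigned long (*now)(void),
                              unsigned long time_limit)
{
    assert(q != NULL && now != NULL);
    unsigned long start = now();
    size_t done = 0;

    while (q->pending > 0) {
        if (done >= budget || now() - start >= time_limit) {
            softirq_reraised = true;   /* leave the rest for a later run */
            break;
        }
        q->pending--;                  /* "process" one packet */
        done++;
    }
    return done;
}
```

With 500 queued packets and a budget of 300, the model processes 300, leaves 200 in the queue and sets the re-raise flag; a queue of 10 packets is drained completely without re-raising.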

The IP Packet Handler

The IP receive function, ip_rcv() (in net/ipv4/ip_input.c), is referenced by the packet type structure that ip_init() (in net/ipv4/ip_output.c) registers at startup. As expected, the protocol type registered for IP is ETH_P_IP.

Thus, ip_rcv() is called from within net_rx_action() during softirq processing, whenever a packet of type ETH_P_IP is dequeued. This function performs all the initial checks on the IP packet, which mainly verify its integrity (IP checksum, IP header fields and minimum meaningful packet length). If the packet looks correct, ip_rcv_finish() is called. Note that this call passes through the Netfilter prerouting hook (NF_IP_PRE_ROUTING), which is implemented with the NF_HOOK macro.
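The kind of integrity checks listed above can be sketched as follows. This is not the kernel's code: ip_header_ok() and inet_csum() are hypothetical names, and the checks shown (minimum length, version 4, header length of at least five 32-bit words, total length within the received data, and an RFC 1071 checksum that sums to zero over a valid header) are a reasonable subset chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes (a multiple of 4 for IP headers). */
static uint16_t inet_csum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    while (sum >> 16)                        /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Sanity checks in the spirit of ip_rcv(); returns true if the header is usable. */
static bool ip_header_ok(const uint8_t *pkt, size_t pkt_len)
{
    assert(pkt != NULL);
    if (pkt_len < 20)                        /* shorter than a minimal header */
        return false;
    unsigned version = pkt[0] >> 4;
    unsigned ihl = pkt[0] & 0x0f;            /* header length in 32-bit words */
    if (version != 4 || ihl < 5 || pkt_len < ihl * 4u)
        return false;
    unsigned tot_len = (unsigned)pkt[2] << 8 | pkt[3];
    if (tot_len < ihl * 4u || tot_len > pkt_len)
        return false;
    return inet_csum(pkt, ihl * 4u) == 0;    /* a valid header sums to zero */
}
```

The usual textbook header 45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7 passes these checks when at least 115 bytes (its total length) are present; flipping a single checksum bit makes it fail.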

ip_rcv_finish(), also in ip_input.c, mainly deals with IP routing. It checks whether the packet should be forwarded to another machine or is destined for the local host. In the former case, routing is performed and the packet is sent out through the appropriate interface; otherwise, the packet is delivered locally. The work is done by ip_route_input(), called at the very beginning of ip_rcv_finish(), which determines the next processing step by setting the appropriate function pointer in skb->dst->input. For locally destined packets, this pointer is the address of ip_local_deliver(). ip_rcv_finish() ends with a call to skb->dst->input().
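The function-pointer technique can be illustrated with a pared-down model. Everything here is hypothetical (struct sk_buff_model, struct dst_model, route_input_model() and rcv_finish_model()), and the routing decision is reduced to a single address comparison; the point is only that the routing step stores the next handler and the caller invokes it without knowing which one it is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_buff_model;                 /* forward declaration */

/* Pared-down route entry: only the input hook is modeled. */
struct dst_model {
    int (*input)(struct sk_buff_model *skb);
};

struct sk_buff_model {
    uint32_t daddr;                   /* destination IPv4 address */
    struct dst_model *dst;            /* filled in by the routing step */
};

/* Hypothetical next steps; the return values just identify the path taken. */
static int local_deliver_model(struct sk_buff_model *skb) { (void)skb; return 1; }
static int forward_model(struct sk_buff_model *skb)       { (void)skb; return 2; }

static struct dst_model local_route   = { local_deliver_model };
static struct dst_model forward_route = { forward_model };

/* Stand-in for ip_route_input(): choose a route and attach it to the skb. */
static void route_input_model(struct sk_buff_model *skb, uint32_t local_addr)
{
    skb->dst = (skb->daddr == local_addr) ? &local_route : &forward_route;
}

/* Stand-in for ip_rcv_finish(): route first, then call the chosen input hook. */
static int rcv_finish_model(struct sk_buff_model *skb, uint32_t local_addr)
{
    route_input_model(skb, local_addr);
    assert(skb->dst != NULL && skb->dst->input != NULL);
    return skb->dst->input(skb);
}
```

A packet addressed to the local address ends up in local_deliver_model(); any other destination takes the forwarding path.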

At this point, the packet is definitely headed toward the upper-layer protocols. Control passes to ip_local_deliver(), which only deals with reassembling IP fragments (if the datagram is fragmented) and then hands the packet to ip_local_deliver_finish(). Just before that call, another Netfilter hook (NF_IP_LOCAL_IN) is executed.
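Both Netfilter control points mentioned so far follow the same pattern: run the hooks registered at that point and continue to the next function only if the packet is accepted. The sketch below models that pattern only. nf_hook_model(), the V_ACCEPT/V_DROP verdicts and the sample hooks are hypothetical; real Netfilter also supports other verdicts (such as NF_STOLEN and NF_QUEUE) and hook priorities, which are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts, echoing the idea of NF_ACCEPT and NF_DROP. */
enum verdict { V_ACCEPT, V_DROP };

typedef enum verdict (*hook_fn)(const void *pkt);
typedef int (*okfn_t)(const void *pkt);

/* Sample hooks and continuation for experimenting with the model. */
static enum verdict accept_all(const void *pkt) { (void)pkt; return V_ACCEPT; }
static enum verdict drop_all(const void *pkt)   { (void)pkt; return V_DROP; }
static int deliver_finish_model(const void *pkt) { (void)pkt; return 0; }

/* Run every hook registered at this point; only if all accept is the
 * continuation ("okfn") called. A drop stops processing and returns -1. */
static int nf_hook_model(hook_fn *hooks, size_t n, const void *pkt, okfn_t okfn)
{
    assert(okfn != NULL);
    for (size_t i = 0; i < n; i++)
        if (hooks[i](pkt) == V_DROP)
            return -1;              /* packet consumed by the hook */
    return okfn(pkt);
}
```

With only accepting hooks the continuation runs and its result is returned; a single dropping hook prevents it from being called.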

ip_local_deliver_finish() is the last function involved in IP-level processing; it carries out the remaining tasks of layer 3. The IP header is stripped so that the packet is ready to be handed to the layer 4 protocol. The function also checks whether the packet belongs to a raw IP socket, in which case the corresponding handler (raw_v4_input()) is called.
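Stripping the header amounts to advancing the start of the packet data past the IP header, whose length comes from the IHL field. The kernel uses skb_pull()-style helpers for this; the sketch below is a hypothetical user-space model (struct buf_model, buf_pull() and strip_ip_header() are illustrative names) that shows only the pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet buffer: a window [data, data + len) into storage. */
struct buf_model {
    uint8_t *data;
    size_t len;
};

/* Advance past `n` bytes of header; NULL if the buffer is too short. */
static uint8_t *buf_pull(struct buf_model *b, size_t n)
{
    if (n > b->len)
        return NULL;
    b->data += n;
    b->len -= n;
    return b->data;
}

/* Strip the IPv4 header (IHL counts 32-bit words) so data points at layer 4. */
static uint8_t *strip_ip_header(struct buf_model *b)
{
    assert(b != NULL && b->len >= 1);
    size_t ihl_bytes = (size_t)(b->data[0] & 0x0f) * 4;
    return buf_pull(b, ihl_bytes);
}
```

For a 28-byte packet with a 20-byte header (first byte 0x45), the data pointer moves to offset 20 and 8 bytes of layer 4 payload remain.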

Raw IP lets applications forge and receive their own IP packets directly, without any actual layer 4 processing. It is mainly used by network tools that need to send particular packets to do their job; well-known examples are ping and traceroute, which use raw IP to build packets with specific header values. Another possible use of raw IP is implementing custom network protocols at the user level (such as RSVP, the Resource Reservation Protocol). Raw IP can be considered a standard counterpart of the PF_PACKET protocol family, shifted up one Open Systems Interconnection (OSI) layer.

Most commonly, though, packets are headed toward a further kernel protocol handler. To determine which one, the kernel examines the Protocol field of the IP header. The method is very similar to the one used by net_rx_action(): a hash table called inet_protos contains all the registered upper-layer protocol handlers, keyed on the IP header's Protocol field. inet_protos is filled in at startup by inet_init() (in net/ipv4/af_inet.c), which repeatedly calls inet_add_protocol() to register the TCP, UDP, ICMP and IGMP handlers (IGMP only if multicast is enabled). The complete protocol table is defined in net/ipv4/protocol.c.
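The inet_protos lookup can be modeled in the same way as ptype_base. In this sketch the 32-entry table, the hash on the low bits of the Protocol field and the chaining of colliding entries are assumptions made for illustration; struct inet_protocol_model, add_protocol_model() and find_protocol() are hypothetical names. The protocol numbers used in the usage note (6 for TCP, 17 for UDP) are the standard IANA values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INET_PROTOS 32              /* assumed table size */

struct inet_protocol_model {
    uint8_t protocol;                   /* value of the IP Protocol field */
    const char *handler_name;           /* e.g. "tcp_v4_rcv" */
    struct inet_protocol_model *next;   /* chain for colliding keys */
};

static struct inet_protocol_model *inet_protos_model[MAX_INET_PROTOS];

static unsigned proto_hash(uint8_t protocol)
{
    return protocol & (MAX_INET_PROTOS - 1);
}

/* Model of inet_add_protocol(): push the entry onto its bucket's chain. */
static void add_protocol_model(struct inet_protocol_model *p)
{
    assert(p != NULL);
    unsigned h = proto_hash(p->protocol);
    p->next = inet_protos_model[h];
    inet_protos_model[h] = p;
}

/* Find the handler registered for a Protocol value, or NULL if none. */
static const struct inet_protocol_model *find_protocol(uint8_t protocol)
{
    const struct inet_protocol_model *p = inet_protos_model[proto_hash(protocol)];
    for (; p != NULL; p = p->next)
        if (p->protocol == protocol)
            return p;
    return NULL;
}
```

After registering entries for protocols 6 and 17, a lookup on either returns the matching entry, while an unregistered value such as 47 returns NULL.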

Each protocol has its own handler function: tcp_v4_rcv(), udp_rcv(), icmp_rcv() and igmp_rcv() are the obvious names for the protocols mentioned above. One of these functions is then called to continue processing the packet. Its return value determines whether an ICMP Destination Unreachable message must be sent back to the sender. This happens when the upper-layer protocols do not recognize the packet as belonging to an existing socket. As you will recall from the previous article, one of the problems in sniffing network data was obtaining a socket able to receive packets regardless of their port and address values. This is where that limitation comes from: here, and in the *_rcv() functions just mentioned.
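The limitation can be illustrated with a deliberately simplified UDP-style receive function. In this hypothetical model (struct sock_model, udp_rcv_model() and the rcv_result values are illustrative, not kernel names), sockets are matched on the destination port alone, whereas real lookups also consider addresses. A packet reaches a socket only if one is bound to its port; otherwise the caller is told to send ICMP Destination Unreachable. A port-based socket therefore cannot see arbitrary traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bound socket, matched on local port only. */
struct sock_model {
    uint16_t port;
    int queued;                 /* packets delivered to this socket */
};

enum rcv_result { DELIVERED, SEND_UNREACHABLE };

/* Deliver to the socket bound to `dport`; if there is none, report that
 * an ICMP Destination Unreachable reply should be generated. */
static enum rcv_result udp_rcv_model(struct sock_model *socks, size_t n,
                                     uint16_t dport)
{
    assert(socks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (socks[i].port == dport) {
            socks[i].queued++;
            return DELIVERED;
        }
    }
    return SEND_UNREACHABLE;
}
```

With sockets bound to ports 53 and 123, a datagram for port 53 is queued on the first socket, while one for port 9999 is reported as unreachable and never reaches any socket.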


