Inside the Linux Packet Filter

In Part I of this two-part series on the Linux Packet Filter, Gianluca describes a packet's journey through the kernel.
Network Core Processing

Let's take a closer look at the netif_rx() function. As mentioned before, this function has the task of receiving a packet from a network driver and queuing it for upper-layer processing. It acts as a single gathering point for all the packets collected by the different network card drivers, providing input to the upper protocols' processing.

Since this function runs in interrupt context (that is, its execution flow follows the interrupt service path) with other interrupts disabled, it has to be quick and short. It cannot perform lengthy checks or other complex tasks since the system is potentially losing packets while netif_rx() runs. So, what this function does is basically select the packet queue from an array called softnet_data, whose index is based on the CPU currently running. It then checks the status of the queue, identifying one of five possible congestion levels: NET_RX_SUCCESS (no congestion), NET_RX_CN_LOW, NET_RX_CN_MOD, NET_RX_CN_HIGH (low, moderate and high congestion, respectively) or NET_RX_DROP (packet dropped due to critical congestion).

Should the critical congestion level be reached, netif_rx() engages a throttling policy that allows the queue to go back to a noncongested status, avoiding service disruption due to kernel overload. Among other benefits, this helps avert possible DOS attacks.

Under normal conditions, the packet is finally queued (__skb_queue_tail()), and __cpu_raise_softirq(cpuid, NET_IF_SOFTIRQ) is called. The latter function has the effect of scheduling a softirq for execution.

The netif_rx() function terminates, returning a value indicating the current congestion level to the caller. At this point, interrupt context processing is done, and the packet is ready to be taken care of by upper-layer protocols. This processing is deferred to a later time, when interrupts will have been re-enabled and execution timing will not be as critical. The deferred execution mechanism has changed radically from kernel versions 2.2 (where it was based on bottom halves) to versions 2.4 (where it is based on softirqs).

softirqs vs. Bottom Halves

Explaining in detail about bottom halves (BHs) and their evolution is out of the scope of this article. But, some points are worth recalling briefly.

First off, their design was based on the principle that the kernel should perform as few computations as possible while in interrupt context. Thus, when long operations were to be done in response to an interrupt, the corresponding driver would mark the appropriate BH for execution, without actually doing anything complex. Then, at a later time, the kernel would have checked the BH mask to determine whether some BHs were marked for execution and execute them before any application-level task.

BHs worked quite well, with one important drawback: due to their structure, their execution was serialized strictly among CPUs. That is, the same BH could not be executed by more than one CPU at the same time. This obviously prevented any kind of kernel parallelism on SMP machines and seriously affected performance. softirqs represent the 2.4-age evolution of BHs and, together with tasklets, belong to the family of kernel software interrupts, pieces of code that can be executed by the kernel when requested, without strict response-time guarantees.

The major difference with respect to BHs is that the same softirq may be run on more than one CPU at a time. Serialization, if required, now must be obtained explicitly by using kernel spinlocks.

softirq's Internals

softirq's processing core is performed in the do_softirq() routine, located in kernel/softirq.c. This function checks a bit mask, and if the bit corresponding to a given softirq is set, it calls the appropriate handling routine. In the case of NET_RX_SOFTIRQ, the one we are interested in at this time, the relevant function is net_rx_action(), located in net/core/dev.c. The do_softirq() function may get called from three distinct places inside the kernel: do_IRQ(), in kernel/irq.c, which is the generic interrupt handler; system calls' exit point, in kernel/entry.S; and schedule(), in kernel/sched.c, which is the main process scheduling function.

In other words, execution of a softirq may happen either when a hardware interrupt has been processed, when an application-level process invokes a system call or when a new process is scheduled for execution. This way, softirqs are drained frequently enough that none of them will lie waiting for their turn for too long.

The trigger mechanism also was exactly the same for the old-style bottom halves.



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

trackin' phavon

RomeoX's picture

Hi Gianluca Insolvibile,

sorry for write here, but for one reason i can't find an e-mail to you. So, my "question">> if you allow me i can write some musical themes or sound effects for PHAVON game, i look into the official page but i can't find a mailing list or something, i want to participate.

Saludos desde México

wow ...

Anonymous's picture

wow ...

a very lardable article on packets journey in the linux kernel

Gavin Hu's picture

I appreciate this article very much.
It explains the packets journey in the linux kernel in a simple but informative reveals the processing internals of packets receiving.

Thanks a lot!

looking forward to Part III

Re: Inside the Linux Packet Filter

Mahesh's picture

Really very informative.
Thanxs a lot for such nice article.

Linux packet journey revealed!!!!

Balaji Patnala's picture

Hi Gianluca Insolvibile,

This article is the most informative about the packet journey in Linux right from Hardware drivers untill now i came across. I request you to reveal the packet journey details when we register to the TCP/IP stack with NF_HOOK.

Thanks in advance,

Nice article

Rajesh George's picture

Excellent article which reveals the mystery of journey of packets through stack in simple words.

Hats off to the Article

Anonymous's picture

The author's terse statements have cleared the mystery of the lxr code, it beautifully unravels the mystery of the step by step procedure of how the packet is passed in the kernel to the user space.

Hats off to the author.

University of California.
Los Angeles,CA

I would like to add my own

Anonymous's picture

I would like to add my own IP options to the IP packet. Please mention me the steps to do so. Any links for it???

Thanks in advance.

This is an excellent article

Sudhir's picture

This is an excellent article for a Linux beginner like me.
Thank you very much.

Re: Kernel Korner: Inside the Linux Packet Filter

Anonymous's picture

Nice article, I came here searching for Linux Networking Architecture, this article gave quite significant details about that.

Keep it up.


Pls tell me where I can find

Anonymous's picture

Pls tell me where I can find this document. i am eager to reading it.

Re: Kernel Korner: Inside the Linux Packet Filter

Anonymous's picture

It gives really good insight in Networking Stack of Linux.

Thanks buddy.


Re: Kernel Korner: Inside the Linux Packet Filter

Anonymous's picture

very nice article,i am making a protocol analyzer, it helped me a lot...


Re: Kernel Korner: Inside the Linux Packet Filter

Anonymous's picture

I agree with the other posters that this is an excellent article and a welcome journal series. The author's terse but information dense style is a refreshing change from all the fluff out there that aims to pass for technical journal writing. I found particularly useful the references to the relevant kernel source modules. Armed with a sharpened conceptual overview and specific references to source code, both seasoned developer and newbie alike can put this knowledge to real use; whether through real world network application development, or targeted educational research.

More a wish list item than a criticism, I'd like to see a few diagrams modelling and summarizing the excellent overview Insolvibile has sketched for us.

Looking forward to part III-mh

Re: Kernel Korner: Inside the Linux Packet Filter

Anonymous's picture

Great article - Just what I was trying to understand. Thanks.

Re: Kernel Korner: Inside the Linux Packet Filter

Anonymous's picture

I really appreciate this kind of article. Reading this, and following the code on lxr makes it quite clear, how the whole thing works.

Please, keep on, I am really looking forward to next part. Thanks.

Re: Kernel Korner: Inside the Linux Packet Filter

Anonymous's picture

I am working on an filter-based high-traffic packet capturing engine and find this information just more than useful for optimizations.

Thx a lot

I am looking forward to part 3.