Inside the Linux Packet Filter
Let's take a closer look at the netif_rx() function. As mentioned before, this function has the task of receiving a packet from a network driver and queuing it for upper-layer processing. It acts as a single gathering point for all the packets collected by the different network card drivers, providing input to the upper protocols' processing.
Since this function runs in interrupt context (that is, its execution flow follows the interrupt service path) with other interrupts disabled, it has to be quick and short. It cannot perform lengthy checks or other complex tasks, since the system may be losing packets while netif_rx() runs. All it does, therefore, is select the packet queue from an array called softnet_data, indexed by the CPU it is currently running on. It then checks the status of that queue and identifies one of five possible congestion levels: NET_RX_SUCCESS (no congestion), NET_RX_CN_LOW, NET_RX_CN_MOD, NET_RX_CN_HIGH (low, moderate and high congestion, respectively) or NET_RX_DROP (packet dropped due to critical congestion).
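As a rough illustration of the selection and classification step, here is a minimal user-space sketch in C. Only softnet_data and the NET_RX_* names come from the description above. The structure layout, the helper names queue_for_cpu() and classify_backlog(), the CPU count and every numeric threshold are assumptions made up for this example, not the kernel's actual values.

```c
#include <assert.h>

#define NR_CPUS 4  /* hypothetical CPU count for this sketch */

/* The five congestion levels named in the text; numeric values are arbitrary here. */
enum rx_level {
    NET_RX_SUCCESS,
    NET_RX_CN_LOW,
    NET_RX_CN_MOD,
    NET_RX_CN_HIGH,
    NET_RX_DROP
};

/* Simplified per-CPU receive state (assumed layout, for illustration only). */
struct softnet_data {
    int backlog;  /* packets currently waiting in this CPU's queue */
};

static struct softnet_data softnet_data[NR_CPUS];

/* Hypothetical thresholds, chosen only to make the example concrete. */
enum { LOW_MARK = 20, MOD_MARK = 100, HIGH_MARK = 200, MAX_BACKLOG = 300 };

/* Pick the queue that belongs to the CPU handling the interrupt. */
static struct softnet_data *queue_for_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return &softnet_data[cpu];
}

/* Map a backlog length to one of the five levels. */
static enum rx_level classify_backlog(const struct softnet_data *q)
{
    if (q->backlog >= MAX_BACKLOG) return NET_RX_DROP;
    if (q->backlog >= HIGH_MARK)   return NET_RX_CN_HIGH;
    if (q->backlog >= MOD_MARK)    return NET_RX_CN_MOD;
    if (q->backlog >= LOW_MARK)    return NET_RX_CN_LOW;
    return NET_RX_SUCCESS;
}
```

Because each CPU indexes its own entry, two CPUs receiving packets at the same time never touch the same queue, which keeps this fast path free of cross-CPU locking.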
Should the critical congestion level be reached, netif_rx() engages a throttling policy that allows the queue to go back to a noncongested status, avoiding service disruption due to kernel overload. Among other benefits, this helps avert possible denial-of-service (DoS) attacks.
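The exact policy is not spelled out here, so the following C sketch assumes a simple hysteresis: once the backlog reaches a limit, every new packet is dropped until the queue has drained completely. The names rx_queue, rx_enqueue(), rx_dequeue() and THROTTLE_LIMIT are hypothetical, and the limit itself is an arbitrary number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { THROTTLE_LIMIT = 300 };  /* hypothetical critical backlog */

struct rx_queue {
    int  backlog;    /* packets waiting to be processed */
    bool throttled;  /* set once critical congestion is reached */
};

/* Returns true if the packet was queued, false if it was dropped. */
static bool rx_enqueue(struct rx_queue *q)
{
    assert(q != NULL);

    if (q->throttled) {
        /* Assumed policy: keep dropping until the queue is empty again. */
        if (q->backlog > 0)
            return false;
        q->throttled = false;
    }
    if (q->backlog >= THROTTLE_LIMIT) {
        q->throttled = true;  /* critical congestion: start throttling */
        return false;
    }
    q->backlog++;
    return true;
}

/* Called by the deferred processing side after it handles one packet. */
static void rx_dequeue(struct rx_queue *q)
{
    assert(q != NULL);
    if (q->backlog > 0)
        q->backlog--;
}
```

Waiting for the queue to empty, rather than resuming as soon as one slot frees up, gives the deferred processing side time to catch up instead of leaving the queue hovering at its limit.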
Under normal conditions, the packet is finally queued (__skb_queue_tail()), and __cpu_raise_softirq(cpuid, NET_RX_SOFTIRQ) is called. The latter function has the effect of scheduling a softirq for execution.
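The two steps, appending to the tail of the queue and flagging the receive softirq as pending, can be sketched in user-space C as follows. This is not the kernel code: struct packet, struct pkt_queue, queue_tail(), raise_softirq_on() and receive_packet() are hypothetical stand-ins, and the bit number given to NET_RX_SOFTIRQ is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4                 /* hypothetical CPU count */
enum { NET_RX_SOFTIRQ = 2 };      /* bit number is arbitrary in this sketch */

struct packet {
    struct packet *next;
    int id;
};

struct pkt_queue {
    struct packet *head, *tail;
    int len;
};

static struct pkt_queue rx_queue[NR_CPUS];     /* one queue per CPU */
static unsigned int softirq_pending[NR_CPUS];  /* one pending mask per CPU */

/* Append at the tail, in the spirit of __skb_queue_tail(). */
static void queue_tail(struct pkt_queue *q, struct packet *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->len++;
}

/* Mark a softirq as pending on one CPU; nothing is executed here. */
static void raise_softirq_on(int cpu, int nr)
{
    softirq_pending[cpu] |= 1u << nr;
}

/* Interrupt-side work: queue the packet, then request deferred processing. */
static void receive_packet(int cpu, struct packet *p)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    queue_tail(&rx_queue[cpu], p);
    raise_softirq_on(cpu, NET_RX_SOFTIRQ);
}
```

Raising the softirq only sets a bit, so doing it once per packet is cheap, and several packets queued back to back still result in a single pending request.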
The netif_rx() function then terminates, returning a value that indicates the current congestion level to the caller. At this point, interrupt context processing is done, and the packet is ready to be taken care of by upper-layer protocols. This processing is deferred to a later time, when interrupts have been re-enabled and execution timing is not as critical. The deferred execution mechanism changed radically between kernel version 2.2 (where it was based on bottom halves) and version 2.4 (where it is based on softirqs).
A detailed explanation of bottom halves (BHs) and their evolution is beyond the scope of this article, but some points are worth recalling briefly.
First off, their design was based on the principle that the kernel should perform as few computations as possible while in interrupt context. Thus, when long operations had to be done in response to an interrupt, the corresponding driver would mark the appropriate BH for execution, without actually doing anything complex. Then, at a later time, the kernel would check the BH mask to determine whether any BHs were marked for execution and run them before any application-level task.
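The mark-now, run-later idea can be illustrated with a small user-space C sketch. The names bh_active, bh_table, mark_bh_sketch(), run_marked_bhs() and net_bh_sketch() are hypothetical and only loosely modeled on the mechanism described above; this is not the 2.2 kernel code.

```c
#include <assert.h>

#define NR_BH 32  /* one bit per bottom half in the mask */

static unsigned long bh_active;        /* mask of BHs marked for execution */
static void (*bh_table[NR_BH])(void);  /* handler registered for each BH */

/* Example handler; it only counts how many times it has run. */
static int net_bh_runs;
static void net_bh_sketch(void) { net_bh_runs++; }

/* Interrupt side: set a bit and return immediately. */
static void mark_bh_sketch(int nr)
{
    assert(nr >= 0 && nr < NR_BH);
    bh_active |= 1UL << nr;
}

/* Later, outside interrupt context: run every marked BH once. */
static int run_marked_bhs(void)
{
    int ran = 0;

    for (int nr = 0; nr < NR_BH; nr++) {
        unsigned long bit = 1UL << nr;

        if (!(bh_active & bit))
            continue;
        bh_active &= ~bit;  /* clear first, so the BH can be marked again */
        if (bh_table[nr]) {
            bh_table[nr]();
            ran++;
        }
    }
    return ran;
}
```

Note that marking the same BH twice before it runs still produces a single execution, since the mask records only whether work is pending, not how much.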
BHs worked quite well, with one important drawback: due to their structure, their execution was strictly serialized among CPUs. That is, the same BH could not be executed by more than one CPU at the same time. This obviously prevented any kind of kernel parallelism on SMP machines and seriously affected performance. Softirqs represent the 2.4-era evolution of BHs and, together with tasklets, belong to the family of kernel software interrupts: pieces of code that can be executed by the kernel when requested, without strict response-time guarantees.
The major difference with respect to BHs is that the same softirq may be run on more than one CPU at a time. Serialization, if required, now must be obtained explicitly by using kernel spinlocks.
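To make the point about explicit serialization concrete, here is a user-space C11 sketch of a handler that protects shared state with a spinlock. The test-and-set lock built on atomic_flag is only a stand-in for a kernel spinlock, and sketch_spin_lock(), sketch_spin_unlock(), rx_handler_sketch() and packets_seen are hypothetical names.

```c
#include <stdatomic.h>

/* A minimal test-and-set spinlock, standing in for a kernel spinlock. */
static atomic_flag stats_lock = ATOMIC_FLAG_INIT;

/* State shared by every CPU that may run the handler. */
static unsigned long packets_seen;

static void sketch_spin_lock(atomic_flag *l)
{
    /* Busy-wait until the previous holder clears the flag. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void sketch_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* A handler that may run on several CPUs at once guards its shared data. */
static void rx_handler_sketch(unsigned long npackets)
{
    sketch_spin_lock(&stats_lock);
    packets_seen += npackets;  /* critical section */
    sketch_spin_unlock(&stats_lock);
}
```

Only the update to the shared counter is serialized. Any work the handler does on data private to its own CPU can stay outside the lock and proceed in parallel, which is exactly the gain over BHs.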
The core of softirq processing takes place in the do_softirq() routine, located in kernel/softirq.c. This function checks a bit mask, and if the bit corresponding to a given softirq is set, it calls the appropriate handling routine. In the case of NET_RX_SOFTIRQ, the one we are interested in at this time, the relevant function is net_rx_action(), located in net/core/dev.c. The do_softirq() function may get called from three distinct places inside the kernel: do_IRQ(), the generic interrupt handler, in kernel/irq.c; the system calls' exit point, in kernel/entry.S (both of these files live under the architecture-specific tree, for example arch/i386/); and schedule(), the main process scheduling function, in kernel/sched.c.
In other words, execution of a softirq may happen either when a hardware interrupt has been processed, when an application-level process invokes a system call or when a new process is scheduled for execution. This way, softirqs are drained frequently enough that none of them will lie waiting for their turn for too long.
The same trigger mechanism was also used for the old-style bottom halves.
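The bit-mask dispatch performed by do_softirq() can be pictured with the following user-space C sketch. It is a simplified model rather than the kernel routine: softirq_vec, pending, open_softirq_sketch(), do_softirq_sketch() and net_rx_action_sketch() are hypothetical names, the mask covers a single CPU, and the bit number given to NET_RX_SOFTIRQ is arbitrary.

```c
#include <assert.h>

#define NR_SOFTIRQS 32
enum { NET_RX_SOFTIRQ = 2 };  /* bit number is arbitrary in this sketch */

typedef void (*softirq_fn)(void);

static softirq_fn softirq_vec[NR_SOFTIRQS];  /* handler for each softirq */
static unsigned int pending;                 /* pending mask for one CPU */

/* Stand-in for net_rx_action(); it only counts its invocations. */
static int rx_runs;
static void net_rx_action_sketch(void) { rx_runs++; }

/* Register the handler for a given softirq number. */
static void open_softirq_sketch(int nr, softirq_fn fn)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_vec[nr] = fn;
}

/* Check the mask and call the handler of every softirq whose bit is set. */
static void do_softirq_sketch(void)
{
    unsigned int mask = pending;  /* snapshot of what is pending right now */

    pending = 0;  /* handlers may raise new softirqs while they run */
    for (int nr = 0; mask != 0; nr++, mask >>= 1) {
        if ((mask & 1u) && softirq_vec[nr])
            softirq_vec[nr]();
    }
}
```

In this model, do_softirq_sketch() would be called at each of the three points listed above: after an interrupt has been handled, on return from a system call and from the scheduler. Calling it when nothing is pending costs one mask test, which is why it can be invoked so often.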