Queueing in the Linux Network Stack

Packet queues are a core component of any network stack or device. They allow asynchronous modules to communicate and they increase performance, but they also have the side effect of adding latency. This article aims to explain where IP packets are queued on the transmit path of the Linux network stack, how interesting new latency-reducing features, such as BQL, operate and how to control buffering for reduced latency.

Figure 1. Simplified High-Level Overview of the Queues on the Transmit Path of the Linux Network Stack

Driver Queue (aka Ring Buffer)

Between the IP stack and the network interface controller (NIC) lies the driver queue. This queue typically is implemented as a first-in, first-out (FIFO) ring buffer (http://en.wikipedia.org/wiki/Circular_buffer)—just think of it as a fixed-sized buffer. The driver queue does not contain the packet data. Instead, it consists of descriptors that point to other data structures called socket kernel buffers (SKBs, http://vger.kernel.org/%7Edavem/skb.html), which hold the packet data and are used throughout the kernel.

Figure 2. Partially Full Driver Queue with Descriptors Pointing to SKBs

The input source for the driver queue is the IP stack that queues IP packets. The packets may be generated locally or received on one NIC to be routed out another when the device is functioning as an IP router. Packets added to the driver queue by the IP stack are dequeued by the hardware driver and sent across a data bus to the NIC hardware for transmission.
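The producer and consumer roles described above can be sketched in a few lines of Python. This is only an illustrative model, not how any real driver is written: the names SKB, DriverQueue, enqueue and dequeue are hypothetical, and a real ring lives in memory shared with the NIC hardware. What the sketch does show is the fixed number of descriptor slots and the fact that each slot holds a reference to the packet data rather than the data itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SKB:
    """Hypothetical stand-in for a socket kernel buffer: it owns the packet bytes."""
    data: bytes


class DriverQueue:
    """Fixed-size FIFO ring of descriptors; each descriptor references an SKB."""

    def __init__(self, num_descriptors: int) -> None:
        self.ring: List[Optional[SKB]] = [None] * num_descriptors
        self.head = 0   # next slot the driver (consumer) reads
        self.tail = 0   # next slot the IP stack (producer) writes
        self.count = 0  # descriptors currently in use

    def enqueue(self, skb: SKB) -> bool:
        """IP stack side: returns False when every descriptor is in use."""
        if self.count == len(self.ring):
            return False
        self.ring[self.tail] = skb  # store a reference, not a copy of the data
        self.tail = (self.tail + 1) % len(self.ring)
        self.count += 1
        return True

    def dequeue(self) -> Optional[SKB]:
        """Driver side: returns the oldest SKB, or None when the ring is empty."""
        if self.count == 0:
            return None
        skb = self.ring[self.head]
        self.ring[self.head] = None
        self.head = (self.head + 1) % len(self.ring)
        self.count -= 1
        return skb

    def queued_bytes(self) -> int:
        """Total packet bytes referenced by the descriptors currently in use."""
        return sum(len(skb.data) for skb in self.ring if skb is not None)
```

Note that the ring limits the number of packets, not the number of bytes: queued_bytes() depends entirely on how large the referenced SKBs are.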

The reason the driver queue exists is to ensure that whenever the system has data to transmit, that data is available to the NIC for immediate transmission. That is, the driver queue gives the IP stack a location to queue data asynchronously from the operation of the hardware. An alternative design would be for the NIC to ask the IP stack for data whenever the physical medium is ready to transmit. Because responding to this request cannot be instantaneous, this design wastes valuable transmission opportunities, resulting in lower throughput. The opposite design would be for the IP stack to wait, after creating a packet, until the hardware is ready to transmit. This also is not ideal, because the IP stack cannot move on to other work while it waits.

Huge Packets from the Stack

Most NICs have a fixed maximum transmission unit (MTU), which is the biggest frame that can be transmitted by the physical media. For Ethernet, the default MTU is 1,500 bytes, but some Ethernet networks support Jumbo Frames (http://en.wikipedia.org/wiki/Jumbo_frame) of up to 9,000 bytes. Inside the IP network stack, the MTU can manifest as a limit on the size of the packets that are sent to the device for transmission. For example, if an application writes 2,000 bytes to a TCP socket, the IP stack needs to create two IP packets to keep the packet size less than or equal to a 1,500-byte MTU. For large data transfers, the comparatively small MTU causes a large number of small packets to be created and transferred through the driver queue.
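The arithmetic behind the 2,000-byte example can be worked through with a short sketch. It assumes IPv4 and TCP headers of 20 bytes each with no options, which leaves 1,460 bytes of payload per packet at a 1,500-byte MTU; the function name segment_sizes is made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def segment_sizes(payload_len: int, mtu: int = 1500) -> list:
    """Return the TCP payload carried by each packet for one application write."""
    mss = mtu - IPV4_HEADER - TCP_HEADER  # largest payload that fits in one packet
    if mss <= 0:
        raise ValueError("MTU too small for the assumed headers")
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


print(segment_sizes(2000))             # payload split for the 2,000-byte write
print(len(segment_sizes(1_000_000)))   # packet count for a 1 MB transfer
```

Under these assumptions, the 2,000-byte write becomes one packet carrying 1,460 bytes and one carrying 540 bytes, and a 1,000,000-byte transfer needs 685 packets at a 1,500-byte MTU compared with 112 at a 9,000-byte MTU.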

In order to avoid the overhead associated with a large number of packets on the transmit path, the Linux kernel implements several optimizations: TCP segmentation offload (TSO), UDP fragmentation offload (UFO) and generic segmentation offload (GSO). All of these optimizations allow the IP stack to create packets that are larger than the MTU of the outgoing NIC. For IPv4, packets as large as the IPv4 maximum of 65,535 bytes can be created and queued to the driver queue. In the case of TSO and UFO, the NIC hardware takes responsibility for breaking the single large packet into packets small enough to be transmitted on the physical interface. For NICs without hardware support, GSO performs the same operation in software immediately before queueing to the driver queue.

Recall from earlier that the driver queue contains a fixed number of descriptors that each point to packets of varying sizes. Since TSO, UFO and GSO allow for much larger packets, these optimizations have the side effect of greatly increasing the number of bytes that can be queued in the driver queue. Figure 3 illustrates this concept in contrast with Figure 2.

Figure 3. Large packets can be sent to the NIC when TSO, UFO or GSO are enabled. This can greatly increase the number of bytes in the driver queue.
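To put rough numbers on this effect, the following sketch compares the worst-case bytes held in the driver queue with and without large packets. The figures are assumptions chosen for illustration only: a ring of 256 descriptors and a 1 Gbit/s link, with per-frame header overhead on the wire ignored.

```python
def max_queued_bytes(num_descriptors: int, packet_bytes: int) -> int:
    """Worst case: every descriptor points to a packet of the given size."""
    return num_descriptors * packet_bytes


def drain_time_ms(queued_bytes: int, link_bits_per_sec: int) -> float:
    """Time to transmit everything already queued, in milliseconds."""
    return queued_bytes * 8 / link_bits_per_sec * 1000


DESCRIPTORS = 256        # assumed ring size, not taken from any specific NIC
LINK_RATE = 10**9        # assumed 1 Gbit/s link

for label, size in (("MTU-sized packets", 1500), ("maximum-size IPv4 packets", 65535)):
    queued = max_queued_bytes(DESCRIPTORS, size)
    print(f"{label}: {queued} bytes, {drain_time_ms(queued, LINK_RATE):.1f} ms to drain")
```

With these assumed values, a full queue of 1,500-byte packets holds 384,000 bytes and drains in about 3 ms, while a full queue of 65,535-byte packets holds 16,776,960 bytes and takes about 134 ms. A packet arriving behind a full queue waits that long before it reaches the wire.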

Although the focus of this article is the transmit path, it is worth noting that Linux has receive-side optimizations that operate similarly to TSO, UFO and GSO and share the goal of reducing per-packet overhead. Specifically, generic receive offload (GRO, http://vger.kernel.org/%7Edavem/cgi-bin/blog.cgi/2010/08/30) allows the NIC driver to combine received packets into a single large packet that is then passed to the IP stack. When the device forwards these large packets, GRO allows the original packets to be reconstructed, which is necessary to maintain the end-to-end nature of the IP packet flow. However, there is one side effect: when the large packet is broken up, it results in several packets for the flow being queued at once. This "micro-burst" of packets can negatively impact inter-flow latency.
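The micro-burst can be seen in a deliberately simplified toy model. This is not the kernel's GRO implementation: the functions gro_merge and resegment are hypothetical, packets are reduced to a flow label and a payload size, and merging is assumed to happen across one batch of received packets.

```python
from typing import Dict, List, Tuple

Packet = Tuple[str, int]  # (flow id, payload bytes)


def gro_merge(batch: List[Packet]) -> List[Tuple[str, List[int]]]:
    """Coalesce packets of the same flow within one receive batch.

    Each merged packet remembers the original segment sizes so that they
    can be restored when the packet is forwarded.
    """
    merged: Dict[str, List[int]] = {}  # dicts keep insertion order (Python 3.7+)
    for flow, size in batch:
        merged.setdefault(flow, []).append(size)
    return list(merged.items())


def resegment(merged: List[Tuple[str, List[int]]]) -> List[Packet]:
    """Split merged packets back into their original sizes for transmission."""
    return [(flow, size) for flow, sizes in merged for size in sizes]


# Two flows arrive interleaved on the receiving NIC.
received = [("A", 1448), ("B", 1448), ("A", 1448), ("B", 1448)]
forwarded = resegment(gro_merge(received))
print([flow for flow, _ in forwarded])
```

In this model the packets arrive in the order A, B, A, B but are queued for transmission as A, A, B, B. Every packet of flow B now waits behind the whole burst from flow A, which is the inter-flow latency cost described above.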

______________________

Dan Siemon is a longtime Linux user and former network admin who now spends most of his time doing business stuff.
