Improving Server Performance

How to improve your server's performance by offloading a TCP/IP stack from a Linux-based server onto an iNIC.

Looking to improve the performance on your high-end Linux server? Is your Linux system connected to a high-speed network? Are your servers spending too much of their resources on processing the TCP/IP stack and Ethernet frames? These are just some of the problems in today's network environment. These problems arise because TCP/IP traffic on the Internet and on private enterprise networks has been growing dramatically for the past decade, and there is no sign that growth will be slowing down any time soon. The widespread global adoption of the Internet and the development of new networking storage technologies such as iSCSI are driving networking speeds even faster. Although the processors being manufactured today are gaining speed at an astonishing pace, it is likely that network-related growth will continue to outpace the increasing processor speeds, slowing servers from their primary tasks to process network packets.

Intel has developed a prototype that offloads an entire TCP/IP stack from a Linux-based operating system onto an intelligent network interface card (iNIC).

The iNIC Design

The iNIC contains a real-time operating system (RTOS) and an entire TCP/IP version 4 stack. An I/O processor (IOP) on the iNIC processes all of the network packets allowing the host processor to process other tasks. To accomplish this division of labor, a thin layer of logic is needed on the host side to route all the network traffic through the iNIC.

Socket Offload

This technology is based on the intelligent I/O (I2O) architecture, which is already incorporated into the Linux 2.4.x kernel. Figure 1 is an I2O primer for those who are not familiar with I2O. The I2O specification is a message-based communication mechanism between a host operating system (OS) and I/O devices that are sitting on an IOP. The IOP runs an intelligent RTOS (IRTOS), which contains device-driver modules (DDMs) for each connected device. To obtain portability, I2O uses a split-driver model. The specification defines a set of abstract messages for each supported device class (i.e., LAN, tape, disk). The host OS uses the abstracted message layer to communicate with DDMs running on an IOP. The DDM translates these I2O messages to hardware-specific commands. To communicate with an I2O device, the host OS must have a driver that knows how to translate OS device commands to I2O device class commands. This module in the host OS is referred to as operating system module (OSM).

Figure 1. I2O Architecture

The solution is based on an extension of this architecture with the creation of the socket class. Changes were made to the I2O architecture to increase performance and minimize latencies usually associated with a split-driver model.

Figure 2. Socket Offload Architecture

The socket class defines messages needed for communication between the host OS and the DDM. The two drivers, OSM and DDM, communicate over a two-layer communication system. The message layer sets up the communication session, and the transport layer defines how information will be shared. The DDM is composed of two modules: intermediate service module (ISM) and hardware device module (HDM). The ISM provides the full functionality of the TCP/IP version 4 stack. The HDM is the device driver for the iNIC. The socket OSM is unlike any other network device in Linux. Normal network card drivers are protocol-independent and interface with the Linux kernel at the network application program interface (API). The socket OSM, on the other hand, will interface directly below the socket API. This allows the necessary socket services to be offloaded onto the IOP running the socket class ISM. The socket OSM replaces the services that the TCP/IP stack provided to the kernel, thereby providing necessary interfaces to the Linux kernel. It also transmits and converts socket requests and data in the socket offload format to the iNIC running the TCP/IP offload stack.

Socket OSM

The OSM is divided into the following subsystems: user interface, message interface, kernel interface and memory management.

The user interface replaces the af_inet socket layer in the kernel. It provides feedback to the users' programs exactly as the native (non-TCP/IP offloaded) kernel would provide.

The message interface provides the initialization and control of the socket offload system. It translates the user socket requests to the socket messages.

The kernel interface provides kernel services to the OSM. This is the point at which the OSM provides any services normally provided to the kernel by the TCP/IP stack. This subsystem was designed to minimize the modifications needed for the Linux kernel.

The memory management module provides the buffer pools needed for data transfer to and from the user-space applications. Memory management was designed to 1) minimize the number of data copies and DMA requests, 2) minimize the host interrupts, 3) avoid requiring costly physical-virtual address mappings and 4) avoid overhead of dynamic-memory allocation at runtime. Two pools of DMA-capable data buffers are maintained in the OSM. The transmit buffer contains the data headed for the iNIC; the receive buffer receives the data into the kernel from the socket device.

Figure 3. Linux TCP/IP Stack

As shown in Figure 3, Linux network components consist of a layered structure. User-space programmers access network services via sockets, using the functions provided by the Linux socket layer. The socket structure defined in include/linux/net.h forms the basis for the implementation of the socket interface. Below the user layer is the INET socket layer. It manages the communication end points for the IP-based protocols, such as TCP and UDP. This layer is represented by the data structure sock defined in include/net/sock.h. The layer underneath the INET socket layer is determined by the type of socket and may be the UDP or TCP layer or the IP layer directly. Below the IP layer are the network devices, which receive the final packets from the IP layer.

The socket OSM replaces the INET socket layer. All socket-related requests passed from the socket layer are converted into I2O messages, which are passed to the ISM on the IOP.



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

The most efficient way to Improve Linux Server Performance

Anonymous's picture

Install TrustLeap G-WAN: it's free.

It does the job with million times less servers than IIS 7.0 .Net C# (same story with Apache PHP or GlassFish Java).

Segmentation Offload

Anonymous's picture

thanks for this useful article.
I'm a linux user and I'm looking at the network device drivers.
I've seen a few NICs support the TCP segmentation offload (as HW function), UFO etc.
But I've not clear if the GSO (Generic Segmentation Offload) can actually improve the network performance.
What do you think about the GSO performance?
Many thanks.

PS: in the end is the GSO a sw support?

Re: Kernel Korner: Improving Server Performance

markhahn's picture

what was the tested configuration? what host MB/CPU/OS? how was the "dumb" nic configured? what's the nature of these "scripts" in the workload?

without mentioning any of these things, the article is mere marketing fluff, and quite disappointing for LJ.