Improving Server Performance

How to improve your server's performance by offloading a TCP/IP stack from a Linux-based server onto an iNIC.

Embedded Target

The embedded system software consists of the messaging layer, TCP/IP stack, device driver and RTOS.

The messaging layer is the portion of the software that takes messages from the OSM, parses them and makes the corresponding socket calls into the TCP/IP stack. This layer also takes replies from the network stack and sends the appropriate reply to the OSM. To improve performance and minimize the effects of latency inherent in split-driver systems, the messaging layer batches replies and pipelines requests.
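The reply batching described above can be illustrated with a minimal sketch. It is not the prototype's code: the `Reply` fields, the `ReplyBatcher` class and the `max_batch` threshold are hypothetical names chosen for illustration, and a plain list stands in for the path back to the OSM. The idea shown is only that several completed replies are handed to the host in one notification rather than one at a time.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Reply:
    request_id: int  # hypothetical tag matching a reply to its request
    status: int      # 0 for success, nonzero for an error code


class ReplyBatcher:
    """Queue replies and hand them to the host in groups (illustrative only)."""

    def __init__(self, send_batch, max_batch=4):
        self._send_batch = send_batch  # callable that delivers a list of replies
        self._max_batch = max_batch    # assumed threshold, not from the article
        self._pending = deque()

    def post(self, reply):
        """Queue one reply; deliver the whole batch once it is full."""
        self._pending.append(reply)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self):
        """Deliver whatever is queued, if anything."""
        if self._pending:
            self._send_batch(list(self._pending))
            self._pending.clear()


deliveries = []  # each entry stands for one notification to the host
batcher = ReplyBatcher(deliveries.append, max_batch=4)
for request_id in range(10):
    batcher.post(Reply(request_id, status=0))
batcher.flush()  # push out the final partial batch
```

In this run, ten replies reach the host in three deliveries instead of ten. A real implementation would also need a timer so that a partial batch is not held back indefinitely.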

The embedded TCP/IP stack is a zero-copy implementation of the BSD 4.2 stack. It provides all of the functionality of a networking stack to the messaging layer. Like all the software that runs on the IOP, the stack has been optimized for the Intel 80310 I/O processor chipset with Intel XScale microarchitecture (ARM-architecture compliant). Benchmarks were run on the TCP/IP stack during optimization to ensure that it would perform well across all sizes of data traffic.

The HDM was written to take advantage of all the offloading capabilities of the NIC hardware. This includes TCP and IP checksums on transmit and receive, segmentation of large TCP packets (larger than 1,500 bytes) and interrupt batching supported by the chip. The NIC silicon chips supported were the Intel 82550 Fast Ethernet Multifunction PCI/CardBus Controller and the Intel 82543GC Gigabit Ethernet Controller.
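To make the offloaded work concrete, here is a small software sketch of two of the tasks the NIC hardware performs: the 16-bit one's-complement Internet checksum (the algorithm described in RFC 1071) and the splitting of a large payload into segment-sized pieces. The function names are hypothetical, and the 1,460-byte segment size is an assumption (a 1,500-byte Ethernet MTU minus 20-byte IP and TCP headers without options). This is what the host CPU would otherwise compute per packet, not the HDM's code.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def segment(payload: bytes, mss: int) -> list:
    """Split a payload into chunks of at most mss bytes, in order."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


# Sample bytes taken from the numerical example in RFC 1071.
sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
checksum = internet_checksum(sample)

# A 4,000-byte payload split with an assumed 1,460-byte segment size.
pieces = segment(bytes(4000), 1460)
```

A receiver can verify the data by running the same sum over the data with the checksum appended; a result of zero means the check passed.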

The RTOS is a proprietary OS that has been designed for the demands of complex I/O operations. This OS is fully I2O-compliant. It was chosen in part because of the willingness of the designers to make modifications to the OS for the prototyping efforts.

As described before, the socket calls made by the application layer are converted into messages that are sent across the PCI bus to the I/O processor. This embedded system is a complete computer for performing I/O transactions. It consists of a processor, memory, RTOS and a PCI bus. Because it is designed for I/O, it minimizes the effects of context switching. Once a message reaches the IOP, it is parsed. The socket call that was requested by the application is then made on the embedded network stack. A reply message is sent to the OSM once the socket operation is completed.
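The round trip described above (a socket call turned into a message, parsed on the IOP, executed on the embedded stack and answered with a reply) can be sketched as follows. Everything here is a hypothetical illustration: the header layout, the opcodes and the `FakeStack` class are invented for the example and do not reproduce the I2O message frame format or the prototype's interfaces.

```python
import struct

# Hypothetical request header: request id, opcode, socket handle, payload length.
HEADER = struct.Struct("!IHHI")
# Hypothetical reply layout: request id, signed result of the socket call.
REPLY = struct.Struct("!Ii")
OP_SEND = 1
OP_CLOSE = 2


def encode_request(request_id, opcode, sock, payload=b""):
    """Host side: turn a socket call into a message frame."""
    return HEADER.pack(request_id, opcode, sock, len(payload)) + payload


def decode_request(frame):
    """IOP side: parse a frame back into its fields."""
    request_id, opcode, sock, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return request_id, opcode, sock, payload


class FakeStack:
    """Stand-in for the embedded TCP/IP stack; it only records the data."""

    def __init__(self):
        self.sent = {}

    def send(self, sock, data):
        self.sent.setdefault(sock, bytearray()).extend(data)
        return len(data)

    def close(self, sock):
        self.sent.pop(sock, None)
        return 0


def handle_frame(stack, frame):
    """Parse one request, make the socket call, and build the reply."""
    request_id, opcode, sock, payload = decode_request(frame)
    if opcode == OP_SEND:
        result = stack.send(sock, payload)
    elif opcode == OP_CLOSE:
        result = stack.close(sock)
    else:
        result = -1  # unknown opcode
    return REPLY.pack(request_id, result)


stack = FakeStack()
reply = handle_frame(stack, encode_request(7, OP_SEND, 3, b"hello"))
request_id, result = REPLY.unpack(reply)
```

Carrying a request id in both directions is what lets the host keep several requests in flight and match each reply to its caller.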

Benchmark Results

The benchmark tests run on the prototype showed that offloading the TCP/IP stack significantly reduced both CPU utilization and the number of interrupts to the host processor. On a heavily loaded machine, the offloaded stack maintained overall network performance, and host CPU cycles remained dedicated to the workload applications. On a native (non-offloaded) machine, the host processors were interrupted far more frequently, and the network application suffered from CPU resource starvation, which degraded network performance.

Figure 4. Benchmark Results

Future Direction

As the subject of iSCSI (storage over IP by encapsulating SCSI in a TCP/IP packet) starts to heat up, the desire to minimize network overhead will continue to grow. The effort spent moving the TCP/IP stack to an IOP could quickly shift to providing a full-featured TCP/IP stack at the back end of an intelligent iSCSI adaptor. This would minimize the impact of iSCSI on a Linux platform by making it available through the normal SCSI API. To compete with Fibre Channel, iSCSI must provide comparable performance.

Another future enhancement is the use of embedded Linux for the RTOS. At the start of this prototyping effort, an Intel i960 RM/RN processor was used, and embedded Linux was not available for it. Since then, the Intel XScale microarchitecture has been introduced, enabling the adoption of the embedded Linux that is available for the Intel StrongARM core. The port of StrongARM Linux to the XScale microarchitecture will be completed by the end of the year.

There were several goals behind this prototype effort: 1) to demonstrate that offloading network tasks from the host processor reduces the host processor cycles otherwise consumed by processing network data, 2) to show that specialized software on the iNIC can perform the same networking tasks while maintaining overall network performance and 3) to enable I/O processors to work in conjunction with the host processors to handle network traffic, thereby maximizing the performance of a Linux-based server at minimal cost.

Offloading the TCP/IP protocol to a specialized networking software environment using embedded processors is an effective way of improving system performance. With the advancement of high-speed network deployments and adoption of network storage, TCP/IP will inevitably play an important role.

______________________

Comments

The most efficient way to Improve Linux Server Performance

Anonymous

Install TrustLeap G-WAN: it's free.

It does the job with million times less servers than IIS 7.0 .Net C# (same story with Apache PHP or GlassFish Java).

http://gwan.ch/

Segmentation Offload

Anonymous

Hi,
thanks for this useful article.
I'm a linux user and I'm looking at the network device drivers.
I've seen a few NICs support the TCP segmentation offload (as HW function), UFO etc.
But I've not clear if the GSO (Generic Segmentation Offload) can actually improve the network performance.
What do you think about the GSO performance?
Many thanks.

PS: in the end is the GSO a sw support?

Re: Kernel Korner: Improving Server Performance

markhahn

what was the tested configuration? what host MB/CPU/OS? how was the "dumb" nic configured? what's the nature of these "scripts" in the workload?

without mentioning any of these things, the article is mere marketing fluff, and quite disappointing for LJ.
