Improving Server Performance
The embedded system software consists of the messaging layer, TCP/IP stack, device driver and RTOS.
The messaging layer is the portion of the software that takes messages from the OSM, parses them and makes the socket call into the TCP/IP stack. This layer also takes replies from the network stack and sends the appropriate reply to the OSM. To improve performance and minimize the effects of latency inherent in split-driver systems, the messaging layer batches, replies and pipelines requests.
The embedded TCP/IP stack is a zero copy implementation of the BSD 4.2 stack. It provides all of the functionality of a networking stack to the messaging layer. Like all the software that runs on the IOP, the stack has been optimized for running on the Intel 80310 I/O processor chipset with Intel XScale microarchitecture (ARM-architecture compliant). Benchmarks were performed on the TCP/IP stack during optimization to ensure that it would perform well across all sizes of data traffic.
The HDM was written to take advantage of all the offloading capabilities of the NIC hardware. This includes TCP and IP checksums on transmit and receive, segmentation of large TCP packets (larger than 1,500 bytes) and interrupt batching supported by the chip. The NIC silicon chips supported were the Intel 82550 Fast Ethernet Mu.pngunction PCI/CardBus Controller and the Intel 82543GC Gigabit Ethernet Controller.
The RTOS is a proprietary OS that has been designed for the demands of complex I/O operations. This OS is fully I2O-compliant. It was chosen in part because of the willingness of the designers to make modifications to the OS for the prototyping efforts.
As described before, the socket calls made by the application layer are converted into messages that are sent across the PCI bus and to the I/O processor. This embedded system is a complete computer for performing I/O transactions. It consists of a processor, memory, RTOS and a PCI bus. Because it is designed for I/O, it will minimize the effects of context switching. Once a message reaches the IOP, it is parsed. The socket call that was requested by the application is then called on the embedded network stack. A reply message is sent to the OSM once the socket operation is completed.
The benchmark tests that were run using the prototype showed that the offloading of the TCP/IP stack significantly reduced both CPU utilization and the number of interrupts to the host processor. With a heavily loaded machine, the offloaded stack was able to maintain overall network performance and host CPU cycles were able to remain dedicated to the workload applications. In a native machine, the host processors were interrupted far more frequently, and the network application suffered from CPU resource starvation resulting in the network performance degradation.
As the subject of iSCSI (storage over IP by encapsulating SCSI in a TCP/IP packet) starts to heat up, desire for minimizing network overhead will continue to grow. Efforts used in moving the TCP/IP stack to an IOP quickly could shift to providing a full-featured TCP/IP stack at the back end of an intelligent iSCSI adaptor. This would minimize the impact of iSCSI to a Linux platform by making it available via the normal SCSI API. To compete with Fibre Channel, iSCSI must provide comparable performance.
Another future enhancement is that embedded Linux will be used for the RTOS. At the start of this prototyping effort, an Intel i960 RM/RN processor was used, and embedded Linux was not available. Since then, the Intel XScale microarchitecture has been introduced, enabling the adoption of the embedded Linux that is available for Intel StrongARM core. Porting of Linux-based StrongARM Linux to the XScale microarchitecture will be completed by the end of the year.
There were several goals behind this prototype effort: 1) to demonstrate that the enhanced performance achieved by offloading network tasks from the host processor reduces the host processor cycles otherwise consumed by processing of network data, 2) to show that the use of specialized software on the iNIC performs the same networking tasks while maintaining overall network performance and 3) to enable the use of I/O processors to work in conjunction with the host processors to handle the network traffic, thereby maximizing performance of a Linux-based server at minimal cost.
Offloading the TCP/IP protocol to a specialized networking software environment using embedded processors is an effective way of improving system performance. With the advancement of high-speed network deployments and adoption of network storage, TCP/IP will inevitably play an important role.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
- Using Salt Stack and Vagrant for Drupal Development
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- New Products
- The Pari Package On Linux
- New Products
- Dart: a New Web Programming Experience
- This is the easiest tutorial
2 hours 50 min ago - Ahh, the Koolaid.
8 hours 29 min ago - git-annex assistant
14 hours 28 min ago - direct cable connection
14 hours 51 min ago - Agreed on AirDroid. With my
15 hours 1 min ago - I just learned this
15 hours 5 min ago - enterprise
15 hours 35 min ago - not living upto the mobile revolution
18 hours 27 min ago - Deceptive Advertising and
19 hours 2 min ago - Let\'s declare that you have
19 hours 3 min ago





Comments
The most efficient way to Improve Linux Server Performance
Install TrustLeap G-WAN: it's free.
It does the job with million times less servers than IIS 7.0 .Net C# (same story with Apache PHP or GlassFish Java).
http://gwan.ch/
Segmentation Offload
Hi,
thanks for this useful article.
I'm a linux user and I'm looking at the network device drivers.
I've seen a few NICs support the TCP segmentation offload (as HW function), UFO etc.
But I've not clear if the GSO (Generic Segmentation Offload) can actually improve the network performance.
What do you think about the GSO performance?
Many thanks.
PS: in the end is the GSO a sw support?
Re: Kernel Korner: Improving Server Performance
what was the tested configuration? what host MB/CPU/OS? how was the "dumb" nic configured? what's the nature of these "scripts" in the workload?
without mentioning any of these things, the article is mere marketing fluff, and quite disappointing for LJ.