Improving Server Performance
The embedded system software consists of the messaging layer, TCP/IP stack, device driver and RTOS.
The messaging layer is the portion of the software that takes messages from the OSM, parses them and makes the socket call into the TCP/IP stack. This layer also takes replies from the network stack and sends the appropriate reply to the OSM. To improve performance and minimize the effects of latency inherent in split-driver systems, the messaging layer batches, replies and pipelines requests.
The embedded TCP/IP stack is a zero copy implementation of the BSD 4.2 stack. It provides all of the functionality of a networking stack to the messaging layer. Like all the software that runs on the IOP, the stack has been optimized for running on the Intel 80310 I/O processor chipset with Intel XScale microarchitecture (ARM-architecture compliant). Benchmarks were performed on the TCP/IP stack during optimization to ensure that it would perform well across all sizes of data traffic.
The HDM was written to take advantage of all the offloading capabilities of the NIC hardware. This includes TCP and IP checksums on transmit and receive, segmentation of large TCP packets (larger than 1,500 bytes) and interrupt batching supported by the chip. The NIC silicon chips supported were the Intel 82550 Fast Ethernet Mu.pngunction PCI/CardBus Controller and the Intel 82543GC Gigabit Ethernet Controller.
The RTOS is a proprietary OS that has been designed for the demands of complex I/O operations. This OS is fully I2O-compliant. It was chosen in part because of the willingness of the designers to make modifications to the OS for the prototyping efforts.
As described before, the socket calls made by the application layer are converted into messages that are sent across the PCI bus and to the I/O processor. This embedded system is a complete computer for performing I/O transactions. It consists of a processor, memory, RTOS and a PCI bus. Because it is designed for I/O, it will minimize the effects of context switching. Once a message reaches the IOP, it is parsed. The socket call that was requested by the application is then called on the embedded network stack. A reply message is sent to the OSM once the socket operation is completed.
The benchmark tests that were run using the prototype showed that the offloading of the TCP/IP stack significantly reduced both CPU utilization and the number of interrupts to the host processor. With a heavily loaded machine, the offloaded stack was able to maintain overall network performance and host CPU cycles were able to remain dedicated to the workload applications. In a native machine, the host processors were interrupted far more frequently, and the network application suffered from CPU resource starvation resulting in the network performance degradation.
As the subject of iSCSI (storage over IP by encapsulating SCSI in a TCP/IP packet) starts to heat up, desire for minimizing network overhead will continue to grow. Efforts used in moving the TCP/IP stack to an IOP quickly could shift to providing a full-featured TCP/IP stack at the back end of an intelligent iSCSI adaptor. This would minimize the impact of iSCSI to a Linux platform by making it available via the normal SCSI API. To compete with Fibre Channel, iSCSI must provide comparable performance.
Another future enhancement is that embedded Linux will be used for the RTOS. At the start of this prototyping effort, an Intel i960 RM/RN processor was used, and embedded Linux was not available. Since then, the Intel XScale microarchitecture has been introduced, enabling the adoption of the embedded Linux that is available for Intel StrongARM core. Porting of Linux-based StrongARM Linux to the XScale microarchitecture will be completed by the end of the year.
There were several goals behind this prototype effort: 1) to demonstrate that the enhanced performance achieved by offloading network tasks from the host processor reduces the host processor cycles otherwise consumed by processing of network data, 2) to show that the use of specialized software on the iNIC performs the same networking tasks while maintaining overall network performance and 3) to enable the use of I/O processors to work in conjunction with the host processors to handle the network traffic, thereby maximizing performance of a Linux-based server at minimal cost.
Offloading the TCP/IP protocol to a specialized networking software environment using embedded processors is an effective way of improving system performance. With the advancement of high-speed network deployments and adoption of network storage, TCP/IP will inevitably play an important role.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- Evernote is much more...
15 min 17 sec ago - Reply to comment | Linux Journal
9 hours 40 sec ago - Dynamic DNS
9 hours 34 min ago - Reply to comment | Linux Journal
10 hours 33 min ago - Reply to comment | Linux Journal
11 hours 23 min ago - Not free anymore
15 hours 25 min ago - Great
19 hours 12 min ago - Reply to comment | Linux Journal
19 hours 20 min ago - Understanding the Linux Kernel
21 hours 35 min ago - General
1 day 4 min ago





Comments
The most efficient way to Improve Linux Server Performance
Install TrustLeap G-WAN: it's free.
It does the job with million times less servers than IIS 7.0 .Net C# (same story with Apache PHP or GlassFish Java).
http://gwan.ch/
Segmentation Offload
Hi,
thanks for this useful article.
I'm a linux user and I'm looking at the network device drivers.
I've seen a few NICs support the TCP segmentation offload (as HW function), UFO etc.
But I've not clear if the GSO (Generic Segmentation Offload) can actually improve the network performance.
What do you think about the GSO performance?
Many thanks.
PS: in the end is the GSO a sw support?
Re: Kernel Korner: Improving Server Performance
what was the tested configuration? what host MB/CPU/OS? how was the "dumb" nic configured? what's the nature of these "scripts" in the workload?
without mentioning any of these things, the article is mere marketing fluff, and quite disappointing for LJ.