Kernel Tuning Gives 40% Gains
The CMOVxx conditional move instructions on the 21264 have been implemented in the hardware by decomposing them into two separate instructions inside the processor. The result of the latency of a CMOVxx instruction is a minimum of two cycles, and can take up to five cycles, depending upon the number of CMOVs in a given fetch block. In some situations, replacing CMOV instructions with highly predictable conditional branches can result in a performance gain on the 21264. Overall, a good rule of thumb is to try to minimize the number of CMOV instructions if possible.
The data was collected from an API NetWorks' CS20 server, which has dual 833MHz processors with 4MB DDR cache, 1GB of SDRAM and Ultra-160 SCSI disks. Two load-generation tests were run: five builds of the 2.2.18 kernel and five builds of gcc-2.95.3. The average system time (as reported by /usr/bin/time -p) was recorded, using various levels of parallelism with make (see Tables 2 and 3).
A similar version of the experiment was run using the 2.4.2 kernel in default mode (all of the performance patches exist). The results were compared to an unpatched 2.4.2 kernel with most (but not all) of the performance changes reverted.
This experiment was initially performed on an API NetWorks' UP1000 motherboard system, which has a 700MHz processor with 4MB cache, 128MB SDRAM and IDE disks. Again, five builds of the kernel and gcc were run, and the average times were recorded. The kernel used was 2.4.0-test6, with and without the patches.
On a modestly configured 21664 system (the UP1000), the performance increase is significant in terms of reducing the amount of time spent in the kernel, with improvements in the 40% range for some activities (kernel builds). On a more generously configured CS20, we consistently attained speed increases of 14-15% for the measured loads.
We attribute the differences between the UP1000 and CS20 systems to be related to their memory: the UP1000 has an 800MB/sec, 64-bit bus, while the CS20 has a 2.65GB/sec, 256-bit bus.
All of the rewritten routines have appeared in one form or another (some have undergone subsequent rewriting) as part of the 2.4.2 kernel. Additionally, we have put together a patch for 2.2.17 of the kernel and made it available on our corporate web site, http://www.api-networks.com/products/downloads/developer_support/ under “Performance”. Through additional efforts, these improvements have also migrated into glibc and will eventually help improve application performance of user-mode code.
Webinar: 8 Signs You’re Beyond Cron
11am CDT, April 29th
Join Linux Journal and Pat Cameron, Director of Automation Technology at HelpSystems, as they discuss the eight primary advantages of moving beyond cron job scheduling. In this webinar, you’ll learn about integrating cron with an enterprise scheduler.Join us!
|Android Candy: Intercoms||Apr 23, 2015|
|"No Reboot" Kernel Patching - And Why You Should Care||Apr 22, 2015|
|Return of the Mac||Apr 20, 2015|
|DevOps: Better Than the Sum of Its Parts||Apr 20, 2015|
|Play for Me, Jarvis||Apr 16, 2015|
|Drupageddon: SQL Injection, Database Abstraction and Hundreds of Thousands of Web Sites||Apr 15, 2015|
- Tips for Optimizing Linux Memory Usage
- "No Reboot" Kernel Patching - And Why You Should Care
- DevOps: Better Than the Sum of Its Parts
- Return of the Mac
- Android Candy: Intercoms
- Drupageddon: SQL Injection, Database Abstraction and Hundreds of Thousands of Web Sites
- Non-Linux FOSS: .NET?
- Play for Me, Jarvis
- diff -u: What's New in Kernel Development