A $7,000 Server Comparison
The story of Linux on non-x86 architectures started in 1994 with a port to the now-abandoned Alpha architecture. Other ports quickly followed, and over the years, Linux has gained support for most desktop and server CPU designs. Today, however, only five CPU architectures are promoted actively by their manufacturers as Linux-compatible. This article explores how entry-level servers based on non-x86 designs compare to the current x86 systems in the same price range.
Comparing servers within the x86 market is usually fairly boring. The market is split into two camps around the AMD Opteron and the Intel Xeon. The differences between the various server models inside each camp are fairly small. The number of expansion slots, the disk count and the features of the remote management solution seem to be the only distinctions. Performance and memory capabilities are determined by the CPU and chipset.
Outside the x86 market, the picture changes. To compete with the established x86 solutions and the massive budget Intel can invest in CPU development, IBM, Sun and the Intel Itanium team have to be innovative and take ideas to new heights.
The first member of the x86 architecture was the 16-bit 8086 designed by Intel in 1978. Since then, x86 has come a long way. It was extended to 32-bit with the i386 and more recently to 64-bit with AMD64/EM64T. Despite these extensions, all x86 designs have remained backward-compatible, and even the newest quad-core Xeons and Opterons still run DOS.
This backward compatibility has allowed the x86 processors to become the standard for desktops and also to dominate the market for smaller servers. It is, however, also the reason for much of the criticism that Intel and AMD receive.
In 1978, ideas like pipelining, out-of-order execution and branch prediction were known but did not influence the design of the x86 instruction set. Today, these features are part of most CPUs, and implementing them on top of an instruction set that was never designed for them takes a lot of effort. This increases complexity, and in many cases, optimal performance is not possible.
EPIC (Explicitly Parallel Instruction Computing) is the instruction set used in the Intel Itanium processors. EPIC was codeveloped by HP and Intel as the successor to both the HP PA-RISC line and the Intel x86 processors. Development started in 1994, but after delays and missed performance targets, the project's goals changed dramatically. Although HP has discontinued the PA-RISC and Alpha architectures and is now selling a full range of Itanium-based servers, Intel continued the development of x86-based processors and now positions the Itanium processor only for high-end applications.
The main idea behind EPIC is that the compiler has a much better understanding of the program code than the CPU does. This additional knowledge about the program can be used to optimize the code at compile time rather than during execution. The reduced need for hardware-based optimization results in a simpler architecture. However, the decision also requires more effort from compiler designers and leads to some interesting behavior (see The Compiler Issue sidebar).
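To make the compile-time idea more concrete, here is a minimal sketch in Python of how a compiler might group independent instructions ahead of time. It is a toy model, not the Itanium toolchain: the `Instr` type, the register names, the `group_instructions` function and the `width` limit of three are all assumptions made for illustration. The only rule it applies is that an instruction cannot share a group with an earlier instruction whose result it reads or whose destination register it overwrites.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    name: str                # label for printing
    dest: str                # register written by the instruction
    srcs: Tuple[str, ...]    # registers read by the instruction


def group_instructions(program: List[Instr], width: int = 3) -> List[List[str]]:
    """Greedily pack instructions, in program order, into groups whose
    members do not depend on each other, so each group could in
    principle be issued in parallel. `width` is an illustrative limit
    on group size, not a figure taken from any real processor."""
    groups: List[List[str]] = []
    current: List[str] = []
    written = set()  # registers written inside the current group

    for ins in program:
        reads_pending_result = any(src in written for src in ins.srcs)
        overwrites_pending_result = ins.dest in written
        if current and (len(current) == width
                        or reads_pending_result
                        or overwrites_pending_result):
            # Close the current group and start a new one.
            groups.append(current)
            current, written = [], set()
        current.append(ins.name)
        written.add(ins.dest)

    if current:
        groups.append(current)
    return groups


# Hypothetical five-instruction program: (a + b) * c
program = [
    Instr("load_a", "r1", ()),
    Instr("load_b", "r2", ()),
    Instr("add", "r3", ("r1", "r2")),
    Instr("load_c", "r4", ()),
    Instr("mul", "r5", ("r3", "r4")),
]

groups = group_instructions(program)
for number, group in enumerate(groups, start=1):
    print(number, group)
```

In this sketch, the two loads land in the first group, the addition and the third load in the second, and the multiplication in a group of its own. A real compiler would also reorder instructions (for example, hoisting the third load into the first group), which is exactly the kind of work EPIC shifts from the hardware to the compiler writer.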
The Compiler Issue
GCC is the standard compiler for Linux and many other platforms. However, GCC has a long history of being criticized for lack of optimization for non-x86 platforms. This seems to be especially true for the Itanium platform, as EPIC is the newest instruction set and GCC developers have had the least amount of time to optimize the compiler for it. A whitepaper on Intel's Web site describes about a 25% performance gain from simply compiling MySQL with the Intel Compiler instead of GCC 4.1.
To verify this claim, we recompiled bzip2 and PostgreSQL 7.4.16 on the HP rx2660. The performance gains were impressive—29% for bzip2 and 21% for PostgreSQL. Hopefully, Intel and HP will continue working with the GCC team on improving performance, because adoption of a closed-source compiler by Red Hat and others is unlikely.
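For readers who want to repeat this kind of comparison, the following Python sketch shows one common way to turn two run times into a percentage gain. The article does not state how its figures were derived, so the formula here (gain in throughput, that is, baseline time divided by new time, minus one) is an assumption, and the timings in the example are made up for illustration.

```python
def gain_percent(baseline_seconds: float, optimized_seconds: float) -> float:
    """Return the throughput gain, in percent, of the optimized build
    over the baseline build for the same workload.

    Assumption: gain is measured as (baseline / optimized - 1) * 100.
    Other definitions (for example, percent reduction in run time)
    give different numbers for the same measurements."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("run times must be positive")
    return (baseline_seconds / optimized_seconds - 1.0) * 100.0


# Hypothetical timings, not measurements from the article.
gcc_seconds = 129.0
icc_seconds = 100.0
print(round(gain_percent(gcc_seconds, icc_seconds), 1))
```

Whichever definition is used, it should be stated alongside the result, because a 29% throughput gain and a 29% reduction in run time are not the same improvement.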
CMT, short for Chip Multi-Threading, is only one of the names describing methods for increasing CPU resource utilization. Instead of relying on larger caches or higher clock speed, CMT increases performance by offering multiple execution threads on a single processor.
CMT can be implemented in two variants. The first method is the use of multiple identical cores that are combined in the same physical package. This allows server manufacturers to deliver more processing power per socket and is implemented in all current architectures.
The second variant of CMT lets a single CPU core execute multiple threads to increase resource utilization. This can be done by providing dedicated resources to each thread or simply by allowing the primary thread full access and limiting the secondary thread to the resources not used by the primary thread. Intel has implemented this feature in many Pentium 4 CPUs under the brand name of HyperThreading. HyperThreading can speed up execution by up to 20%, but workloads that rely heavily on cache sizes (such as the bzip2 compression discussed later in the article) suffer from having HyperThreading enabled.
The T1 processor that Sun uses in the CoolThreads T1000 and T2000 systems combines both CMT concepts. It has eight cores, and each core is capable of executing four simultaneous threads. To combine such a high number of cores on one chip, Sun has chosen to implement very basic cores running at a fairly low clock frequency of 1–1.4GHz. This results in low single-thread execution speed, but Sun is betting on the 32 execution threads to make up for this disadvantage.
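The arithmetic behind that bet can be sketched in a few lines of Python. This is a back-of-the-envelope model only: it assumes a perfectly parallel workload that scales linearly with thread count, which real workloads rarely do, and the four-thread rival system is a hypothetical example, not one of the servers in this comparison. The function names are likewise made up for illustration.

```python
def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Total number of threads the chip can execute simultaneously."""
    return cores * threads_per_core


def breakeven_thread_speed(rival_threads: int, own_threads: int) -> float:
    """Fraction of a rival thread's speed that each of our threads must
    reach for both systems to deliver the same aggregate throughput.

    Assumption: the workload is perfectly parallel and scales linearly
    with the number of threads on both systems."""
    if rival_threads <= 0 or own_threads <= 0:
        raise ValueError("thread counts must be positive")
    return rival_threads / own_threads


t1_threads = hardware_threads(cores=8, threads_per_core=4)
print(t1_threads)

# Hypothetical rival: a server offering four hardware threads in total.
print(breakeven_thread_speed(rival_threads=4, own_threads=t1_threads))
```

Under these idealized assumptions, each T1 thread would need only one-eighth of the per-thread speed of the four-thread rival to match it on total throughput. For a workload that cannot be split across many threads, the low single-thread speed is what counts instead.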