A $7,000 Server Comparison

Tired of x86? See what Linux on Itanium, Sun T1 or POWER5 can do!

The story of Linux on non-x86 architectures started in 1994 with a port to the now-abandoned Alpha architecture. Other ports quickly followed, and over the years, Linux has gained support for most desktop and server CPU designs. Today, however, only five CPU architectures are promoted actively by their manufacturers as Linux-compatible. This article explores how entry-level servers based non-x86 designs compare to the current x86 systems in the same price range.

CPU Architectures

Comparing the x86 server market is usually fairly boring. The market is split into two camps around the AMD Opteron and the Intel Xeon. The differences between the various server models inside each camp are fairly small. Number of expansion slots, disk count and the features of the remote management solution seem to be the only distinctions. Performance and memory capabilities are determined by the CPU and chipset.

Outside the x86 market, the picture changes. To compete with the established x86 solutions and the massive budget Intel can invest into CPU development, IBM, Sun and the Intel Itanium team have to be innovative and take ideas to new heights.

King of the Hill—x86

The first member of the x86 architecture was the 16-bit 8086 designed by Intel in 1978. Since then, x86 has come a long way. It was extended to 32-bit with the i386 and more recently to 64-bit with the AMD64/EMT64. Despite these extensions, all x86 designs have remained backward-compatible, and even the newest quad-core Xeons and Opterons still run DOS.

This backward compatibility has allowed the x86 processors to become the standard for desktops and also to dominate the market for smaller servers. It is, however, also the reason for much of the criticism that Intel and AMD receive.

In 1978, ideas like pipelining, out-of-order execution and branch prediction were known but did not influence the design of the x86 instruction set. Today, these features are part of most CPUs, and a lot of effort is required to implement these features. This increases complexity, and in many cases, optimal performance is not possible.

The EPIC Story of Itanium

EPIC (Explicitly Parallel Instruction Computing) is the instruction set used in the Intel Itanium processors. EPIC was codeveloped by HP and Intel as the successor to both the HP PA-RISC line and the Intel x86 processors. The development started in 1994, but after delays and missed performance targets, the project's goals have changed dramatically. Although HP has discontinued the PA-RISC and Alpha architectures and is now selling a full range of Itanium-based servers, Intel continued the development of x86-based processors and now positions the Itanium processor only for high-end applications.

The main idea behind EPIC is that the compiler has a much better understanding of the program code than the CPU does. This additional knowledge about the program can be used to optimize the code at compile time rather than during execution. The reduced need for hardware-based optimization results in simpler architecture. However, the decision also requires more effort from compiler designers and leads to some interesting behavior (see The Compiler Issue sidebar).

CMT—Sun T1

CMT, short for Chip Multi-Threading, is only one of the names describing methods for increasing CPU resource utilization. Instead of relying on larger caches or higher clock speed, CMT increases performance by offering multiple execution threads on a single processor.

CMT can be implemented in two variants. The first method is the use of multiple identical cores that are combined in the same physical package. This allows server manufacturers to deliver more processing power per socket and is implemented in all current architectures.

The second type of CMT is allowing one CPU core to execute multiple threads to increase resource utilization. This can be done by providing dedicated resources to each thread or simply by allowing the primary thread full access and limiting the secondary thread to the resources not used by the primary thread. Intel has implemented this feature in many Pentium 4 CPUs under the brand name of HyperThreading. HyperThreading can speed up execution by up to 20%, but workloads that rely heavily on cache sizes (such as the bzip2 compression discussed later in the article) suffer from having HyperThreading enabled.

The T1 processor that Sun is utilizing in the CoolThreads T1000 and T2000 systems uses both CMT concepts. It has eight cores, and each core is capable of executing four simultaneous threads. To combine such a high number of cores on one chip, Sun has chosen to implement very basic cores running at a fairly low clock frequency of 1–1.4GHz. This results in low single-thread execution speed, but Sun is betting on the 32 execution thread to make up for this disadvantage.



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Throughput computing and pbzip2 throughput

Glenn Fawcett's picture

I do not really understand your graphs with pbzip2... Are you saying that it takes more time to compress a file with higher parallelism? I think the y-axis is goofed up.

Anyway, for in-depth analysis on throughput computing on T2000 and beyond look at:



Really? You compare these

Anonymous's picture

Really? You compare these architectures using Linux? Why not the T1000 with Solaris and IBM with AIX? It seems incomplete to not test with the native OS which these architectures are optimized.

Also, you mention these architectures are "proprietary". While most are, it seems not fair to call Solaris and CMT proprietary. Both the OS and chip architecture are Open Source :)