A $7,000 Server Comparison
The story of Linux on non-x86 architectures started in 1994 with a port to the now-abandoned Alpha architecture. Other ports quickly followed, and over the years, Linux has gained support for most desktop and server CPU designs. Today, however, only five CPU architectures are promoted actively by their manufacturers as Linux-compatible. This article explores how entry-level servers based non-x86 designs compare to the current x86 systems in the same price range.
Comparing the x86 server market is usually fairly boring. The market is split into two camps around the AMD Opteron and the Intel Xeon. The differences between the various server models inside each camp are fairly small. Number of expansion slots, disk count and the features of the remote management solution seem to be the only distinctions. Performance and memory capabilities are determined by the CPU and chipset.
Outside the x86 market, the picture changes. To compete with the established x86 solutions and the massive budget Intel can invest into CPU development, IBM, Sun and the Intel Itanium team have to be innovative and take ideas to new heights.
The first member of the x86 architecture was the 16-bit 8086 designed by Intel in 1978. Since then, x86 has come a long way. It was extended to 32-bit with the i386 and more recently to 64-bit with the AMD64/EMT64. Despite these extensions, all x86 designs have remained backward-compatible, and even the newest quad-core Xeons and Opterons still run DOS.
This backward compatibility has allowed the x86 processors to become the standard for desktops and also to dominate the market for smaller servers. It is, however, also the reason for much of the criticism that Intel and AMD receive.
In 1978, ideas like pipelining, out-of-order execution and branch prediction were known but did not influence the design of the x86 instruction set. Today, these features are part of most CPUs, and a lot of effort is required to implement these features. This increases complexity, and in many cases, optimal performance is not possible.
EPIC (Explicitly Parallel Instruction Computing) is the instruction set used in the Intel Itanium processors. EPIC was codeveloped by HP and Intel as the successor to both the HP PA-RISC line and the Intel x86 processors. The development started in 1994, but after delays and missed performance targets, the project's goals have changed dramatically. Although HP has discontinued the PA-RISC and Alpha architectures and is now selling a full range of Itanium-based servers, Intel continued the development of x86-based processors and now positions the Itanium processor only for high-end applications.
The main idea behind EPIC is that the compiler has a much better understanding of the program code than the CPU does. This additional knowledge about the program can be used to optimize the code at compile time rather than during execution. The reduced need for hardware-based optimization results in simpler architecture. However, the decision also requires more effort from compiler designers and leads to some interesting behavior (see The Compiler Issue sidebar).
The Compiler Issue
GCC is the standard compiler for Linux and many other platforms. However, GCC has a long history of being criticized for lack of optimization for non-x86 platforms. This seems to be especially true for the Itanium platform, as EPIC is the newest instruction set and GCC developers had the least amount of time to optimize the compiler. A whitepaper on Intel's Web site describes about a 25% performance gain when simply translating MySQL with the Intel Compiler vs GCC 4.1.
To verify this claim, we recompiled bzip2 and PostgreSQL 7.4.16 on the HP rx2660. The performance gains were impressive—29% for bzip2 and 21% for PostgreSQL. Hopefully, Intel and HP will continue working with the GCC team on improving performance, because adoption of a closed-source compiler by Red Hat and others is unlikely.
CMT, short for Chip Multi-Threading, is only one of the names describing methods for increasing CPU resource utilization. Instead of relying on larger caches or higher clock speed, CMT increases performance by offering multiple execution threads on a single processor.
CMT can be implemented in two variants. The first method is the use of multiple identical cores that are combined in the same physical package. This allows server manufacturers to deliver more processing power per socket and is implemented in all current architectures.
The second type of CMT is allowing one CPU core to execute multiple threads to increase resource utilization. This can be done by providing dedicated resources to each thread or simply by allowing the primary thread full access and limiting the secondary thread to the resources not used by the primary thread. Intel has implemented this feature in many Pentium 4 CPUs under the brand name of HyperThreading. HyperThreading can speed up execution by up to 20%, but workloads that rely heavily on cache sizes (such as the bzip2 compression discussed later in the article) suffer from having HyperThreading enabled.
The T1 processor that Sun is utilizing in the CoolThreads T1000 and T2000 systems uses both CMT concepts. It has eight cores, and each core is capable of executing four simultaneous threads. To combine such a high number of cores on one chip, Sun has chosen to implement very basic cores running at a fairly low clock frequency of 1–1.4GHz. This results in low single-thread execution speed, but Sun is betting on the 32 execution thread to make up for this disadvantage.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- A Topic for Discussion - Open Source Feature-Richness?
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Validate an E-Mail Address with PHP, the Right Way
- RSS Feeds
- Readers' Choice Awards
- Tech Tip: Really Simple HTTP Server with Python
- BASH script to log IPs on public web server
1 hour 58 min ago - DynDNS
5 hours 34 min ago - Reply to comment | Linux Journal
6 hours 6 min ago - All the articles you talked
8 hours 30 min ago - All the articles you talked
8 hours 33 min ago - All the articles you talked
8 hours 34 min ago - myip
12 hours 59 min ago - Keeping track of IP address
14 hours 50 min ago - Roll your own dynamic dns
20 hours 3 min ago - Please correct the URL for Salt Stack's web site
23 hours 15 min ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




Comments
Throughput computing and pbzip2 throughput
I do not really understand your graphs with pbzip2... Are you saying that it takes more time to compress a file with higher parallelism? I think the y-axis is goofed up.
Anyway, for in-depth analysis on throughput computing on T2000 and beyond look at:
http://blogs.sun.com/glennf/category/Throughput+Computing
-Glenn
Really? You compare these
Really? You compare these architectures using Linux? Why not the T1000 with Solaris and IBM with AIX? It seems incomplete to not test with the native OS which these architectures are optimized.
Also, you mention these architectures are "proprietary". While most are, it seems not fair to call Solaris and CMT proprietary. Both the OS and chip architecture are Open Source :)