High-Performance Linux Clusters

The present and future of high-performance computing.

Twice a year, a group of scientists in Europe and the United States releases a list of the world's 500 most powerful computing systems. The Top 500 list is the most prestigious ranking of its kind, and vendors and users leverage favorable rankings to promote their work. The most recent list, released in June 2007, reconfirmed a recent trend: Linux is by far the most frequently used operating system in high-performance computing (HPC). Consider the numbers: 389 machines (or 78%) run some flavor of Linux, 64 run UNIX, two run Windows and 42 feature a mix of Linux and other operating systems.

Although such dominance suggests that Linux has a long history in HPC, the truth is that Linux clusters began replacing UNIX systems only six years ago. The initial take-up was so quick because Linux and open systems brought commodity hardware and software into what had previously been a proprietary systems market. This change brought costs down significantly, allowing users at the high end to purchase more power at lower cost and opening the door to new users, such as traditional product designers who could not afford closed proprietary systems. Linux now dominates the HPC market to such an extent that market research firm IDC estimated Linux represented 65% of the total HPC market by mid-2006 (compared to approximately 30% for UNIX), with additional growth projected. The Top 500 list confirms that growth.

Challenges and Questions

Linux is clearly the present of HPC, but is it the future? Microsoft continues to make advancements with its Windows Compute Cluster Server, has plenty of cash on hand and is clearly capable, from a business perspective, of eating up market share. In addition, despite its well-known flaws, Windows is something nearly everyone has worked with and is familiar with, potentially making it a comfortable platform for new HPC users.

Complicating matters further is that, despite their well-earned market dominance, high-performance Linux clusters have, in many cases, earned a reputation for being difficult to build and manage. Widely available commodity components lead to complexity in the selection, integration and testing required when building a stable system. This complexity becomes doubly problematic when you consider that organizations invest in HPC systems in order to get the best possible performance for the applications they run. Small variations in system architecture can have a disproportionately large impact on time to production, system throughput and the price/performance ratio.

Furthermore, like any new technology, the first high-performance Linux clusters hit bumps in the road. Early systems took vendors a very long time to build and deliver, and took even longer to put into production. Additionally, early management software made re-provisioning systems and upgrading components cumbersome. Finally, delivering HPC systems is as much about understanding the nuances of computer-aided engineering (CAE) applications as it is about understanding the technical minutiae of interconnects, processors and operating systems. Early vendors of high-performance Linux clusters did not necessarily have the expertise that proprietary systems vendors had in computational fluid dynamics (CFD), finite element analysis (FEA) and visualization codes.

It is, therefore, natural for many to question whether the tremendous price advantage of Linux and open systems still outweighs all other considerations. The truth is that although Windows provides some advantages to entry-level HPC users, high-performance Linux clusters have matured. Today's Linux clusters deliver better performance at a more attractive price than ever before. Customers increasingly demand clusters as turnkey systems, which allows faster time to production and fewer management headaches. In addition, the very nature of open source has contributed to the strength of high-performance Linux clustering. Linux clusters adapt more quickly to new technology, are easier to modify and optimize, and benefit from a worldwide community of developers interested in tweaking and optimizing code.

The Advantages of Linux-Based HPC

The most important factor in HPC is, of course, performance. National laboratories and universities want ever-more powerful machines to solve larger problems with greater fidelity. Aerospace and automotive engineering companies want better-performing systems so they can grow from running component-level jobs (such as analyzing the stress on an engine block) to conducting more complex, multi-parameter studies. Product designers in a variety of other fields want to graduate from running CAE applications on their relatively slow workstations in order to accelerate the overall design process.

Performance, therefore, cannot be separated from high-performance computing, and in this area Linux clusters excel. There are two primary reasons for this: maturity and community.