The Roadrunner Supercomputer: a Petaflop's No Problem

IBM and Los Alamos National Lab built Roadrunner, the world's fastest supercomputer. It not only reached a petaflop, it beat that by more than 10%. This is the story behind Roadrunner.

Speed = Hybrid Architecture + Software

You may be surprised to learn that Roadrunner was built 100% from commercial parts. The secret formula to its screaming performance involves two key ingredients, namely a new hybrid Cell-Opteron processor architecture and innovative software design. Grice emphasized that Roadrunner “was a large-scale thing, but fundamentally it was about the software”.

Despite that claim, the hardware characteristics remain mind-boggling. Roadrunner is essentially a cluster of clusters of Linux Opteron nodes connected with MPI and a parallel filesystem. It sports 6,562 AMD dual-core Opteron 2210 1.8GHz processors and 12,240 IBM PowerXCell 8i 3.2GHz processors. The Opterons' job is to manage standard processing, such as filesystem I/O; the Cell processors handle mathematically intensive and CPU-intensive tasks. For instance, the Cell's eight vector-engine cores can accelerate algorithms while running much cooler, faster and cheaper than general-purpose cores. “Most people think [that the Cell processor] is a little bit hard to use and that it's just a game thing”, joked Grice. But the Cell clearly isn't only for gaming anymore. The Cell processors make each computing node 30 times faster than it would be using Opterons alone.

LANL's White further emphasized the uniqueness of Roadrunner's hybrid architecture, calling it a “hybrid hybrid”, because the Cell processor is itself a hybrid: it has one PPU (PowerPC) core and eight SPUs. Because the PPU is “of modest performance”, as the folks at LANL politely say, they needed another core to run code that wouldn't run on the SPUs and to improve performance. Thus, the Cells are connected to the Opterons.

The system also carries 98 terabytes of memory, as well as 10,000 InfiniBand and Gigabit Ethernet connections that require 55 miles of fiber optic cabling. 10 Gigabit Ethernet (10GbE) connects the system to 2 petabytes of external storage. The 278 IBM BladeCenter racks take up 5,200 square feet of space.

The machine is built from a unique tri-blade configuration consisting of one dual-socket, dual-core Opteron LS21 blade and two dual-socket IBM QS22 Cell blades. Each Opteron core is connected to a Cell chip via a dedicated PCIe link, whereas node-to-node communication runs over InfiniBand. Each of the 3,456 tri-blades can perform at 400 gigaflops (400 billion floating-point operations per second).

See Figure 3 for a schematic diagram of the tri-blade.

Figure 3. The hybrid Opteron-Cell architecture is manifested in a tri-blade setup. The tri-blade allows the Opteron to perform standard processing while the Cell performs mathematically and CPU-intensive tasks.
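As a rough check on how the tri-blade figures add up, the short Python sketch below multiplies the quoted per-tri-blade rate by the quoted tri-blade count and compares the result with the first petaflop run reported later in this article. Treating 400 gigaflops as a per-tri-blade peak is an assumption made here, and the function and constant names are purely illustrative.

```python
# Figures quoted in this article. Reading 400 gigaflops as the peak rate of
# one tri-blade is an assumption made for this back-of-the-envelope sketch.
TRIBLADES = 3_456
GFLOPS_PER_TRIBLADE = 400
SUSTAINED_PFLOPS = 1.026  # first petaflop run reported in the article


def aggregate_peak_pflops(triblades: int, gflops_each: float) -> float:
    """Sum the per-tri-blade rates and convert gigaflops to petaflops."""
    return triblades * gflops_each / 1_000_000


peak = aggregate_peak_pflops(TRIBLADES, GFLOPS_PER_TRIBLADE)
fraction = SUSTAINED_PFLOPS / peak

print(f"aggregate peak: {peak:.4f} petaflops")
print(f"sustained / peak: {fraction:.1%}")
```

Under that assumption, the quoted numbers imply an aggregate peak somewhat above the measured petaflop, which is what one would expect of any real benchmark run.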

The hybrid, tri-blade architecture has allowed for a quantum leap in performance while using the same amount of space as previous generations of supercomputers. Roadrunner takes up the same space and costs the same to operate as its two predecessors, the ASC Purple and ASC White machines. This reflects how performance continues to grow predictably, by a factor of roughly 1,000 every 10–11 years. Grice noted that just three of Roadrunner's tri-blades have the same power as the fastest computer from 1998. Put another way, a calculation that takes a week on Roadrunner today would be only half finished on a 1 teraflop machine that started running it in 1998.
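The two comparisons in the paragraph above can be worked through with a few lines of Python. This is only a sketch of the arithmetic: it assumes Roadrunner sustains a round 1 petaflop, that the 1998 machine sustains exactly 1 teraflop, and that the job scales perfectly with machine speed. The names used are illustrative.

```python
# Assumed round numbers for the comparison (not measured values).
ROADRUNNER_TFLOPS = 1_000   # 1 petaflop expressed in teraflops
OLD_MACHINE_TFLOPS = 1      # the "old 1 teraflop machine" from 1998
TRIBLADE_GFLOPS = 400       # per-tri-blade rate quoted in the article
DAYS_PER_YEAR = 365.25


def runtime_on_old_machine_years(weeks_on_roadrunner: float) -> float:
    """Scale a Roadrunner runtime by the speed ratio and convert to years."""
    speedup = ROADRUNNER_TFLOPS / OLD_MACHINE_TFLOPS
    return weeks_on_roadrunner * speedup * 7 / DAYS_PER_YEAR


# Three tri-blades versus the fastest computer of 1998.
three_triblades_tflops = 3 * TRIBLADE_GFLOPS / 1_000

# A one-week Roadrunner job, started in 1998 on the old machine.
years_needed = runtime_on_old_machine_years(1)
fraction_done_by_2008 = (2008 - 1998) / years_needed

print(f"three tri-blades: {three_triblades_tflops:.1f} teraflops")
print(f"one Roadrunner week on the old machine: {years_needed:.1f} years")
print(f"finished by 2008: {fraction_done_by_2008:.0%}")
```

With these assumptions, three tri-blades come to slightly more than a teraflop, and the old machine would be roughly halfway through the job after ten years, in line with the comparisons quoted above.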

Such quantum leaps in performance boggle the minds of many scientists, who see their careers changing right before their eyes. If they have calculations that take too long today, they can be quite sure that in two years, the calculation will take one-tenth of the time.

Neither IBM's Grice nor LANL's White could emphasize enough the importance and complexity of the software that allows for exploitation of Roadrunner's hardware prowess. Because clock frequency and chip power have plateaued, Moore's Law will continue to hold through other means, such as with Roadrunner's hybrid architecture.

Roadrunner Runs

Roadrunner was put together in its full configuration on May 23, 2008. On May 26, it reached the petaflop. “Running a petaflop just three days after being assembled is pretty amazing”, said White.

Clearly a petaflop isn't the limit. The original petaflop achievement was actually 1.026 petaflops, and Roadrunner has done better since. In June 2008, LANL and IBM ran a project called PetaVision Synthetic Cognition, a model of the brain's visual cortex that mimicked more than one billion brain cells and trillions of synapses. It reached 1.144 petaflops. Calculations like these are the petaflop-level tasks for which Roadrunner is ideal.

“It's hard to overstate how exciting it is to see the science we'll be able to do with Roadrunner”, said White. In mid-2009 the bulk of Roadrunner's nodes will enter “classified” mode for the rest of its life, allowing only authorized personnel to know what it's doing. Nevertheless, scientists and their groupies will be happy to learn about some of Roadrunner's non-military duties. First, in August 2008, LANL ordered two additional connected units for Roadrunner, dubbed the Turquoise Network, which will be available and “in the open all the time”, according to White. These units should be running by October 2008. In addition, during early 2009 before Roadrunner goes classified, LANL will utilize several other so-called unclassified open science codes as test loads as part of Roadrunner's stabilization and integration process. The ten codes that have been selected for this purpose must prove their ability to work on Roadrunner. Although some of these codes are based on the above-mentioned VPIC and SPaSM, others are new and untested. “It remains to be seen whether others can write codes that actually will run on the system”, stated White.

LANL received 29 proposals for access to Roadrunner and selected ten, of which two are weapons-related and eight are non-weapons-related. A sampling of the fascinating selected projects includes investigations of the formation of metallic nanowires with an atomic-force microscope, the phylogenetics of the early infection states of HIV and, finally, dark energy and matter.

Although the chance to utilize Roadrunner's power is enticing, one must consider the extra tweaking needed to take advantage of the hybrid architecture. “It can be tricky”, said White. With a more conventional machine, codes don't require much change from one Linux cluster to another. Fortunately, for those scientists whose proposals have been accepted, LANL is offering extra funds to support code development for the hybrid architecture. This December, LANL will evaluate the progress of each project and allocate compute time in early 2009 based on those results.

______________________

James Gray is Products Editor for Linux Journal

Comments

$110 million! And there I was debating whether to spend a few thousand on a new laptop! Seriously, all this stuff blows me away; it's hard to imagine how a computer this big and powerful can be built, and why it needs to be this big.

James from Laptop Reviews