Heterogeneous Processing: a Strategy for Augmenting Moore's Law

One way to break the high performance computing barrier imposed by the limitations of Moore's Law

The Heterogeneous Model

Heterogeneous computing is the strategy of deploying multiple types of processing elements within a single workflow and allowing each to perform the tasks to which it is best suited. This model can employ the specialized processors described above (and others) to run some operations up to 100 times faster than scalar processors can, while expanding the applicability of conventional microprocessor architectures. Because many HPC applications include both code that could benefit from acceleration and code that is better suited to conventional processing, no single type of processor is best for all computations. Heterogeneous processing allows the right processor type to be used for each operation within a given application.
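
To make the trade-off concrete, the short Python sketch below applies Amdahl's law to an application in which only part of the runtime can be moved to an accelerator. The function name and the sample fractions are illustrative assumptions; only the "up to 100 times" figure comes from the text above.

```python
def overall_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only a fraction
    of the original runtime is accelerated.

    accelerated_fraction: share of the original runtime that runs on the
        accelerator (0.0 to 1.0). Hypothetical input, not measured data.
    accelerator_speedup: how much faster the accelerator runs that share.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_speedup <= 0:
        raise ValueError("accelerator_speedup must be positive")
    # Time left on the conventional processor, as a share of the original runtime.
    conventional_part = 1.0 - accelerated_fraction
    # Time spent on the accelerator, as a share of the original runtime.
    accelerated_part = accelerated_fraction / accelerator_speedup
    return 1.0 / (conventional_part + accelerated_part)
```

Under these assumptions, a 100x accelerator applied to 50 percent, 90 percent and 99 percent of the runtime gives roughly 2x, 9x and 50x overall. The code left on the conventional processor quickly dominates, which is why both processor types matter.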

Traditionally, there have been two primary barriers to widespread adoption of heterogeneous architectures: the programming complexity required to distribute workloads across multiple processors and the additional effort required if those processors are of different types. These issues can be substantial, and any potential advantages of a heterogeneous approach must be weighed against the cost and resources required to overcome them. But today, the rise of multicore systems is already creating a technology discontinuity that will affect the way programmers view HPC software, and open the door to new programming strategies and environments. As software designers become more comfortable programming for multiple processors, they are likely to be more willing to consider other types of architectures, including heterogeneous systems. And several new heterogeneous systems are now emerging.

The Cray X1E supercomputer, for example, incorporates both vector processing and scalar processing, and a specialized compiler that automatically distributes the workload between processors. In the new Cell processor architecture (designed by IBM, Sony and Toshiba to accelerate gaming applications on the new PlayStation 3), a conventional processor offloads computationally intensive tasks to synergistic processing elements with direct access to memory. But one of the most exciting areas of heterogeneous computing emerging today employs field programmable gate arrays, or FPGAs.

The FPGA Coprocessor Model

FPGAs are hardware-reconfigurable devices that can be redesigned repeatedly by programmers to solve specific types of problems more efficiently. FPGAs have been used as programmable logic devices for more than a decade, but are now attracting stronger interest as reconfigurable coprocessors. Several pioneering conferences on FPGAs have been held recently in the United States and abroad, and the Ohio Supercomputer Center recently formed the OpenFPGA (www.openfpga.org) initiative to accelerate adoption of FPGAs in HPC and enterprise environments.

There's a reason for this enthusiasm: FPGAs can deliver orders-of-magnitude improvements over conventional processors on some types of applications. FPGAs allow designers to create a custom instruction set for a given application, and apply hundreds or even thousands of processing elements to an operation simultaneously. For applications that require heavy bit manipulation, addition, multiplication, comparison, convolution or transformation, FPGAs can execute these operations on thousands of pieces of data at once, with low control overhead and lower power consumption than conventional processors.

FPGAs have had their own historic barriers to widespread adoption. First, they traditionally have been integrated into conventional systems via the PCI bus, which limits their effectiveness just as it does for the specialized processors described above. More critically, adapting software to interoperate with FPGAs has been extremely difficult, because FPGAs must be programmed using a hardware description language (HDL). Although these languages are commonplace for electronics designers, they are completely foreign to most HPC system designers, software programmers and users. Today, the tools that will allow software designers to program for FPGAs in familiar ways are just beginning to emerge. Users are also awaiting tools to port existing scalar codes to heterogeneous FPGA coprocessor systems. However, Cray and others are working to eliminate these issues.

The Cray XD1, for example (one of the first commercial HPC systems to use FPGAs as user-programmable accelerators), eliminates many performance limitations by incorporating the FPGA directly into the interconnect and tightly integrating FPGAs into the system's HPC Optimized Linux operating system. New tools also allow users to program for FPGA coprocessor systems with higher-level C-type languages. These include the Celoxica DK Design Suite (a C-to-FPGA compiler that is being integrated with the Cray XD1), Impulse C, Mitrion C and Simulink-to-FPGA from The MathWorks (the maker of MATLAB), which offers a model-based design approach.

Ultimately, as heterogeneous systems incorporating FPGAs become more widely used, we believe they will allow users to solve certain types of problems much faster than anything that will be provided in the near future through Moore's Law, and even support some applications that would not have been possible before. (For an example of the potential of FPGA coprocessor systems, see the sidebar on the Smith-Waterman bioinformatics application.)
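
As a rough illustration of why Smith-Waterman suits an FPGA coprocessor, the Python sketch below computes a local alignment score. The function name, the scoring values (match, mismatch, gap) and the linear gap penalty are illustrative assumptions and are not taken from any Cray implementation. The loop walks the scoring matrix one anti-diagonal at a time. Every cell on a diagonal depends only on the two previous diagonals, so hardware could assign those cells to separate processing elements and update them together; this software version simply visits them one by one.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b with a linear gap penalty.

    Scoring parameters are hypothetical defaults chosen for illustration.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of an alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    # d = i + j indexes one anti-diagonal of the matrix.
    for d in range(2, rows + cols - 1):
        first_i = max(1, d - (cols - 1))
        last_i = min(rows - 1, d - 1)
        # Cells on this diagonal are mutually independent: each reads only
        # from diagonals d-1 and d-2, so they could be computed in parallel.
        for i in range(first_i, last_i + 1):
            j = d - i
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                               # start a new local alignment
                h[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,               # gap in b
                h[i][j - 1] + gap,               # gap in a
            )
            best = max(best, h[i][j])
    return best
```

With the default scores assumed here, aligning "ACGT" against itself returns 8 (four matches), and aligning "ACGT" against "ACT" returns 5 (three matches and one gap).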
