The OSCAR Revolution
“Serve no whine before its time” is a bad pun attributed to Rob Pennington of NCSA at the very first OSCAR meeting, held in April 2000 at a hotel a stone's throw from Oak Ridge National Lab. A varied cast representing the national labs, academia and industry was assembled to discuss what was known at the time as the CCDK (Community Cluster Development Kit), which would morph into the OCG (Open Cluster Group) and their first project, OSCAR (the Open Source Cluster Application Resource).
The cast had broken clusters down into components and had assigned “czars” (leaders) and “whiners” (interested parties) for each component. The czars were to lead each component group, and the whiners were to whine loudly and often enough to make sure things got done on schedule, meeting the group's requirements. From that very first meeting when the czars and whiners were named, it was clear that OSCAR development would be different from all other software development that had gone before. After all, where else would one find companies like IBM, Dell, SGI and Intel working closely together to produce open solutions in a hotly contested space like clustering?
The original idea for OSCAR came about over dinner at a DOE-sponsored cluster meeting at Argonne National Lab, where Dr. Timothy Mattson, a research scientist at Intel, and Dr. Stephen Scott, a research scientist at Oak Ridge National Lab, discussed the problem of getting Linux clusters accepted into the mainstream. The problem, they decided, was that it was just too difficult for noncomputer programmers to assemble their own cluster. Books like How to Build a Beowulf (Sterling, et. al.) would help the computer savvy understand the concepts and construct his or her first cluster, but there were still daunting problems. There was an enormous amount of code to download, all at differing levels of reliability, support, integration and documentation. Sometimes the documentation for various packages was dated and contradictory. There were many Linux distributions to choose from, each trying to distinguish themselves by being slightly different from the next distribution. This meant that some commands worked differently or that different packages had to be installed to get a service to work properly.
The problem, they decided, was that with everyone trying to build their own cluster to tap into cheap cluster computing, each cluster was being built from scratch. There had to be some economy in compiling the best available software, practices and documentation in a single spot, integrating the package on different types of hardware and making it available to users for free (as in free beer). This concept, making clusters easy to build for the nonprogrammer, is a central tenet of OSCAR.
The historic first meeting in Oak Ridge was attended by Tim Mattson and Stephen Scott, the leaders of the OCG; Gabriel Bonner from SGI; Dave Lombard from MSC.Software; Rob Pennington of NCSA; Greg Lindahl, now of Conservative Computers; Ken Briskey and myself from IBM; Greg Astfalk from HP; and Clay Taylor from MPI Software Technologies. Shortly after the first meeting, Broahn Mann from Veridian joined to bring his parallel scheduling skill to the team, as did Jeremy Enos and Neil Gorsuch from NCSA (who implemented SSH on OSCAR) and Mike Brim from Oak Ridge National Lab (who wrote most of the integration scripts and packaging). Most recently, Jeff Squyres and Brian Barrett from Indiana University joined the OSCAR Project representing LAM/MPI. The disparate group agreed on three major core principles:
That the adoption of clusters for mainstream, high-performance computing is inhibited by a lack of well-accepted software stacks that are robust and easy to use by the general user.
That the group embraces the open-source model of software distribution. Anything contributed to the group must be freely distributable, preferably as source code under the Berkeley open-source license.
That the group can accomplish its goals by propagating best-known practices built up through many years of hard work by cluster computing pioneers.
With these principles firmly in place, the group used a divide-and-conquer method to list the components that comprise clusters. The component groups decided on the best-known, open-source solutions for each component and presented the information to the group at large. Taken collectively, these best-known practices for each component comprised a viable cluster solution. Even with the component solutions in hand, there was a massive and time-consuming integration effort by Oak Ridge National Lab, led by Mike Brim and Brian Luethke, and a separate test effort, which was led by Jenwei Hsieh, Tau Leng and Yung-Chin Fang from Dell. Through their efforts, and face-to-face and remote-integration parties, OSCAR eventually morphed into something to share with the rest of the community.