The OSCAR Revolution
It took nearly a full year, but OSCAR had a beta demonstration at SC2000 in Dallas, Texas at the Oak Ridge National Lab booth in November 2000. The beta was run on a heterogeneous cluster of servers provided by Dell and SGI. The first release was announced shortly thereafter and made a successful debut at LinuxWorld Expo in New York City in February 2001, at the Intel booth. Since then, there have been continuous improvements in the OSCAR software stack, which currently includes:
Linux installation: SIS (system installation suite). SIS is an open-source cluster installation tool based on the merger of LUI (the Linux utility for cluster install) and the popular SystemImager. SIS, developed by Michael Chase-Salerno and Sean Dague from IBM, made its debut in the 1.2.1 version of OSCAR. Most recently, Brian Finley of Bald Guy Software, the creator of SystemImager, has been attending the OSCAR meetings and looking for free beer, as in free beer.
Security: OpenSSH—the most common way to allow secure connections in a Linux environment. OpenSSH is a collection of packages that handles secure connections, server-side SSH services, secure key generation and any other functions used to support secure connections between computers.
Cluster management: for cluster-wide management operations, OSCAR uses the Cluster Command and Control (C3) management package developed at Oak Ridge National Lab by Stephen Scott and Brian Luethke, an East Tennessee State University student working at ORNL. C3 provides a “single-system illusion” so that a single command affects the entire cluster. C3 remains installed on the cluster nodes for later use by cluster users and administrators.
Programming environments: Message-Passing Interface (MPI) and Parallel Virtual Machine (PVM). Most cluster users write the software that runs on the cluster. There are many different ways to write software for clusters. The most common approach is to use a message-passing library. Currently, compilers or math libraries installed by OSCAR come from the Linux distribution. Both LAM/MPI and MPICH have been available since OSCAR 1.1.
Workload management: Portable Batch System (PBS) from Veridian and Maui Scheduler (developed by Maui High Times Computing Center). To time-share a cluster, some type of workload or job management is needed. Maui acts as a job scheduler for OSCAR, making all resource allocation and scheduling decisions. PBS is the job server/launcher and in addition to launching and killing jobs, handles job queues.
MSC.Linux, a distribution developed by the Systems Division of MSC.Software Corporation, is of special importance in the acceptance of OSCAR. Shortly after the 1.0 version of OSCAR was available, MSC.Software announced their own cluster solution, the MSC.Linux Version 2001 operating system. This 2001 offering was in large part based on OSCAR, the first commercial offering based on the work of the OCG. MSC.Software's Joe Griffin added a Webmin interface to LUI (the first OSCAR cluster installation tool), which generated LUI bottom-line commands for multiple nodes to provide an easy-to-use interface in defining the nodes of the cluster and what resources to install on each. One of the original intents of the OCG was that commercial companies would see the value in the open OSCAR software stack and build their own proprietary or open stacks around the OSCAR stack. In so doing, companies using OSCAR would be freed from the mundane chores associated with building a cluster, such as providing the basic infrastructure, and could concentrate instead on more cutting-edge improvements to distinguish their offering.
Like other far-flung open-source projects, it was clear from the beginning that doing the work of the consortium face to face would not always be an option. The travel expense was simply too great, and it was difficult to align so many schedules. To coordinate the work, the group held open weekly phone conferences and would rely on mailing lists and an occasional meeting at a workshop or expo. There were face-to-face “integration parties” held quarterly, one at Intel in Hillsboro, Oregon and another at NCSA in Illinois. But for integrations held between meetings, a new construct was developed, called DIP Day, for distributed integration party. The intent of DIP Days was that everyone working on the project that had a cluster would set aside those days to work on OSCAR, jointly and remotely. Everyone would download the OSCAR package and install and run it, reporting any bugs to the group. On DIP Days, programmers were expected to provide fixes in real time, so that multiple iterations of the code could be tested shortly. Several conference calls with the entire team were held every DIP Day to assess progress and assign new work and priorities. By loosely coordinating the group between DIPs and face-to-face meetings, OSCAR made great strides in reliability and function.
- Let's Go to Mars with Martian Lander
- Applied Expert Systems, Inc.'s CleverView for TCP/IP on Linux
- My Childhood in a Cigar Box
- Papa's Got a Brand New NAS
- Returning Values from Bash Functions
- Tech Tip: Really Simple HTTP Server with Python
- Rogue Wave Software's TotalView for HPC and CodeDynamics
- Panther MPC, Inc.'s Panther Alpha
- Debugging Democracy