Parametric Modelling: Killer Apps for Linux Clusters
After nearly 20 years of research and development, there still has not been a wide-scale uptake of parallel computing technology. However, with recent advances in PC-based hardware and networking products, it is now possible to build a parallel computer from industry-standard, commercial, off-the-shelf products. Approaches such as those used in the Beowulf project (see http://beowulf.gsfc.nasa.gov/) advocate the connection of a number of Linux-based workstations, or high-end PCs, with high-speed networking hardware and appropriate software as the basis for a viable parallel computer. These systems support parallel programming using a message-passing methodology. In spite of this, there is still a dearth of good, widespread, parallel applications.
I believe there are two main reasons for the lack of applications. First, rapid changes in hardware architecture and a lack of software standards have made it difficult to develop robust, portable and efficient software. Second, much of the effort in developing a parallel program has been focused on low-level programming issues, such as how to support message passing in a portable manner. Consequently, insufficient attention has been given to high-level parallel programming environments built around niche application domains. Thus, application developers building parallel programs have been forced to consider very low-level issues, which are far removed from their base disciplines such as physics, chemistry and commerce.
In 1994, some of my colleagues and I began a research project called Nimrod, with the goal of producing a parallel problem-solving environment for a particular niche class of application, namely parametric modelling experiments. We were motivated to do the work for this project after experiencing the frustration of trying to perform some large-scale computational-modelling experiments on a distributed computer using the available tools. We were forced either to perform the work manually or to use low-level parallel programming tools like PVM (Parallel Virtual Machine) and batch queueing systems. Neither approach matched the type of work we wanted to perform, which at the time was modelling air pollution and exploring different control strategies. The project led to the development of a commercial tool called EnFuzion (see http://www.turbolinux.com/), which runs on a variety of UNIX platforms including Linux. It has also led to a number of very successful applications, with demonstrated returns to the researchers involved. Other types of problems can be formulated as parametric experiments, and thus can also take advantage of EnFuzion.
Parametric modelling is concerned with computing a function value with different input parameter values. The results make it possible to explore different parameters and design options. The broad approach has been made very popular by the use of spreadsheet software, in which many different options can be rapidly evaluated and the results displayed. However, rather than simply computing a function value, we are interested in running an executable program, and thus the time required to explore a number of design scenarios may be extensive. For example, a financial model may take on the order of one hour to compute one scenario on a PC. Accordingly, to consider 100 different scenarios requires 100 hours, or over four days of computing time. This type of experiment can be feasible only if a number of computers are used to run different invocations of the program in parallel. For example, 20 machines could solve the same problem in about five hours of elapsed time.
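The arithmetic above can be checked with a short Python sketch. It assumes every scenario takes the same time, each machine runs one scenario at a time, and distribution overhead is negligible. The function name elapsed_hours is purely illustrative and is not part of any tool discussed here.

```python
import math

def elapsed_hours(scenarios, machines, hours_per_run):
    """Wall-clock hours when runs are spread evenly over the machines."""
    # Each machine handles one run at a time, so the work proceeds in rounds.
    rounds = math.ceil(scenarios / machines)
    return rounds * hours_per_run

print(elapsed_hours(100, 1, 1.0))   # one PC: 100.0 hours
print(elapsed_hours(100, 20, 1.0))  # 20 machines: 5.0 hours
```

In practice, run times vary and machines fail, so the real elapsed time will be somewhat longer than this idealized figure.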
In spite of the obvious advantage to using multiple machines to solve one large parametric experiment, almost no high-level tools are available for supporting multiple executions of the same program over distributed computers. Spreadsheet software is not designed for executing programs concurrently or utilizing distributed computers. It is possible to write a program using low-level parallel programming tools like PVM (see http://www.epm.ornl.gov/pvm/), Message Passing Interface (see http://www.mpi-forum.org/) and Remote Procedure Call (RPC), to support its execution across many machines; however, this approach has a number of disadvantages.
First, the source must be available for the program, and it must be modified to support parallel execution. This is not always possible or desirable for complex commercial software.
Second, all aspects of distributing the task must be built into the application. Application programmers are rarely experts in both their own discipline and parallel programming, so this is difficult to do well.
Third, unless fault tolerance is built into the application, the resulting program will not be able to handle failure in the network or in individual machines. Such fault tolerance significantly complicates the application.
Accordingly, it is not surprising that few instances of this approach are used in practice.
The alternative method is to use a remote job distribution system like Platform Computing Load Sharing Facility (see www.platform.com), Network Queueing System (see www.shef.ac.uk/uni/projects/nqs) or Portable Batch System (see pbs.mrj.com). These systems allow a user to submit jobs to a virtual computer built from many individual machines. Although remote queueing systems are more general than the previous parallel-programming approach, their main disadvantage is that they are targeted at running jobs, not necessarily at performing a parametric study. Thus, you have to build additional software for generating jobs based on the different parameter values and for aggregating the results afterward. This can expose the user to a number of low-level system issues, such as the management of network queues, the location and nature of the underlying machines, the availability of shared file systems and the method for transferring data from one machine to another.
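To make the "additional software" concrete, here is a minimal Python sketch of the two pieces a user would have to write: generating one job per combination of parameter values, and collecting the results. It is not how EnFuzion or any of the queueing systems above work. A local thread pool stands in for the remote machines, toy_model is a hypothetical stand-in for the real executable, and all names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def generate_jobs(parameters):
    """Return one dict per combination of parameter values (the cross product)."""
    names = sorted(parameters)
    return [dict(zip(names, values))
            for values in product(*(parameters[name] for name in names))]

def toy_model(job):
    """Hypothetical stand-in for the real program; returns a single number."""
    return job["rate"] * job["years"]

def run_experiment(parameters, model, workers=4):
    """Run the model once per job and pair each job with its result."""
    jobs = generate_jobs(parameters)
    # The thread pool is a local placeholder for a set of remote machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(model, jobs))  # map() preserves job order
    return list(zip(jobs, results))

if __name__ == "__main__":
    sweep = {"rate": [0.02, 0.05], "years": [1, 10, 30]}
    for job, result in run_experiment(sweep, toy_model):
        print(job, result)
```

Even this toy version leaves out everything that makes the real task hard: moving input and output files between machines, dealing with queues and recovering from failed runs.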