Parametric Modelling: Killer Apps for Linux Clusters
EnFuzion sits somewhere between a user-level tool and a software development environment. It has been designed so that a fault-tolerant, distributed application for a parametric-modelling experiment can be built in a matter of minutes. In many cases, EnFuzion requires no programming; all you do is describe the parameters to the system and give some instructions on how to run the programs. EnFuzion instantiates your code with the different parameter values, sends files to the remote machines, retrieves the results and distributes the execution. If a program or system fails, EnFuzion automatically reruns the program on another machine.
Running an EnFuzion experiment requires three phases: preparation, generation and distribution. During preparation, you develop a descriptive file called a plan. A plan contains a description of the parameters, their types and possible values. It also contains commands for sending files to remote machines, retrieving them and running the job. EnFuzion provides a tool called the Preparator which has a wizard for generating standard plan files, as shown in Figure 1. Alternatively, if you are prepared to learn EnFuzion's simple scripting language, it is possible to build a plan using a normal text editor. The plan shown in Figure 1 is typical of a simple experiment and is quite small.
The plan file is processed by a tool called the generator, which asks the user to specify the actual values for the parameters. For example, a plan file might specify that a parameter is a range of integers, without giving the start, end or increment values. The generator tool allows the user to fill in these values. It then reports how many independent program executions are required to perform the experiment by taking the cross product of all parameter values. Figure 2 shows a sample generator interaction with a CAD tool, where the clock period parameter is set to values of 75 and 100 nanoseconds and the cost table parameter is varied from 1 to 100. This interaction generated 200 individual executions of the CAD package.
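The cross-product step can be illustrated with a short Python sketch. This is not EnFuzion's generator, only a minimal illustration of the arithmetic; the parameter names clock_period_ns and cost_table are hypothetical, while the values follow the Figure 2 example.

```python
from itertools import product

# Hypothetical parameter names; the values follow the Figure 2 example.
parameters = {
    "clock_period_ns": [75, 100],
    "cost_table": range(1, 101),  # 1 to 100 inclusive
}


def generate_jobs(params):
    """Return one dict of parameter values per independent execution."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


jobs = generate_jobs(parameters)
print(len(jobs))  # 2 clock periods x 100 cost tables = 200
print(jobs[0])    # {'clock_period_ns': 75, 'cost_table': 1}
```

Each entry in jobs corresponds to one independent run of the program, which is why adding a parameter or widening a range multiplies the size of the experiment rather than adding to it.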
The generator produces a run file, which contains all the information regarding what parameter values are to be used and how to run the jobs. This file is processed by a tool called the dispatcher, which organizes the actual execution. EnFuzion calls the machine on which you develop your plan the root machine. The work is performed on a number of computational nodes, as shown in Figure 3. Files are sent from the root machine to the computational nodes as required. Output is returned to the root for postprocessing.
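The rerun-on-failure behaviour mentioned earlier can be sketched in a few lines of Python. This is an illustration of the general idea under our own assumptions, not EnFuzion's dispatcher: the function run_with_failover, the execute callback, the node names and the max_attempts limit are all hypothetical.

```python
def run_with_failover(job, nodes, execute, max_attempts=3):
    """Try a job on successive nodes until one of them succeeds.

    execute(node, job) is a caller-supplied function that returns the
    job's output, or raises an exception if the program or node fails.
    """
    last_error = None
    for node in nodes[:max_attempts]:
        try:
            return node, execute(node, job)
        except Exception as err:  # any failure sends the job elsewhere
            last_error = err
    raise RuntimeError(f"job {job!r} failed on every node tried") from last_error


# Hypothetical stand-in for remote execution: node01 is "down".
def flaky_execute(node, job):
    if node == "node01":
        raise ConnectionError("node01 is down")
    return f"{job} done on {node}"


node, output = run_with_failover(
    "job-7", ["node01", "node02", "node03"], flaky_execute
)
print(node, output)  # node02 job-7 done on node02
```

Because the runs in a parametric experiment are independent of one another, a failed run can simply be repeated elsewhere without affecting the rest of the experiment.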
The dispatcher sends work only to machines named in a user-supplied file, so every user can have a different list of machines. The dispatcher contacts the nodes and determines whether it is possible to start execution of the tasks. EnFuzion allows you to restrict the number of tasks a node runs by using a variety of thresholds, such as a maximum number of tasks, the hours during which a node is available and the peak load average for the machine. At Monash, we have augmented EnFuzion with a simple scheme using the UNIX command nice. This allows a node to run more tasks than it has processors, but long-running jobs are “niced” so that short jobs receive a higher priority. This seems to be a good way of mixing short- and long-running jobs without restricting the job mix artificially.
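A threshold check of the kind described above might look like the following Python sketch. It is not EnFuzion's configuration format or code; the NodePolicy class, its field names and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodePolicy:
    """Hypothetical per-node limits, mirroring the thresholds in the text."""
    max_tasks: int          # maximum number of concurrent tasks
    available_hours: range  # hours of the day when the node accepts work
    peak_load: float        # load average above which no new task starts


def can_start_task(policy, running_tasks, hour, load_average):
    """Return True only if every threshold allows one more task."""
    return (
        running_tasks < policy.max_tasks
        and hour in policy.available_hours
        and load_average <= policy.peak_load
    )


# Example: a node that accepts up to five tasks between 18:00 and 23:59.
policy = NodePolicy(max_tasks=5, available_hours=range(18, 24), peak_load=4.0)
print(can_start_task(policy, running_tasks=2, hour=20, load_average=1.5))  # True
print(can_start_task(policy, running_tasks=5, hour=20, load_average=1.5))  # False
print(can_start_task(policy, running_tasks=2, hour=9, load_average=1.5))   # False
```

On a UNIX node, the load_average argument could be taken from the first value returned by Python's os.getloadavg().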
At Monash University, we have built a cluster of 30 dual-processor Pentium machines running Linux, called the Monash Parametric Modelling Engine (MPME). A single machine acts as the host, and is connected to both the public network and a private 100Mbit network for linking the computational nodes. Figure 4 shows part of this machine.
Each MPME node runs a full Red Hat Linux 5.2 system with the standard GNU tools, such as gcc and gdb. Typically, users log in to the host and use EnFuzion to schedule work on the nodes. We have configured the system to accept up to five jobs per node, even though each node contains only two processors. To control the load on each machine, we run a script that increases the nice level of each process the longer it executes. This makes it possible to mix long- and short-running jobs on the platform. In contrast, when we limited the number of jobs to the number of processors, we found that long-running jobs monopolized the nodes and short-running jobs were rarely run.
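The idea of raising a job's nice level as it ages can be sketched in Python as follows. This is not the script used on the MPME; the runtime thresholds and nice levels in NICE_SCHEDULE are invented for illustration, and the renice helper assumes a UNIX system.

```python
import os

# Hypothetical schedule: (minimum runtime in seconds, nice level).
NICE_SCHEDULE = [(0, 0), (600, 5), (3600, 10), (6 * 3600, 19)]


def nice_for_runtime(elapsed_seconds):
    """Return the nice level for a process that has run this long."""
    level = 0
    for min_runtime, nice_level in NICE_SCHEDULE:
        if elapsed_seconds >= min_runtime:
            level = nice_level
    return level


def renice(pid, elapsed_seconds):
    """Lower a process's priority to match its runtime (UNIX only)."""
    target = nice_for_runtime(elapsed_seconds)
    current = os.getpriority(os.PRIO_PROCESS, pid)
    if target > current:  # only ever lower the priority, never raise it
        os.setpriority(os.PRIO_PROCESS, pid, target)
    return max(current, target)


print(nice_for_runtime(30))     # 0: a new job runs at normal priority
print(nice_for_runtime(900))    # 5
print(nice_for_runtime(86400))  # 19: a day-old job yields to everything else
```

A periodic task, for example one started from cron, could call renice for each job on a node. Short jobs then finish at normal priority, while long jobs drift toward the background without being stopped.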
To date, we have used the MPME to support a wide range of applications. Table 1 shows a list of the applications mounted during 1998 and 1999. Many of these are student projects completed as part of a semester subject. In the sidebar “A Case Study”, one of our postgraduate students, Carlo Kopp, describes his use of the cluster to perform network simulations. The results have been quite dramatic: in this case, the additional computational resources allowed him to explore many more design options than he initially thought would be possible.
David Abramson (firstname.lastname@example.org) is the head of the School of Computer Science and Software Engineering at Monash University in Australia. Professor Abramson has been involved in computer architecture and high performance computing research since 1979 and is currently project leader in the Co-operative Research Centre for Distributed Systems Nimrod Project. His current interests are in high performance computer systems design, software engineering tools for programming parallel and distributed supercomputers and stained glass windows.