Integrating a Linux Cluster into a Production High-Performance Computing Environment
Overall, OSC has been quite pleased with the two Linux clusters it has had so far, and Linux clusters are seen at the center as one of the main directions in the future of high-performance computing. However, there are numerous areas in which Linux could be improved to support high-performance computing. Probably the most critical of these from OSC's perspective is a parallel filesystem with better parallel performance than NFS. The main use for this would be temporary storage for jobs; this is currently handled on the Brain cluster by having a $TMPDIR directory on each node, but a globally accessible scratch area would be much easier on users. There are currently two potential open-source candidates for a cluster parallel filesystem under Linux: GFS, from the University of Minnesota and PVFS, from Clemson University (see “A Parallel Virtual Filesystem for Linux Clusters”, LJ December 2000). GFS is a journaled, serverless storage-area network (SAN) filesystem over Fibre Channel. It promises to be an excellent performer, and its serverless aspect is quite attractive. However, as of this writing, the GFS code is in a state of flux following a redesign of its locking mechanisms, and the Fibre Channel switches needed for large topologies remain relatively expensive. PVFS, on the other hand, requires no special hardware; it, in effect, implements RAID-0 (striping) across multiple I/O node systems. PVFS's main downside is that it currently has no support for data redudancy, such that if an I/O node fails the parallel filesystem may be corrupted.
Another area where open-source solutions for high-performance computing clusters may be improved is job scheduling and resource management. While PBS has proven to be an adequate framework for resource management, its default scheduling algorithm leaves much to be desired. Luckily PBS was designed to allow third-party schedulers to be plugged into PBS to allow sites to implement their own scheduling policies. One such third-party scheduler is the Maui Scheduler from the Maui High Performance Computing Center. OSC has recently implemented Maui Scheduler on top of PBS and found it to be a dramatic improvement over the default PBS scheduler in terms of job turnaround time and system utilization. However, the documentation for Maui Scheduler is currently a little rough, although Dave Jackson, Maui's principal author, has been quite responsive with our questions.
A third area for work on Linux for high-performance computing is process checkpoint and restart. On Cray systems, the state of a running process can be written to disk and then used to restart the process after a reboot. A similar facility for Linux clusters would be a godsend to cluster administrators; however, for cluster systems using a network like Myrinet, it is quite difficult to implement due to the amount of state information stored in both the MPI implementation and the network hardware itself. Process checkpointing and migration for Linux is supported by a number of software packages such as Condor, from the University of Wisconsin, and MOSIX, from the Hebrew University of Jerusalem (see “MOSIX: a Cluster Load-Balancing Solution for Linux”, LJ May 2001); however, neither of these currently support the checkpointing of an arbitrary MPI process that uses a Myrinet network.
The major question for the future of clustering at OSC is what hardware platform will be used. To date Intel IA32-based systems have been used, primarily due to the wealth of software available. However, both Intel's IA64 and Compaq's Alpha 21264 promise greatly improved floating point performance over IA32. OSC has been experimenting with both IA64 and Alpha hardware, and the current plan is to install a cluster of dual processor SGI Itanium/IA64 systems connected with Myrinet 2000 some time in early 2001. This leads to another question: what do you do with old cluster hardware when they are retired? In the case of the Brain cluster, the plan is to hold a grant competition among research faculty in Ohio to select a number of labs that will receive smaller clusters of nodes from Brain. This would include both the hardware and the software environment, on the condition that idle cycles be usable by other researchers. OSC is also developing a statewide licensing program for commercial clustering software such as Totalview and the Portland Group compilers, to make cluster computing more ubiquitous in the state of Ohio.
This article would not have been possible without help from the author's coworkers who have worked on the OSC Linux clustering project, both past and present: Jim Giuliani, Dave Heisterberg, Doug Johnson and Pete Wyckoff. Doug deserves special mention, as both Pinky and Brain have been his babies in terms of both architecture and administration.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- SUSE LLC's SUSE Manager
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- My +1 Sword of Productivity
- Tech Tip: Really Simple HTTP Server with Python
- Non-Linux FOSS: Caffeine!
- Managing Linux Using Puppet
- Returning Values from Bash Functions
- Rogue Wave Software's Zend Server
- Doing for User Space What We Did for Kernel Space
- Google's SwiftShader Released
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide