HPF: Programming Linux Clusters the Easy Way
In Statement 1, the identical distribution of a and b ensures that for all i, a(i) and b(i) are on the same processor; thus, the compiler does not generate any message passing.
In statement 2, there is again no need for message passing. If the ALIGN statement had lined up x(i) with y(i) rather than y(i+1), communication would have been needed for some values of i.
Statement 3 looks very much like Statement 1; but the communication requirements are very different because of the different distribution of a and c. The array elements a(i) and c(i) are on the same processor for only 10 of the possible values of i, and hence for nearly all of the elements; communication of data between processors is needed. This is an unwise choice of distribution for c, if indeed this statement represents the bulk of the work.
A good choice of distribution and alignment can greatly help efficiency, and that is the point of having the directives. It is much easier to write FORTRAN90 code and embellish it with HPF directives than to write the equivalent message-passing code.
In practice, the steps taken in writing an HPF program are:
Write FORTRAN90 code. Your existing FORTRAN77 code will do in a pinch, but you will get better efficiency by cleaning it up using the newer FORTRAN high-level constructs; tools exist to help this conversion.
Decide how to configure the processors.
Declare one or more templates to act as guides for distributing arrays.
Decide how to distribute and align the arrays onto the template(s).
This process is illustrated in the code shown in Listing 1, which represents a subroutine to solve a set of linear equations. The subroutine is in standard FORTRAN90 and will run happily through any FORTRAN90 compiler, which will treat the HPF directives as comments. The code makes good use of the FORTRAN90 array facilities and has been parallelized by adding just four HPF directives. The resulting HPF code runs well on a Linux PC cluster, provided the size of the problem being solved is large enough to warrant the use of parallelism.
HPF makes life easy for the programmer, by leaving nearly everything to the compiler. So, can the compilers cope? Can you really get parallel efficiency by using HPF? And, can you get useful speedups on networked PCs with relatively high latency communications?
Of course, no compiler can find parallelism where none exists; you need to give it the parallelism in the beginning. Given this, then the answer is yes, current HPF compilers are surprisingly efficient. On a PC cluster connected by Ethernets, the message-passing latency using PVM or MPI is typically around 0.6ms; this translates to “use fairly coarse-grain parallelism if you can and don't expect to use too many PCs.”
Table 1 shows some timings to illustrate what can be achieved. They were taken on a four-PC Linux P100 cluster with 100Mb Ethernet. “Serial” times are those given using the N. A. Software (NASL) FORTRANPlus F90 compiler, release 1.3.57. These times are absent where the code uses HPF extensions (FORALL, EXTRINSIC(HPFSERIAL)) not supported in FORTRAN90 (for some, we timed equivalent FORTRAN90 versions). HPF times used the NASL HPFPlus compiler, release 2.0. Optimization was set “on” for both FORTRAN and HPF. Times are in seconds.
The overheads intrinsic to using HPF rather than FORTRAN are shown by comparing the Serial and P = 1 times. These overheads are quite low—often negligible and, for Gauss, even negative (we see this on other platforms too). The gain in using HPF is shown by comparing the Serial and P = 4 times. Speedups achieved relative to the serial times range from 2.1 to 4.5.
Mike Delves (firstname.lastname@example.org) spent twenty-five years at the University of Liverpool as Professor of Computational Mathematics and Director of the Institute for Advanced Scientific Computation. His research interests included numerical methods and their implementation in high-level languages (successively Algol68, Ada, FORTRAN90 and HPF—parallelism crept increasingly in along the way). He started N.A. Software in 1978 as a hobby and is now full-time chairman; the company currently has 23 employees. Linux represents its biggest single market for FORTRAN and HPF compilers.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- SUSE LLC's SUSE Manager
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- My +1 Sword of Productivity
- Managing Linux Using Puppet
- Non-Linux FOSS: Caffeine!
- Doing for User Space What We Did for Kernel Space
- SuperTuxKart 0.9.2 Released
- Parsing an RSS News Feed with a Bash Script
- Google's SwiftShader Released
- Rogue Wave Software's Zend Server
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide