HPF: Programming Linux Clusters the Easy Way
In Statement 1, the identical distribution of a and b ensures that for all i, a(i) and b(i) are on the same processor; thus, the compiler does not generate any message passing.
In Statement 2, there is again no need for message passing. If the ALIGN statement had lined up x(i) with y(i) rather than y(i+1), communication would have been needed for some values of i.
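The effect of the alignment offset can be checked with a small ownership count. This is only a sketch under assumptions that do not come from the text: y is taken to be declared y(0:501) and BLOCK-distributed over 10 processors, x runs over 1 to 500, and Statement 2 is taken to read element y(i+1) when computing x(i). The helper y_owner is a hypothetical name for illustration.

```fortran
program align_shift
  implicit none
  ! Assumed for illustration: y(0:501) BLOCK-distributed over 10 processors.
  integer, parameter :: ylo = 0, yhi = 501, nprocs = 10
  integer :: i, shift, remote

  ! shift = 1 models ALIGN x(i) WITH y(i+1); shift = 0 models ALIGN x(i) WITH y(i).
  do shift = 1, 0, -1
     remote = 0
     do i = 1, 500
        ! x(i) lives with y(i+shift); the value read is y(i+1).
        if (y_owner(i + shift) /= y_owner(i + 1)) remote = remote + 1
     end do
     print *, 'alignment offset', shift, ': remote reads =', remote
  end do

contains

  ! Owner (0-based) of y(j) under a BLOCK distribution:
  ! contiguous chunks of ceiling(extent/nprocs) elements per processor.
  integer function y_owner(j)
    integer, intent(in) :: j
    integer :: blk
    blk = (yhi - ylo + 1 + nprocs - 1) / nprocs
    y_owner = (j - ylo) / blk
  end function y_owner

end program align_shift
```

With these assumed sizes the offset-1 alignment needs no remote reads, while the offset-0 alignment needs one at each of the nine block boundaries.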
Statement 3 looks very much like Statement 1, but the communication requirements are very different because a and c are distributed differently. The array elements a(i) and c(i) are on the same processor for only 10% of the possible values of i; hence, for nearly all of the elements, communication of data between processors is needed. This is an unwise choice of distribution for c if this statement does indeed represent the bulk of the work.
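To see how seldom two differently distributed arrays keep the same element on the same processor, the following sketch counts the matches directly. It assumes, for illustration only, arrays of 1000 elements on 10 processors, with a distributed BLOCK and c distributed CYCLIC; the helper names block_owner and cyclic_owner are hypothetical.

```fortran
program colocation
  implicit none
  ! Assumed sizes for illustration: 1000 elements on 10 processors.
  integer, parameter :: n = 1000, nprocs = 10
  integer :: i, same

  same = 0
  do i = 1, n
     if (block_owner(i) == cyclic_owner(i)) same = same + 1
  end do
  print *, 'elements on the same processor:', same, 'of', n

contains

  ! Owner (0-based) of element i under a BLOCK distribution:
  ! contiguous chunks of ceiling(n/nprocs) elements per processor.
  integer function block_owner(i)
    integer, intent(in) :: i
    integer :: blk
    blk = (n + nprocs - 1) / nprocs
    block_owner = (i - 1) / blk
  end function block_owner

  ! Owner (0-based) of element i under a CYCLIC distribution:
  ! elements are dealt out round-robin, one at a time.
  integer function cyclic_owner(i)
    integer, intent(in) :: i
    cyclic_owner = mod(i - 1, nprocs)
  end function cyclic_owner

end program colocation
```

Under these assumptions the count is 100 of 1000, so about nine in ten element-wise assignments between a and c would need data from another processor.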
A good choice of distribution and alignment can greatly help efficiency, and that is the point of having the directives. It is much easier to write FORTRAN90 code and embellish it with HPF directives than to write the equivalent message-passing code.
In practice, the steps taken in writing an HPF program are:
1. Write FORTRAN90 code. Your existing FORTRAN77 code will do in a pinch, but you will get better efficiency by cleaning it up using the newer FORTRAN high-level constructs; tools exist to help with this conversion.
2. Decide how to configure the processors.
3. Declare one or more templates to act as guides for distributing arrays.
4. Decide how to distribute and align the arrays onto the template(s).
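The steps above can be sketched on a toy problem. This is a minimal illustration and not the article's Listing 1: the names procs, t, u and v, the array size, the choice of four processors and the BLOCK distribution are all assumptions. Because each directive starts with !HPF$, a FORTRAN90 compiler treats it as a comment and the program still runs serially.

```fortran
program hpf_steps
  implicit none
  integer, parameter :: n = 1000      ! assumed problem size
  ! Write ordinary FORTRAN90 code using whole-array operations.
  real :: u(n), v(n)
  integer :: i

  ! Configure the processors (four assumed here).
!HPF$ PROCESSORS procs(4)
  ! Declare a template to guide the distribution.
!HPF$ TEMPLATE t(n)
  ! Distribute the template, then align the arrays with it.
!HPF$ DISTRIBUTE t(BLOCK) ONTO procs
!HPF$ ALIGN u(i) WITH t(i)
!HPF$ ALIGN v(i) WITH t(i)

  v = (/ (real(i), i = 1, n) /)
  ! u(i) and v(i) are aligned to the same template element,
  ! so this array assignment needs no data from other processors.
  u = 2.0 * v + 1.0
  print *, 'sum of u =', sum(u)
end program hpf_steps
```

Aligning both arrays to one template, rather than distributing each separately, keeps the layout decision in a single DISTRIBUTE line that can be changed later without touching the executable code.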
This process is illustrated in the code shown in Listing 1, which represents a subroutine to solve a set of linear equations. The subroutine is in standard FORTRAN90 and will run happily through any FORTRAN90 compiler, which will treat the HPF directives as comments. The code makes good use of the FORTRAN90 array facilities and has been parallelized by adding just four HPF directives. The resulting HPF code runs well on a Linux PC cluster, provided the size of the problem being solved is large enough to warrant the use of parallelism.
HPF makes life easy for the programmer by leaving nearly everything to the compiler. So, can the compilers cope? Can you really get parallel efficiency by using HPF? And can you get useful speedups on networked PCs with relatively high-latency communications?
Of course, no compiler can find parallelism where none exists; you need to give it the parallelism to begin with. Given this, the answer is yes: current HPF compilers are surprisingly efficient. On a PC cluster connected by Ethernet, the message-passing latency using PVM or MPI is typically around 0.6ms; this translates to “use fairly coarse-grain parallelism if you can, and don't expect to use too many PCs.”
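A rough cost model shows why this latency pushes toward coarse-grain parallelism. The sketch below assumes each message costs a fixed latency plus a transfer time, and asks how much computation must sit between messages for communication to stay under a chosen share of the run time. The 0.6ms latency is from the text and the bandwidth is the nominal peak of the 100Mb Ethernet used for the timings; the message size and the 10% target are assumptions picked for illustration.

```fortran
program grain_size
  implicit none
  real, parameter :: latency = 0.6e-3            ! seconds per message (from the text)
  real, parameter :: bandwidth = 100.0e6 / 8.0   ! bytes/s, nominal peak of 100Mb Ethernet
  real, parameter :: msg_bytes = 4000.0          ! assumed: 1000 REALs of 4 bytes each
  real, parameter :: max_comm_frac = 0.1         ! assumed target share for communication
  real :: t_msg, t_compute

  ! Cost of one message: startup latency plus transfer time.
  t_msg = latency + msg_bytes / bandwidth

  ! Communication share f = t_msg / (t_msg + t_compute),
  ! so t_compute = t_msg * (1 - f) / f.
  t_compute = t_msg * (1.0 - max_comm_frac) / max_comm_frac

  print *, 'time per message (ms):          ', 1000.0 * t_msg
  print *, 'compute needed per message (ms):', 1000.0 * t_compute
end program grain_size
```

Under these assumptions a message costs about 0.92ms, of which the latency is the larger part, and roughly 8ms of computation per message is needed to keep communication near 10% of the total.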
Table 1 shows some timings to illustrate what can be achieved. They were taken on a four-PC Linux P100 cluster with 100Mb Ethernet. “Serial” times are those obtained using the N. A. Software (NASL) FORTRANPlus F90 compiler, release 1.3.57. These times are absent where the code uses HPF extensions (FORALL, EXTRINSIC(HPF_SERIAL)) not supported in FORTRAN90 (for some, we timed equivalent FORTRAN90 versions). HPF times were obtained using the NASL HPFPlus compiler, release 2.0. Optimization was set “on” for both FORTRAN and HPF. Times are in seconds.
The overheads intrinsic to using HPF rather than FORTRAN are shown by comparing the Serial and P = 1 times. These overheads are quite low: often negligible and, for Gauss, even negative (we see this on other platforms too). The gain from using HPF is shown by comparing the Serial and P = 4 times. Speedups achieved relative to the serial times range from 2.1 to 4.5.
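Speedup here is the serial time divided by the P = 4 time, and dividing the speedup by the number of processors gives the parallel efficiency. The short sketch below applies that arithmetic to the two ends of the quoted range only; it does not reproduce any individual entry of Table 1.

```fortran
program speedup_eff
  implicit none
  integer, parameter :: p = 4                        ! processors used for the timings
  real, parameter :: speedups(2) = (/ 2.1, 4.5 /)    ! range quoted in the text
  integer :: k

  do k = 1, size(speedups)
     ! Efficiency = speedup / number of processors.
     print *, 'speedup', speedups(k), ' efficiency', speedups(k) / real(p)
  end do
end program speedup_eff
```

The efficiencies come out at about 0.53 and 1.13. A value above 1 means the four-processor HPF run took less than a quarter of the serial time.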
Mike Delves (firstname.lastname@example.org) spent twenty-five years at the University of Liverpool as Professor of Computational Mathematics and Director of the Institute for Advanced Scientific Computation. His research interests included numerical methods and their implementation in high-level languages (successively Algol68, Ada, FORTRAN90 and HPF—parallelism crept increasingly in along the way). He started N.A. Software in 1978 as a hobby and is now full-time chairman; the company currently has 23 employees. Linux represents its biggest single market for FORTRAN and HPF compilers.