HPF: Programming Linux Clusters the Easy Way
In Statement 1, the identical distribution of a and b ensures that for all i, a(i) and b(i) are on the same processor; thus, the compiler does not generate any message passing.
In Statement 2, there is again no need for message passing. If the ALIGN statement had lined up x(i) with y(i) rather than y(i+1), communication would have been needed for some values of i.
Statement 3 looks very much like Statement 1, but the communication requirements are very different because of the different distribution of a and c. The array elements a(i) and c(i) are on the same processor for only 10 of the possible values of i; hence, for nearly all of the elements, communication of data between processors is needed. This is an unwise choice of distribution for c if this statement represents the bulk of the work.
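The effect of mismatched distributions can be checked with a small ownership model. The sketch below is written in Python rather than HPF, and it rests on assumptions that this section does not state: that a is distributed BLOCK and c is distributed CYCLIC, over a hypothetical 100 elements and 10 processors. The helper names are illustrative only.

```python
# Ownership model for two HPF-style distributions (illustrative sketch).
# Assumed, not taken from the article: a is BLOCK, c is CYCLIC,
# arrays are indexed from 1, and processors are numbered from 0.


def block_owner(i, n, p):
    """Processor owning element i of an n-element array under BLOCK."""
    block_size = -(-n // p)  # ceiling of n / p
    return (i - 1) // block_size


def cyclic_owner(i, p):
    """Processor owning element i under CYCLIC (round-robin dealing)."""
    return (i - 1) % p


def colocated(n, p):
    """Indices i for which a(i) and c(i) live on the same processor."""
    return [i for i in range(1, n + 1)
            if block_owner(i, n, p) == cyclic_owner(i, p)]


if __name__ == "__main__":
    n, p = 100, 10  # hypothetical sizes
    same = colocated(n, p)
    print(f"{len(same)} of {n} elements need no communication: {same}")
```

With these assumed sizes, only 10 of the 100 index values keep a(i) and c(i) together; every other element of c has to be sent to another processor before the assignment can complete.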
A good choice of distribution and alignment can greatly help efficiency, and that is the point of having the directives. It is much easier to write FORTRAN90 code and embellish it with HPF directives than to write the equivalent message-passing code.
In practice, the steps taken in writing an HPF program are:
1. Write FORTRAN90 code. Your existing FORTRAN77 code will do in a pinch, but you will get better efficiency by cleaning it up using the newer FORTRAN high-level constructs; tools exist to help this conversion.
2. Decide how to configure the processors.
3. Declare one or more templates to act as guides for distributing arrays.
4. Decide how to distribute and align the arrays onto the template(s).
This process is illustrated in the code shown in Listing 1, which represents a subroutine to solve a set of linear equations. The subroutine is in standard FORTRAN90 and will run happily through any FORTRAN90 compiler, which will treat the HPF directives as comments. The code makes good use of the FORTRAN90 array facilities and has been parallelized by adding just four HPF directives. The resulting HPF code runs well on a Linux PC cluster, provided the size of the problem being solved is large enough to warrant the use of parallelism.
HPF makes life easy for the programmer by leaving nearly everything to the compiler. So, can the compilers cope? Can you really get parallel efficiency by using HPF? And can you get useful speedups on networked PCs with relatively high-latency communications?
Of course, no compiler can find parallelism where none exists; you need to give it the parallelism in the beginning. Given this, the answer is yes: current HPF compilers are surprisingly efficient. On a PC cluster connected by Ethernet, the message-passing latency using PVM or MPI is typically around 0.6ms; this translates to “use fairly coarse-grain parallelism if you can and don't expect to use too many PCs.”
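To see why latency pushes toward coarse-grain parallelism, consider a rough cost model. This is a back-of-envelope sketch in Python, not a description of how any HPF compiler schedules communication: it assumes the compute time divides perfectly across the PCs, that each message costs one full latency, and that bandwidth and load imbalance can be ignored. The workload figures are hypothetical; only the 0.6ms latency comes from the text.

```python
# Back-of-envelope cost model (an assumption, not the article's method):
# parallel time = perfectly divided compute time + one latency per message.

LATENCY_S = 0.6e-3  # message-passing latency quoted in the text, 0.6 ms


def parallel_time(serial_s, procs, messages, latency_s=LATENCY_S):
    """Estimated run time on `procs` PCs.

    `messages` is the number of messages each PC must wait for in turn.
    Bandwidth and load imbalance are ignored.
    """
    return serial_s / procs + messages * latency_s


def speedup(serial_s, procs, messages, latency_s=LATENCY_S):
    """Serial time divided by the estimated parallel time."""
    return serial_s / parallel_time(serial_s, procs, messages, latency_s)


if __name__ == "__main__":
    # Hypothetical workload: 1 second of serial work spread over 4 PCs.
    for messages in (100, 10_000):
        print(messages, "messages -> speedup",
              round(speedup(1.0, 4, messages), 2))
```

Under this model, a job that needs 100 messages still gets a speedup of a little over 3 on four PCs, while the same work split so finely that it needs 10,000 messages runs slower than the serial version.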
Table 1 shows some timings to illustrate what can be achieved. They were taken on a four-PC Linux P100 cluster with 100Mb Ethernet. “Serial” times are those given using the N. A. Software (NASL) FORTRANPlus F90 compiler, release 1.3.57. These times are absent where the code uses HPF extensions (FORALL, EXTRINSIC(HPF_SERIAL)) not supported in FORTRAN90 (for some, we timed equivalent FORTRAN90 versions). HPF times used the NASL HPFPlus compiler, release 2.0. Optimization was set “on” for both FORTRAN and HPF. Times are in seconds.
The overheads intrinsic to using HPF rather than FORTRAN are shown by comparing the Serial and P = 1 times. These overheads are quite low—often negligible and, for Gauss, even negative (we see this on other platforms too). The gain in using HPF is shown by comparing the Serial and P = 4 times. Speedups achieved relative to the serial times range from 2.1 to 4.5.
Mike Delves (email@example.com) spent twenty-five years at the University of Liverpool as Professor of Computational Mathematics and Director of the Institute for Advanced Scientific Computation. His research interests included numerical methods and their implementation in high-level languages (successively Algol68, Ada, FORTRAN90 and HPF—parallelism crept increasingly in along the way). He started N.A. Software in 1978 as a hobby and is now full-time chairman; the company currently has 23 employees. Linux represents its biggest single market for FORTRAN and HPF compilers.