Mainstream Parallel Programming
Both of the popular MPI distributions—LAM and MPICH—include wrapper scripts to allow users to compile their programs easily with the required MPI libraries. These wrapper scripts allow you to pass parameters to GCC like you always do:
mpicc: for C programs
mpi++: for C++ programs
mpif77: for FORTRAN 77 programs
Use mpirun to execute your newly compiled program. For example, I compiled my code with the command mpicc -O3 -o parallel parallel.c and then executed it with mpirun n0 ./parallel. The n0 signifies that the program is to run on node 0 only. To run it on additional nodes, you can specify a range like n0-7 (for eight processors) or use mpirun C to signify that the program is to run on all available nodes.
So, with only a few simple MPI calls, we have parallelized our image filter algorithm very easily, but did we gain any performance? There are a few ways that we can gain performance. The first is in terms of speed, and the second is in terms of how much work we can do. For example, on a single computer, a 16,000 x 16,000 pixel image would require an array of 768,000,000 elements! This is just too much for many computers—GCC complained to me that the array was simply too big! By breaking the image down as we did above, we can ease memory requirements for our application.
I tested the code above on a 16-node Beowulf cluster running Fedora Core 1. Each node had 1.0GB of RAM, a 3.06GHz Pentium 4 processor and was connected to the other nodes through Gigabit Ethernet. The nodes also shared a common filesystem through NFS. Figure 4 shows the amount of time required to read in the image, process it and write it back to disk.
From Figure 4, we can see that parallelizing this image filter sped things up for even moderately sized images, but the real performance gains happened for the largest images. Additionally, for images of more than 10,000 x 10,000 pixels, at least four nodes were required due to memory constraints. This figure also shows where it is a good idea to parallelize the code and where it was not. In particular, there was hardly any difference in the program's performance from 1,600 x 1,600 pixel images to about 3,200 x 3,200 pixel images. In this region, the images are so small that there is also no benefit in parallelizing the code from a memory standpoint, either.
To put some numbers to the performance of our image-processing program, one 3.06GHz machine takes about 50 seconds to read, process and write a 6,400 x 6,400 image to disk, whereas 16 nodes working together perform this task in about ten seconds. Even at 16,000 x 16,000 pixels, 16 nodes working together can process an image faster than one machine processing an image 6.25 times smaller.
This article demonstrates only one possible way to take advantage of the high performance of Beowulf clusters, but the same concepts are used in virtually all parallel programs. Typically, each node reads in a fraction of the data, performs some operation on it, and either sends it back to the master node or writes it out to disk. Here are four examples of areas that I think are prime candidates for parallelization:
Image filters: we saw above how parallel processing can tremendously speed up image processing and also can give users the ability to process huge images. A set of plugins for applications such as The GIMP that take advantage of clustering could be very useful.
Audio processing: applying an effect to an audio file also can take a large amount of time. Open-source projects such as Audacity also stand to benefit from the development of parallel plugins.
Database operations: tasks that require processing of large amounts of records potentially could benefit from parallel processing by having each node build a query that returns only a portion of the entire set needed. Each node then processes the records as needed.
System security: system administrators can see just how secure their users' passwords are. Try a brute-force decoding of /etc/shadow using a Beowulf by dividing up the range of the checks across several machines. This will save you time and give you peace of mind (hopefully) that your system is secure.
- High-Availability Storage with HA-LVM
- DNSMasq, the Pint-Sized Super Dæmon!
- Localhost DNS Cache
- Real-Time Rogue Wireless Access Point Detection with the Raspberry Pi
- Days Between Dates: the Counting
- Linux for Astronomers
- You're the Boss with UBOS
- The Usability of GNOME
- Multitenant Sites
- PostgreSQL, the NoSQL Database