Mainstream Parallel Programming

Whether you're a scientist, graphic artist, musician or movie executive, you can benefit from the speed and price of today's high-performance Beowulf clusters.
Breaking Up the Image

Figure 1. The process of breaking up a problem into smaller chunks is called domain decomposition, and it usually is done as shown here.

Applying a filter to an image can take minutes or even hours to complete, depending on the complexity of the filter, the size of the image and the speed of the computer. To fix these problems, we need to get smaller chunks of the image to several computers to help speed up the task. Figure 1 shows common ways of doing this—we'll cut our image into strips. If we have one large image file, we can do this in C/C++, like this:

FILE *imageFile = fopen("image_in.ppm", "rb");

// Safety check.
if (imageFile != NULL) {

   // Read in the header.
   fread(imageHeader, sizeof(char),
                    HEADER_LENGTH, imageFile);

   // fseek puts us at the point in the image
   // that this node will process.
   fseek(imageFile, lowerBoundY*WIDTH*3,

   // Here is where we read in the colors:
   // i is the current row in the image.
   // j is the current column in the image.
   // k is the color, 0 for red, 1 for blue,
   //    and 2 for green.
   for (i=0; i<boxSize+1; i++) {
      for (j=0; j<WIDTH; j++) {
         for(k=0; k<3; k++) {
           fread(&byte, 1, 1, imageFile);
            pixelIndex = i*WIDTH+j+k;
            origImage[pixelIndex] = byte;

Applying the Filter

Now that we have each node storing a piece of the image for processing, we need to apply the filter to the image. The GIMP Documentation Team has done a good job of describing how to do this using a convolution matrix in the GIMP Documentation. Many image effects—such as sharpen, blur, Gaussian blur, edge detect and edge enhance—have unique matrices that provide the desired effect. The convolution matrix works by studying each pixel of an image and changing its value based on the values of neighboring pixels. We consider the edge detect matrix in this article, shown in Figure 2.

Figure 2. This matrix represents the edge detect filter. The red square represents the pixel to be considered, and the other numbers represent the neighboring pixels.

When we apply this filter to the image, we multiply each pixel by –4 and add the values of the pixels above, below and to the left and right to that. This becomes the new value of the pixel. Because there are zeros in the corners of the matrix, we can simplify our program and get better performance by skipping those calculations. Below I've shown how this is done in practice. In Figure 3, I show what the filter does to our image of Tux.

for (i=0; i<boxSize; i++) {
   for (j=0; j<WIDTH; j++) {
      if (i>0 && i<(HEIGHT-1) &&
          j>0 && j<(WIDTH-1)){

        // Now we apply the filter matrix

        // First to the current pixel.
        pixelIndex = i*WIDTH + j;
        r = origImage[pixelIndex];
        g = origImage[pixelIndex+1];
        b = origImage[pixelIndex+2];
        filter_r = -4*r;
        filter_g = -4*g;
        filter_b = -4*b;

        // Next to the left neighbor.
        pixelIndex = i*WIDTH + j - 1;
        r = origImage[pixelIndex];
        g = origImage[pixelIndex+1];
        b = origImage[pixelIndex+2];
        filter_r += 1*r;
        filter_g += 1*g;
        filter_b += 1*b;

        // Next to the right neighbor.
        pixelIndex = i*WIDTH + j + 1;
        r = origImage[pixelIndex];
        g = origImage[pixelIndex+1];
        b = origImage[pixelIndex+2];
        filter_r += 1*r;
        filter_g += 1*g;
        filter_b += 1*b;

        // The neighbor above.
        pixelIndex = (i-1)*WIDTH + j;
        r = origImage[pixelIndex];
        g = origImage[pixelIndex+1];
        b = origImage[pixelIndex+2];
        filter_r += 1*r;
        filter_g += 1*g;
        filter_b += 1*b;

        // The neighbor below.
        pixelIndex = (i+1)*WIDTH + j;
        r = origImage[pixelIndex];
        g = origImage[pixelIndex+1];
        b = origImage[pixelIndex+2];
        filter_r += 1*r;
        filter_g += 1*g;
        filter_b += 1*b;

      // Record the new pixel.
      pixelIndex = i*WIDTH + j;
      filterImage[pixelIndex]   = filter_r;
      filterImage[pixelIndex+1] = filter_g;
      filterImage[pixelIndex+2] = filter_b;

We can mimic the readImage() subroutine to create a writeImage() subroutine to write the image to disk in chunks.

Figure 3. On the left is our original image of Tux, and on the right is the image after we apply the edge detect filter.



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

about execution

rajesh k r's picture

can anyone tell me how to run the above code on cluster or on LAN.if anyone has code send me at

image in c

Anonymous's picture

can any1 tel me how to execute above in C

and pls help us for how to put processed image in another file i.e. gray scale image

interesting concept... is

bharath's picture

interesting concept... is there any way u can send the source code, i'd like to use it for my class project...

cristal clear concept necessary for a starter.... :)

Chandra's picture

This article has helped me to explore the extream ends of parallel
computing and MPI.During my study and running codes on cluster PADMA
at CDAC India I always had a dim picture of MPI application till I read this article.
Nice and sincere approch..

Image processing program

Vayrix's picture

Has someone filled up the gaps missing in the code? I'm not an expert in C/C++ programming, but I would like to see and run whole code on ROCKS cluster. So I would appreciate if someone gave me complete program code, if possible.

test Program

BryanC's picture

Any chance you still have this full test program source in C++?

This article has inspired me to set up an HPC cluster with a few fellow students, and I think a graphics algorithm such as this would be exactly what we need to awe our audience (possibly project benefactors, school deans, etc).

me too

Anonymous's picture

I could really use this source code for a class of mine. Let me know if you get it. -Larry Prokov

Good info

Michael Locker MD's picture

Very informative. Thanks.

Michael Locker MD

MOSIX and openMOSIX have

Anonymous's picture

MOSIX and openMOSIX have been getting a lot of attention lately, but they are used primarily for programs that are not specifically written to run on clusters, and they work by distributing threads in multithreaded programs to other nodes.

This is an inaccurate statment of how openMosix and I am pretty sure MOSIX work. They do not migrated threads of a multi threaded application but migrate the whole process. In fact applications that used shared-memory such as multi-threaded applications, like apache, require a secodn pacth on top of openMosix to use the migration system. Still for making fast cluster for multi-stage proccess such as video conversion or manipulation it is very effective at handling the workload.

Is there a way to speed up cpu-intensive application ?

BO Yang's picture

Good article I think .
Just few days ago , I have taken part in the ICFP contest and the task is to write a virtual machine . And I am disappointed that I didn't complete the test because of my slow application which will occupy all my cpu whenever start up . Does parallel programing help in this situation ? I think it will do nothing good with the such kind of cpu-intensive application , any idea ?

CPU Intensive Applications

M. Hore's picture

In my experience, cpu-intensive jobs are easily parallelized and benefit from it whenever their tasks are easily broken up into pieces that are more or less independent of each other (though they need not be). I don't immediately see how one might parallelize a virtual machine, and don't see how it might be beneficial, but I could be wrong.

What do you find to be the major bottleneck in your virtual machine?