An Overview of Intel's MMX Technology

An introduction to MMX and how to take advantage of its capabilities in your program.
An Image Brightening Program

Let's suppose we have a gray-scale bitmap image, like the one in Figure 3. Each pixel is stored in one unsigned 8-bit byte contained in an array. Smaller numbers represent darker tones of gray, while larger numbers represent brighter tones. Numbers 0 and 255 represent the pure black and white colors, respectively. For the sake of code simplicity, the images employed in this program (see Listing 2 in the archive file) use Microsoft Windows' gray-scale BMP file format. John Bradley's xv utility can easily be used under Linux to create and display this kind of bitmap image.

To make the image brighter, we just need to add a positive integer (let's say 64 hexadecimal) to each of its pixels. In C, we would have something like this:

unsigned char bitmap[BITMAP_SIZE];
size_t i;
/* Load image somehow ... */
for(i = 0; i < BITMAP_SIZE; i ++)

Figure 4. Brightened Image Using Wraparound Arithmetic

Unfortunately, we end up with the undesired image found in Figure 4. This happens because of wraparound; if the result of the addition overflows (i.e., exceeds 255, which is the upper unsigned 8-bit byte limit), the result is truncated so that only the lower (least significant) bits are considered. For example, adding 100 (64 hexadecimal) to a pixel value of 250 (almost pure white) gives the result shown below.

    250 decimal       11111010 binary
  + 100 decimal    +  01100100 binary
  -------------    ------------------
  = 350 decimal    = 101011110 binary   Overflow
  =  94 decimal    =  01011110 binary   Take the 8
                            least significant bits

The result is 94 which produces a darker gray instead of a brighter one, causing the observable inversion effect.

What we require is that whenever an addition exceeds the maximum limit, the result should saturate (clipped to a predefined data-range limit). In this case, the saturation value is 255, which represents pure white. The following C fragment takes care of saturation:

int sum;
for(i = 0; i < BITMAP_SIZE; i ++)
  sum = bitmap[i] + BRIGHTENING_CONSTANT;
  /* UCHAR_MAX is defined in <limits.h>
   * and is equal to 255u */
  if(sum > UCHAR_MAX)
    bitmap[i] = UCHAR_MAX;
    bitmap[i] = (unsigned char) sum;

Now we obtain the image shown in Figure 5, which is brightened as we wanted.

Figure 5. Brightened Image Using Saturation Arithmetic

Figure 6. Unsigned Byte-Packed Addition with Saturation

MMX technology allows us to do this saturated arithmetic addition on eight unsigned bytes in parallel using just one instruction: paddusb. Figure 6 shows an example of how this instruction works. Our image-brightening algorithm (see Listing 1, starting at line 61) can be described as follows:

  • Pack the same brightening constant byte eight times into the MM0 register (line 66).

  • Repeat bitmap-size / 8 times:

    1. Copy the next eight bytes from the bitmap array into the MM1 register (line 74).

    2. Add the eight packed unsigned bytes contained in MM0 to the eight packed unsigned bytes in MM1. Use saturation (line 75).

    3. Copy the result of the MM1 register back to the bitmap array from where it was originally taken (line 76).

    4. Advance bitmap array index register (line 77).

The movq MMX instruction used in steps 1 and 3 copies 64 bits from the source operand to the destination operand.

Whenever we finish executing MMX instructions, the emms instruction (Listing 1, line 81) should be used to clear the MMX state. This is an important issue, especially if any floating-point instructions follow in our program. In order to make the MMX technology compatible with existing operating systems and applications, Intel engineers decided the MMX registers should share the same physical space with the floating-point registers. This was considered necessary because, for example, in a multi-tasking operating system such as Linux, whenever a task switch occurs, the running process must have its state preserved in order to be resumed some time in the future. This state preservation involves copying all of the CPU's registers into memory. If you add more registers to the CPU, you must also modify the operating system code that takes care of saving the registers. However, if your new registers are aliased to existing registers, no change is required in the code.

Unfortunately, this workaround in the case of MMX and floating-point registers has a major drawback: you cannot use both types of registers at the same time, simply because they represent two very different types of data. The general rule is you cannot mix MMX and floating-point instructions in the same portions of code. Therefore, the emms instruction is the mechanism of informing the CPU that future floating-point instructions are allowed in the program.



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Listing 2??

Anonymous's picture

Where is the listing 2 man>>>????

On the FTP server

Mitch Frazier's picture

The term "archive file" is referring to the article resources file: 3244.tgz on our ftp server.

Mitch Frazier is an Associate Editor for Linux Journal.

mmx tecknology

vishnuprasad.k.s's picture

what is the mmx tecknology.what is the use.

MMX help

FM's picture

I work with nasm assembler & bochs as a pc emulator to enter to pmode and in this mode overlaying a picture on a background. i could enter to protected mode but i dont know how to load my image.and what is the use of FAT12.i wrote my program in assembly. could you help me.Any help would be greatly appreciated.

intel mmx technology

D.BHARAT KUMAR's picture

it is a multimedia communication system