An Overview of Intel's MMX Technology
Let's suppose we have a gray-scale bitmap image, like the one in Figure 3. Each pixel is stored in one unsigned 8-bit byte contained in an array. Smaller numbers represent darker tones of gray, while larger numbers represent brighter tones. Numbers 0 and 255 represent the pure black and white colors, respectively. For the sake of code simplicity, the images employed in this program (see Listing 2 in the archive file) use Microsoft Windows' gray-scale BMP file format. John Bradley's xv utility can easily be used under Linux to create and display this kind of bitmap image.
To make the image brighter, we just need to add a positive integer (let's say 64 hexadecimal) to each of its pixels. In C, we would have something like this:
#define BRIGHTENING_CONSTANT 0x64 unsigned char bitmap[BITMAP_SIZE]; size_t i; /* Load image somehow ... */ for(i = 0; i < BITMAP_SIZE; i ++) bitmap[i] += BRIGHTENING_CONSTANT;

Figure 4. Brightened Image Using Wraparound Arithmetic
Unfortunately, we end up with the undesired image found in Figure 4. This happens because of wraparound; if the result of the addition overflows (i.e., exceeds 255, which is the upper unsigned 8-bit byte limit), the result is truncated so that only the lower (least significant) bits are considered. For example, adding 100 (64 hexadecimal) to a pixel value of 250 (almost pure white) gives the result shown below.
250 decimal 11111010 binary
+ 100 decimal + 01100100 binary
------------- ------------------
= 350 decimal = 101011110 binary Overflow
produced
= 94 decimal = 01011110 binary Take the 8
least significant bits
The result is 94 which produces a darker gray instead of a brighter one, causing the observable inversion effect.
What we require is that whenever an addition exceeds the maximum limit, the result should saturate (clipped to a predefined data-range limit). In this case, the saturation value is 255, which represents pure white. The following C fragment takes care of saturation:
int sum;
for(i = 0; i < BITMAP_SIZE; i ++)
{
sum = bitmap[i] + BRIGHTENING_CONSTANT;
/* UCHAR_MAX is defined in <limits.h>
* and is equal to 255u */
if(sum > UCHAR_MAX)
bitmap[i] = UCHAR_MAX;
else
bitmap[i] = (unsigned char) sum;
}
Now we obtain the image shown in Figure 5, which is brightened as we wanted.

Figure 5. Brightened Image Using Saturation Arithmetic

Figure 6. Unsigned Byte-Packed Addition with Saturation
MMX technology allows us to do this saturated arithmetic addition on eight unsigned bytes in parallel using just one instruction: paddusb. Figure 6 shows an example of how this instruction works. Our image-brightening algorithm (see Listing 1, starting at line 61) can be described as follows:
Pack the same brightening constant byte eight times into the MM0 register (line 66).
Repeat bitmap-size / 8 times:
Copy the next eight bytes from the bitmap array into the MM1 register (line 74).
Add the eight packed unsigned bytes contained in MM0 to the eight packed unsigned bytes in MM1. Use saturation (line 75).
Copy the result of the MM1 register back to the bitmap array from where it was originally taken (line 76).
Advance bitmap array index register (line 77).
The movq MMX instruction used in steps 1 and 3 copies 64 bits from the source operand to the destination operand.
Whenever we finish executing MMX instructions, the emms instruction (Listing 1, line 81) should be used to clear the MMX state. This is an important issue, especially if any floating-point instructions follow in our program. In order to make the MMX technology compatible with existing operating systems and applications, Intel engineers decided the MMX registers should share the same physical space with the floating-point registers. This was considered necessary because, for example, in a multi-tasking operating system such as Linux, whenever a task switch occurs, the running process must have its state preserved in order to be resumed some time in the future. This state preservation involves copying all of the CPU's registers into memory. If you add more registers to the CPU, you must also modify the operating system code that takes care of saving the registers. However, if your new registers are aliased to existing registers, no change is required in the code.
Unfortunately, this workaround in the case of MMX and floating-point registers has a major drawback: you cannot use both types of registers at the same time, simply because they represent two very different types of data. The general rule is you cannot mix MMX and floating-point instructions in the same portions of code. Therefore, the emms instruction is the mechanism of informing the CPU that future floating-point instructions are allowed in the program.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.
Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.
Sponsored by ActiveState
| Speed Up Your Web Site with Varnish | Jun 19, 2013 |
| Non-Linux FOSS: libnotify, OS X Style | Jun 18, 2013 |
| Containers—Not Virtual Machines—Are the Future Cloud | Jun 17, 2013 |
| Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer | Jun 12, 2013 |
| Weechat, Irssi's Little Brother | Jun 11, 2013 |
| One Tail Just Isn't Enough | Jun 07, 2013 |
- Speed Up Your Web Site with Varnish
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Linux Systems Administrator
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- Android's Limits
- Web & UI Developer (JavaScript & j Query)
- Yeah, user namespaces are
1 hour 3 min ago - Cari Uang
4 hours 34 min ago - user namespaces
7 hours 28 min ago - yea
7 hours 54 min ago - One advantage with VMs
10 hours 22 min ago - about info
10 hours 55 min ago - info
10 hours 56 min ago - info
10 hours 57 min ago - info
10 hours 59 min ago - info
11 hours 55 sec ago
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




Comments
Listing 2??
Where is the listing 2 man>>>????
On the FTP server
The term "archive file" is referring to the article resources file: 3244.tgz on our ftp server.
Mitch Frazier is an Associate Editor for Linux Journal.
mmx tecknology
what is the mmx tecknology.what is the use.
MMX help
Hello
I work with nasm assembler & bochs as a pc emulator to enter to pmode and in this mode overlaying a picture on a background. i could enter to protected mode but i dont know how to load my image.and what is the use of FAT12.i wrote my program in assembly. could you help me.Any help would be greatly appreciated.
Thanks
F.M
intel mmx technology
it is a multimedia communication system