Effectively Utilizing 3DNow! in Linux
Some applications, such as graphics rendering, are especially well-suited to 3DNow!. Optimizing an application that has only a few isolated floating-point operations may not be worth the effort, given the extra time that coding in assembly language requires. There are several criteria to consider before deciding whether to use 3DNow! with the K6-2 or K6-3.
First, the application should have at least a few single-precision floating-point computations grouped in one part of the program, because switching into MMX/3DNow! mode involves some overhead. The MMX/3DNow! registers share storage with the x87 FPU registers, so standard floating-point operations that use the regular FPU are not possible while in MMX/3DNow! mode; standard integer operations are fine. The new K7 reportedly won't have this overhead. The MMX mode switch can also break up internal instruction pipelining, which could add further overhead. The best way to minimize the impact of the overhead is to use 3DNow! in units of code that contain several single-precision floating-point operations.
Performance also improves if the floating-point data is organized in a successive, regular format (such as arrays of floating-point numbers) that lets a series of 3DNow! operations be performed in sequence.
One application that can be implemented efficiently using 3DNow! is image gradient calculation (edge detection), especially for range images. To illustrate how 3DNow! enables efficient operations on Linux, we'll look at how the gradient calculation can be optimized for range and volume data. In an image collected on a 2-D grid, the gradient is a measure of the local change of pixel values (e.g., pixel intensity) at a particular point; in a 3-D volume, the gradient measures the difference in intensity between a voxel (volumetric element) and its neighbors.
There are a variety of methods for determining the gradient. For an image collected on a grid, one way is to compute the directional differences in value among the four immediate neighbors (to the north/south and east/west) of each pixel. When this difference is computed for the entire image, points that lie on the border between regions usually have strong gradient magnitudes. The gradient magnitudes can themselves be viewed as an image, which looks like a collection of the region boundaries of the original image.
The image in Figure 1 is a range image produced from laser-range data. In a range image, each pixel is a floating-point value that expresses the distance of the corresponding point on the imaged object from the viewing plane. We've displayed the image using intensities in which brighter values indicate closer points. Figure 2 shows the computed four-direction gradient for the range image of Figure 1.
In general, the gradient magnitude at a particular point is given by the equation
where delta(x) is equal to the change in pixel intensity in the x direction and delta(y) is the change in the y direction. We've illustrated the gradient for pixel P1 in Figure 3. P1's gradient magnitude is:
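For orientation, a common way to write a four-neighbor gradient magnitude of this kind is given below. It is a sketch under assumptions rather than the article's own equation: the neighbor names P_E, P_W, P_S and P_N and the constant k are introduced here for illustration, with k standing for whatever multiplicative factor the chosen difference scheme calls for (for example, 1/2 when each difference spans two pixels).

```latex
\[
  |\nabla P| \;=\; k \,\sqrt{\delta(x)^{2} + \delta(y)^{2}},
  \qquad
  \delta(x) = P_{E} - P_{W},
  \quad
  \delta(y) = P_{S} - P_{N}
\]
```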
3DNow! is well-suited to performing this computation for range images. Since a range image can be stored as an array, points that lie next to each other on a row appear consecutively in memory. MMX has an instruction, movq, that moves a “quadword” (64 bits: four 16-bit words, or two single-precision floats) from memory into a multimedia register. This means consecutive image pixels P4 and P5 can be loaded into a multimedia register with one move. If P6 and P7 are loaded into another MMX register, the 3DNow! operation PFSUB subtracts the two floats in one register from the corresponding floats in the other in a single instruction. The result of one 3DNow! subtraction is the delta(y) for both P1 and P2. One more subtraction yields delta(x) for these pixels. 3DNow! operations can then be used to square delta(x) and delta(y), add them together, apply the multiplicative factor and take the square root. The whole process can be implemented using fewer assembly instructions (and about half the execution time) than an implementation using standard floating-point instructions would require.
We developed the assembly language function XYGRAD to assist in calculating range image gradients. (The code for this function can be found in the archive file ftp.linuxjournal.com/pub/lj/listings/issue68/3685.tgz.) The function processes a single row of image pixels at a time using 3DNow!. XYGRAD can be called from any C program using the prototype shown in the code. After assembling XYGRAD with NASM, gcc is used to link it with a C program that utilizes XYGRAD.