Effectively Utilizing 3DNow! in Linux
In 1998, AMD (Advanced Micro Devices) released a new family of x86 CPUs that included 3DNow! capability. 3DNow! is designed to deliver enhanced performance for certain multimedia and floating-point operations. Other x86 clone CPU manufacturers, such as Cyrix and IDT (Integrated Device Technology, Inc.), also initially pledged to support 3DNow! in forthcoming CPUs. Currently, 3DNow! support is provided by IDT's most recent generation of processors (WinChip 2) as well as by AMD's K6-2, K6-3 and Athlon (K7) families of processors.
In this article, we describe the 3DNow! technology (especially how it affects performance on the popular K6-2 and K6-3 CPUs) and show how to detect and take advantage of 3DNow! under Linux. 3DNow! is an exciting development; used effectively, it can unlock outstanding performance from AMD and IDT processors.
3DNow! builds on the Intel MMX (multimedia extensions to x86) capability. Ariel Ortiz Ramirez described MMX and how to use it with Linux in issue 61 of Linux Journal, so we won't go into much detail about MMX here. Briefly, MMX adds eight 64-bit “multimedia” registers (MM0 through MM7) to the x86 platform, along with 57 instructions that operate on those registers. Several short integers can be stored (packed) into each multimedia register, and the MMX instructions perform parallel computations on these packed integers. While MMX is restricted to integers, 3DNow! allows two single-precision floating-point numbers to be packed into each multimedia register. The 3DNow! instruction set adds 21 new operations on the multimedia registers, most of which provide fast, pipelined, packed single-precision floating-point computation.
3DNow! capability is well-suited for fast calculation of common graphics operations such as clipping, lighting and 3-D transformations, as well as special effects involving application of physical models (e.g., fog, cloud and gravity effects). However, any application with a fair amount of floating-point computation can benefit from 3DNow! When used effectively, 3DNow! can increase the floating-point throughput of an application by a factor of two to four (or even more for some special-purpose applications). The performance increases because each 3DNow! operation produces two results (packed into one multimedia register), whereas a standard operation by the floating-point unit (FPU) produces only one.
Furthermore, in the AMD K6-2 and K6-3, the MMX and 3DNow! operations have access to dual pipelined execution units, enabling up to two 3DNow! operations to execute simultaneously. Thus, up to four results can be computed per processor clock cycle on the K6-2 and K6-3. By comparison, the Pentium II delivers at most one floating-point result per clock cycle, so a PII/450 has a peak performance of 450 MFLOPS (million floating-point operations per second), while a K6-2/450 has a peak performance of 1800 MFLOPS. The standard floating-point unit on the AMD K6-2 and K6-3 is not pipelined, which means there is a delay of two or more clock cycles between successive standard floating-point results. For computers equipped with an AMD K6-2, K6-3 or IDT WinChip 2, peak floating-point performance is therefore possible only for programs that contain 3DNow! instructions.
Unfortunately, few compilers can generate 3DNow! instructions. Thus, to exercise the 3DNow! capability in programs written in high-level languages such as C/C++, FORTRAN or Pascal, it is necessary to include explicit assembly code that contains 3DNow! operations. This is not difficult to do, and we will demonstrate how to use 3DNow! in C/C++ programs under Linux.
One way to determine whether a given machine supports 3DNow! is to download and run an application that identifies the processor and checks for 3DNow! capability. AMD has an application of this type that can be downloaded from its corporate web site.
A practical way to determine from within a program whether the host CPU supports 3DNow! is to use the CPUID instruction, which returns information on processor features and is supported by all recent x86 processors. If a program determines that 3DNow! support is present, it can execute the sections of code that use 3DNow! Specifically, 3DNow! support is detected by executing CPUID with the EAX register set to the extended function number 8000_0001h. This sets flag bits in the EDX register according to the CPU's level of multimedia support. Bit 31 of EDX is set to 1 if the CPU supports 3DNow! If bit 30 is also set to 1, the CPU supports the enhanced extensions to 3DNow! available in the new AMD Athlon processor.
Some assemblers support 3DNow! instructions directly, and will assemble modules that contain them without difficulty. However, many assemblers do not. In many cases, it is still possible to use 3DNow! instructions with those assemblers, although it is necessary to define the instructions as pseudo-instructions using data blocks or emits. Fortunately, AMD's web site has a C++ header file that contains macro definitions for the 3DNow! instruction set. Including this header file enables development of embedded assembly code within higher-level language programs. These macros specify the hexadecimal encoding of the 3DNow! instructions using the emit pseudo-instruction; the header file may need to be modified for certain compilers, as not all of them support emit.
Under Linux, we used the freely available Netwide Assembler (NASM) to assemble code. NASM allows pseudo-instruction macros to be built using the db command, and we have created a header file that defines the 3DNow! instructions this way. This header file is available for download from http://merlin.cs.uah.edu/visgig/threednow/. However, NASM versions 0.98 and later support 3DNow! directly, so the header file is needed only with older versions. Incidentally, we found that NASM 0.97 doesn't allow MM2, MM3, MM6 or MM7 to be result registers for 3DNow! operations. NASM 0.98 has no such problem.