Parallel Programming with NVIDIA CUDA
Programmers have been interested in leveraging the highly parallel processing power of video cards to speed up applications that are not graphic in nature for a long time. Here, I explain how to do this with the CUDA API from NVIDIA. If your GPU is not from NVIDIA, you are not out of luck, as the same can be achieved with other APIs, such as the ATI-based Stream SDK or OpenCL.
With GPGPU, general-purpose applications are executed directly on the streaming processors of video cards. Under the stream processing paradigm, a data set is named a stream. You can think of it much like “file streams” provided by an OS's pipe function.
Streams can be any isolated piece of data, such as a stream of business events or a set of scientific data. Parallel operations are applied on streams with operators, such as split, compute or merge. Figure 1 shows several streams of data and compute operators in parallel.
Stream processing has been used successfully for general programming, including dataflow programming, financial calculation and industrial automation, just to name a few. Furthermore, system engineers and vendors such as Dell, ASUS, Western Scientific and Microway are building clusters of video cards that are similar to supercomputers, and they're available at a fraction of the cost of their CPU-based counterparts.
You can find many examples of real-life applications that were sped up using CUDA acceleration showcased by NVIDIA at www.nvidia.com/cuda.
Now that I've brushed upon what CUDA and stream processing are, let's start looking into a couple compute-intensive algorithms you can use to give it a spin.
Vector fields are constructs employed in a variety of professions. In robotics, vector fields can help a mobile robot navigate through a room. Let's define a destination and add one or more obstacles. A good scenario for testing CUDA consists in calculating a series of vectors that indicate the direction a robot should follow in order to reach its destination while avoiding all the obstacles present. The robot should also avoid local minima (see below). Figure 2 shows the robot and vector field (the green arrows are the “vectors”).
I refer to the target point as an attractor and to obstacles as repulsors—the arrows point toward the attractor and away from repulsors (Figure 3). So, how do you calculate the vector field? The vector field is composed of a series of individual fields, one for each attractor and repulsor.
Each individual field is calculated by computing the direction toward the attractor and away from repulsors at each point in the room. Once all of the vectors have been calculated, you obtain the complete vector field by adding them up.
For this example, I will have three streams and two compute operators. The list of attractors and repulsors will be used as the input stream. Then, a compute operator will be applied to it to obtain a second stream: the vector field. Finally, a second compute operator will provide another stream: the local minima field.
Why is this a good demonstration of CUDA? When deciding whether an algorithm is a good candidate for parallelization, you should consider the following criteria:
Is the problem compute-intensive?
Can the problem be modeled as a stream process?
Is the code independent of any shared resources?
What sequences of code are independent of any other code?
Can the data be represented as arrays of 32-bit objects?
Are there no optimizations of the sequential algorithm possible?
In my case, the vector field may be large and could take a long time when evaluating the whole field. The path a robot should follow can be modeled easily with streams. There is no access to shared resources, and the computation of each element in the field is independent from all the others.
In terms of computation, robotic engineers usually constrain their algorithms to calculate only the part of the vector field that is needed at a given time, never evaluating the entire vector field. Next, I show you how you can use stream processing for calculating the whole vector field in real time. Let's get started.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.
Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.
Sponsored by ActiveState
| Non-Linux FOSS: libnotify, OS X Style | Jun 18, 2013 |
| Containers—Not Virtual Machines—Are the Future Cloud | Jun 17, 2013 |
| Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer | Jun 12, 2013 |
| Weechat, Irssi's Little Brother | Jun 11, 2013 |
| One Tail Just Isn't Enough | Jun 07, 2013 |
| Introduction to MapReduce with Hadoop on Linux | Jun 05, 2013 |
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Validate an E-Mail Address with PHP, the Right Way
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Senior Perl Developer
- Technical Support Rep
- RSS Feeds
- Introduction to MapReduce with Hadoop on Linux
- UX Designer
- Bought photoshop CS5 for developing a website :(
39 min 4 sec ago - What the author describes
2 hours 5 min ago - Reply to comment | Linux Journal
6 hours 15 min ago - Reply to comment | Linux Journal
7 hours 46 sec ago - Didn't read
7 hours 11 min ago - Reply to comment | Linux Journal
7 hours 16 min ago - Poul-Henning Kamp: welcome to
9 hours 26 min ago - This has already been done
9 hours 27 min ago - Reply to comment | Linux Journal
10 hours 12 min ago - Welcome to 1998
11 hours 55 sec ago









Comments
The statement minima[y][x] =
The statement
minima[y][x] = (norm(field[y][x]) < threshold) ? true : false
may incur branching penalty
You can just use the first part
minima[y][x] = (norm(field[y][x]) < threshold)