Parallel Programming with NVIDIA CUDA
If parallelization of your algorithm is possible, using CUDA will speed up your computations dramatically, allowing you to make the most out of your hardware.
The main challenge consists in deciding how to partition your problem into chunks suitable for parallel execution. As with so many other aspects in parallel programming, this is where experience and—why not—imagination come into play.
Additional techniques offer room for even more improvement. In particular, the on-chip shared memory of each compute node allows further speedup of the computation process.
Resources
Hardware-Accelerated Computation: gpgpu.org
Source Code and Video: www.coroware.com/streamprocessing
Getting Started on CUDA: www.nvidia.com/cuda
ATI-Based Stream SDK: developer.ati.com/gpu/atistreamsdk
OpenCL Official Web Site: www.khronos.org/opencl
Alejandro Segovia is a parallel programming advisor for CoroWare. He is also a contributing partner at RealityFrontier. He works in 3-D graphic development and GPU acceleration. Alejandro was recently a visiting scientist at the University of Delaware where he investigated CUDA from an academic standpoint. His findings were published at the IEEE IPCCC Conference in 2009.
- « first
- ‹ previous
- 1
- 2
- 3
- 4
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.
Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.
Sponsored by ActiveState
| Non-Linux FOSS: libnotify, OS X Style | Jun 18, 2013 |
| Containers—Not Virtual Machines—Are the Future Cloud | Jun 17, 2013 |
| Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer | Jun 12, 2013 |
| Weechat, Irssi's Little Brother | Jun 11, 2013 |
| One Tail Just Isn't Enough | Jun 07, 2013 |
| Introduction to MapReduce with Hadoop on Linux | Jun 05, 2013 |
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Validate an E-Mail Address with PHP, the Right Way
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- Introduction to MapReduce with Hadoop on Linux
- RSS Feeds
- Bought photoshop CS5 for developing a website :(
2 hours 6 min ago - What the author describes
3 hours 32 min ago - Reply to comment | Linux Journal
7 hours 42 min ago - Reply to comment | Linux Journal
8 hours 27 min ago - Didn't read
8 hours 38 min ago - Reply to comment | Linux Journal
8 hours 43 min ago - Poul-Henning Kamp: welcome to
10 hours 53 min ago - This has already been done
10 hours 54 min ago - Reply to comment | Linux Journal
11 hours 39 min ago - Welcome to 1998
12 hours 27 min ago






Comments
The statement minima[y][x] =
The statement
minima[y][x] = (norm(field[y][x]) < threshold) ? true : false
may incur branching penalty
You can just use the first part
minima[y][x] = (norm(field[y][x]) < threshold)