Roman's Law and Fast Processing with Multiple CPU Cores

Some practical methods to exploit multiple cores and find thread synchronization problems.
A Few Loose Ends

OpenMP and -xautopar seem to work pretty well for C, but what about C++? Will they mesh well with the kind of modern C++ usage peppered with generics and template metaprogramming? The short answer is, there's no short answer. But, let's see for ourselves with the following example of modern C++ [ab]use:

#include <vector> 
#include <iterator> 
#include <algorithm> 
#include <iostream> 
void standard_input_sorter() {
   using namespace std; 
   vector<string> v;
   copy(istream_iterator<string>(cin), istream_iterator<string>(),
   sort(v.begin(), v.end());

$ CC -c -fast -xloopinfo -xautopar -xopenmp -library=stlport4

The above produces a pretty long list of complaints, explaining why a particular section of the STLport library cannot be parallelized. The key issue here is that certain areas of C++ are notoriously difficult to parallelize by default. Even with OpenMP, things like concurrent container access are much more trouble than they are worth. Do we have to rewrite STL? Well, seems like Intel almost did. Intel has been working on what it calls the Thread Building Blocks (TBB) C++ library, and its claim to fame is exactly that—making modern C++ parallel. Give it a try, and see if it works for you. I especially recommend it if you're interested in exploiting task parallelism. But, then again, the amount of modern C++ that TBB throws at even the simplest of examples, such as calculating Fibonacci numbers, is something that really makes me sad. Tasks as defined in the upcoming OpenMP 3.0 standard seem far less threatening.


There is a fundamental trend toward concurrency in hardware. Multicore systems are now making their way into laptops and desktops. Unfortunately, unless software engineers start taking these trends into account, there's very little that modern hardware can do to make individual applications run faster. Of course, parallel programming is difficult and error-prone, but with the latest tools and programming techniques, there's much more to it than merely POSIX threads. Granted, this article scratches only the surface of what's available. Hopefully, the information presented here will be enough of a tipping point for most readers to start seriously thinking about concurrency in their applications. Our high-definition camcorders demand it and so does every gamer on earth.

Roman Shaposhnik started his career in compilers back in 1994 when he had to write a translator for the programming language he'd just invented (the language was so weird, nobody else wanted the job). His first UNIX exposure was with Slackware 3.0, and he's been hooked ever since. Currently, he works for Sun Microsystems in the Developer Products Group. He is usually found pondering the question of how to make computers faster yet not drive application developers insane. He runs a blog at and can be reached via e-mail at