Optimizing Performance through Parallelism
In this article we will look at an example of how to turn a serial algorithm into one that has higher performance in symmetric multiprocessing (shared-memory), as well as distributed-memory environments. In order to fulfill this task, we will develop a simple application in three stages: a serial version, a multithreaded version and a distributed multithreaded version.
In addition to the theoretical aspects of parallel programming, we will discuss some of the practical problems encountered when programming. We have chosen to implement all of the examples in C++ and use the POSIX threads (pthreads) and MPI libraries for symmetric multiprocessing and distributed processing, respectively.
The problem we have chosen to explore is that of finding the number of primes in a specified range. The problem has the advantage of being both simple to comprehend and illustrative of the concepts involved with parallel programming.
In our implementation, we have chosen to represent a range of numbers by an object that has the ability to count the number of primes in its range (see Listing 1).
Here is an example of compiling and using the program:
bash$ g++ -o primes primes.cpp bash$ ./primes 0 10000 There were 1229 primes.
In order to improve the speed of our example on symmetric-multiprocessing (SMP) machines, we need to make use of threads. We would like to stick to our design in the previous example, which means we need to find a way for each object to have its own thread of execution. Unfortunately, mixing C++ and pthreads is nontrivial, as pthread_create( ) expects a function that has been linked with C-style linking, not C++. We have solved this problem by creating an accessory class and a static member function (see Listing 2).
The remainder of the CountPrimes object implementation is the same. There are three points to note: 1) The Threaded class is an abstract class. 2) The entry( ) function is a static member function, which means that it has knowledge of the details of the Threaded class but is not tied to a specific instance. It therefore does not go through name-mangling and can be used as a C-style function. This is the function pointer we will pass to pthread_create( ) along with a pointer to the object to be threaded. 3) The run( ) member function is pure virtual, and as such must be implemented by any class derived from Threaded. This function will be the main execution point of the derived class, and its return value will represent the result of the computation. In the case of our CountPrimes class, this function simply calculates the total for the range and returns it.
We would like to retain the usage form of our serial implementation, and therefore add only one extra parameter that controls the number of threads that will be spawned to complete the task. Because we do not know beforehand how many objects (threads) will be needed, we will allocate them dynamically (see Listing 3).
There are a few more subtleties in this example, so we will go through the code in a little more detail. First we set the default number of threads to two and then check to see if the user specified another value. We create a dynamic array of pthread_t that will store the thread ids and then create a dynamic array of pointers to CountPrimes objects. This is important because we need to instantiate each one with a different range. In fact, we could not create a static array of CountPrimes objects since there is no default constructor. This design forces us to use the object correctly.
Next is a loop that will spawn the individual threads with their respective ranges of numbers to check. Note that we have made no attempt at load balancing here. We will return to this concept later. The important point is that each CountPrimes object is instantiated dynamically, and its pointer is stored in the array created above. The thread is actually spawned through thread_create( ). We pass a pointer to the entry member function and a pointer to the object itself. The id of the spawned thread is stored in the thread id array.
Next we wait for the threads to finish computing their totals by using pthread_join( ) on the thread ids. Because we dynamically allocated space for the return value in run( ), we must de-allocate it here. Each thread's total is added to count.
Finally, the CountPrimes objects are de-alloacted, and both the thread id array and counter object pointer array are de-allocated. Here is an example of compiling and using the program:
bash$ g++ -o primes_threaded primes_threaded.cpp bash$ ./primes_threaded 0 10000 4 There were 1229 primes.
|Free Today: September Issue of Linux Journal (Retail value: $5.99)||Sep 27, 2016|
|nginx||Sep 27, 2016|
|Epiq Solutions' Sidekiq M.2||Sep 26, 2016|
|Nativ Disc||Sep 23, 2016|
|Android Browser Security--What You Haven't Been Told||Sep 22, 2016|
|The Many Paths to a Solution||Sep 21, 2016|
- Free Today: September Issue of Linux Journal (Retail value: $5.99)
- Readers' Choice Awards 2013
- Android Browser Security--What You Haven't Been Told
- Epiq Solutions' Sidekiq M.2
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- The Many Paths to a Solution
- Securing the Programmer
- Nativ Disc
- Synopsys' Coverity
Pick up any e-commerce web or mobile app today, and you’ll be holding a mashup of interconnected applications and services from a variety of different providers. For instance, when you connect to Amazon’s e-commerce app, cookies, tags and pixels that are monitored by solutions like Exact Target, BazaarVoice, Bing, Shopzilla, Liveramp and Google Tag Manager track every action you take. You’re presented with special offers and coupons based on your viewing and buying patterns. If you find something you want for your birthday, a third party manages your wish list, which you can share through multiple social- media outlets or email to a friend. When you select something to buy, you find yourself presented with similar items as kind suggestions. And when you finally check out, you’re offered the ability to pay with promo codes, gifts cards, PayPal or a variety of credit cards.Get the Guide