Optimizing Performance through Parallelism
In this article we will look at an example of how to turn a serial algorithm into one that has higher performance in symmetric multiprocessing (shared-memory), as well as distributed-memory environments. In order to fulfill this task, we will develop a simple application in three stages: a serial version, a multithreaded version and a distributed multithreaded version.
In addition to the theoretical aspects of parallel programming, we will discuss some of the practical problems encountered when programming. We have chosen to implement all of the examples in C++ and use the POSIX threads (pthreads) and MPI libraries for symmetric multiprocessing and distributed processing, respectively.
The problem we have chosen to explore is that of finding the number of primes in a specified range. The problem has the advantage of being both simple to comprehend and illustrative of the concepts involved with parallel programming.
In our implementation, we have chosen to represent a range of numbers by an object that has the ability to count the number of primes in its range (see Listing 1).
Here is an example of compiling and using the program:
bash$ g++ -o primes primes.cpp bash$ ./primes 0 10000 There were 1229 primes.
In order to improve the speed of our example on symmetric-multiprocessing (SMP) machines, we need to make use of threads. We would like to stick to our design in the previous example, which means we need to find a way for each object to have its own thread of execution. Unfortunately, mixing C++ and pthreads is nontrivial, as pthread_create( ) expects a function that has been linked with C-style linking, not C++. We have solved this problem by creating an accessory class and a static member function (see Listing 2).
The remainder of the CountPrimes object implementation is the same. There are three points to note: 1) The Threaded class is an abstract class. 2) The entry( ) function is a static member function, which means that it has knowledge of the details of the Threaded class but is not tied to a specific instance. It therefore does not go through name-mangling and can be used as a C-style function. This is the function pointer we will pass to pthread_create( ) along with a pointer to the object to be threaded. 3) The run( ) member function is pure virtual, and as such must be implemented by any class derived from Threaded. This function will be the main execution point of the derived class, and its return value will represent the result of the computation. In the case of our CountPrimes class, this function simply calculates the total for the range and returns it.
We would like to retain the usage form of our serial implementation, and therefore add only one extra parameter that controls the number of threads that will be spawned to complete the task. Because we do not know beforehand how many objects (threads) will be needed, we will allocate them dynamically (see Listing 3).
There are a few more subtleties in this example, so we will go through the code in a little more detail. First we set the default number of threads to two and then check to see if the user specified another value. We create a dynamic array of pthread_t that will store the thread ids and then create a dynamic array of pointers to CountPrimes objects. This is important because we need to instantiate each one with a different range. In fact, we could not create a static array of CountPrimes objects since there is no default constructor. This design forces us to use the object correctly.
Next is a loop that will spawn the individual threads with their respective ranges of numbers to check. Note that we have made no attempt at load balancing here. We will return to this concept later. The important point is that each CountPrimes object is instantiated dynamically, and its pointer is stored in the array created above. The thread is actually spawned through thread_create( ). We pass a pointer to the entry member function and a pointer to the object itself. The id of the spawned thread is stored in the thread id array.
Next we wait for the threads to finish computing their totals by using pthread_join( ) on the thread ids. Because we dynamically allocated space for the return value in run( ), we must de-allocate it here. Each thread's total is added to count.
Finally, the CountPrimes objects are de-alloacted, and both the thread id array and counter object pointer array are de-allocated. Here is an example of compiling and using the program:
bash$ g++ -o primes_threaded primes_threaded.cpp bash$ ./primes_threaded 0 10000 4 There were 1229 primes.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
- Ubuntu Online Summit
- Devuan Beta Release
- The Qt Company's Qt Start-Up
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- May 2016 Issue of Linux Journal
- The US Government and Open-Source Software
- Open-Source Project Secretly Funded by CIA
- The Death of RoboVM
- New Container Image Standard Promises More Portable Apps
- EnterpriseDB's EDB Postgres Advanced Server and EDB Postgres Enterprise Manager
In modern computer systems, privacy and security are mandatory. However, connections from the outside over public networks automatically imply risks. One easily available solution to avoid eavesdroppers’ attempts is SSH. But, its wide adoption during the past 21 years has made it a target for attackers, so hardening your system properly is a must.
Additionally, in highly regulated markets, you must comply with specific operational requirements, proving that you conform to standards and even that you have included new mandatory authentication methods, such as two-factor authentication. In this ebook, I discuss SSH and how to configure and manage it to guarantee that your network is safe, your data is secure and that you comply with relevant regulations.Get the Guide