What Is Multi-Threading?
Threading libraries can be implemented using kernel features for creation and scheduling (such as clone()), or entirely in user space where code in the library handles the creation of threads and the scheduling between threads within a process. In general, user-level thread libraries will be faster than kernel-thread libraries. On the other hand, kernel-thread libraries are likely to make better use of kernel facilities—if the kernel makes better use of multiple processors in the next release, so might all your multi-threaded programs. However, the programmer shouldn't need to worry about whether the library is implemented as a user-level library, a kernel-level library, or a combination of the two; the operation of the program should be essentially the same.
There are a many POSIX threads libraries available for Linux, and some make no attempt to be POSIX-compliant. Also, some of the POSIX threads libraries are compliant with an early draft of the standard or just part of the standard. Anything which isn't compliant with the final POSIX standard may show different behaviour in some circumstances or have slightly different function calls. That is not to say these aren't good multi-threading libraries, but you should be aware of what you're using.
All examples in this article were tested with Xavier Leroy's release 0.3 (ALPHA) of LinuxThreads, covered by the GNU Library General Public License. This library is still being developed and aims to be fully POSIX-compliant sometime in the future. It is clone()-based, and so it has the advantages and disadvantages of a kernel-level implementation.
As previously noted, multi-threaded programming isn't really very difficult. However, there are ways to make life difficult for yourself.
Guides to good programming often say to avoid using global data. Maybe you've never seen the point of this—particularly if you're a careful programmer and you know to look after your global data. With multi-threading, there is another reason to avoid using global data. Consider the trivial code fragments in Listing 2.
When this function is called you would expect to see the numbers “0 1 2 3” printed. And indeed, this is what would happen. If you were to call this function from two different threads you might expect to see two sets of the numbers printed, one from each thread. If the two threads were to call this function at about the same moment you might expect to see the two sets of numbers interlaced, perhaps something like:
0 1 0 1 2 2 3 3
Here, thread A calls fn(), which starts to print the numbers. When thread A is only as far as “1”, thread B starts up and calls fn(), which manages to get as far as “2” before thread A gets control again. The call to fn() completes for thread A and thread B gets control again and finishes off.
However, this is not what would happen. Let me stress this is NOT what would happen. Instead, the output might be:
0 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14...
What is happening is thread A calls fn() which prints the numbers up to “1”. Then thread B starts up and calls fn(). As the for loop is entered, the value of i is set to 0. Remember i is global, so it is shared by both threads—when thread B changes the value of i, thread A will see the same change. The counter reaches the value “2”, at which point thread A is given control again. When it attempts to increment i, it now reaches “3” and then “4”, at which point thread A does not print the value of i and is ready to exit the function. When thread B gets control again, it prints the current value of i (which is “4”, courtesy of thread A), increments it to “5”, then does the comparison i!=4. The comparison doesn't fail (and will not fail for a long time—not until i has reached the maximum int value and looped around again), so the loop continues printing numbers.
This sounds horrible. It is horrible! And it can be avoided by not using global data. If i was declared local to fn(), each thread would have its own copy and you would get the expected output.
Sometimes you may find you need to use global data, either because you are adding multi-threading features to an existing program which cannot be rewritten, or because you believe global data is the correct thing to use. Notice, in this context, I'm saying “global data” when what I really mean is shared data--the data may not be global to the program, but it is global to (or shared between) one or more threads. In a conventional, single-threaded program static local variables can be a useful convenience; however, these are effectively global variables and if you don't take care with them, they will cause you problems with bugs that can be very hard to track down. So what do you do with global data? All you have to do is make sure no other thread can change the value of your global data between the time you set a value to that data and the time you use the value. POSIX threads provides a number of ways of performing this synchronisation between threads. The simplest are mutual exclusion locks, or mutexes. A thread locks a mutex at the start of a section of code, and unlocks it at the end of that section of code. A thread which holds a lock on a mutex is said to own that mutex. While one thread owns that mutex, no other thread will be able to execute that same section of code.
Consider our previous example with the runaway loop. We can rewrite this so that it still uses global data, but with mutexes to prevent two threads modifying the same loop counter at the same time.
In Listing 3, loopLock is a mutex variable. Mutex variables should always be initialised before use, either by a call to the initialisation function pthread\_mutex\_init(), which can be used to set values for attributes which are particular to that mutex variable, or by using the standard macro PTHREAD\_MUTEX\_INITIALIZER which uses default values. These attributes can affect the priority and scheduling of the thread which owns the mutex, or dictate which threads can operate on the mutex (either those within the same process or any threads which can access the memory where the mutex resides). We make use of only default mutex attributes here.
When a thread makes a call to pthread\_mutex\_lock(), the mutex variable it tries to own will be either free or owned by another thread. (If a thread makes an attempt to lock a mutex which it already owns, the result is undefined. Don't do it!) If the mutex is free, the thread will take ownership of that mutex and pthread\_mutex\_lock() will return. If another thread owns that mutex, the call will wait until the mutex comes free and can claim ownership for its own calling thread. The thread then owns that mutex until it makes a call to pthread\_mutex\_unlock(). (Again, if a thread makes an attempt to unlock a mutex which it doesn't already own, the result is undefined.)
As a result, if more than one thread tries to execute the code in the function at the same time, then only the first will own the lock and enter the loop. Any other threads will sit at the pthread\_mutex\_lock() call until the first thread has exited the loop and unlocked the mutex. Then ownership will be given to just one of the waiting threads which will be able to enter the loop safely.
So if there are two threads which call this function at the same time, the output will be:
0 1 2 3 0 1 2 3
As expected, each pass through the loop completes before the next starts and the output is not interlaced.
Free DevOps eBooks, Videos, and more!
Regardless of where you are in your DevOps process, Linux Journal can help!
We offer here the DEFINITIVE DevOps for Dummies, a mobile Application Development Primer, and advice & help from the expert sources like:
- Linux Journal