What Is Multi-Threading?

With the 2.0 kernel, most Linux users now have the capability of using multi-threaded processes—well, what does that mean?
User-Level Threads Versus Kernel-Level Threads

Threading libraries can be implemented using kernel features for creation and scheduling (such as clone()), or entirely in user space where code in the library handles the creation of threads and the scheduling between threads within a process. In general, user-level thread libraries will be faster than kernel-thread libraries. On the other hand, kernel-thread libraries are likely to make better use of kernel facilities—if the kernel makes better use of multiple processors in the next release, so might all your multi-threaded programs. However, the programmer shouldn't need to worry about whether the library is implemented as a user-level library, a kernel-level library, or a combination of the two; the operation of the program should be essentially the same.

Available Libraries

There are a many POSIX threads libraries available for Linux, and some make no attempt to be POSIX-compliant. Also, some of the POSIX threads libraries are compliant with an early draft of the standard or just part of the standard. Anything which isn't compliant with the final POSIX standard may show different behaviour in some circumstances or have slightly different function calls. That is not to say these aren't good multi-threading libraries, but you should be aware of what you're using.

All examples in this article were tested with Xavier Leroy's release 0.3 (ALPHA) of LinuxThreads, covered by the GNU Library General Public License. This library is still being developed and aims to be fully POSIX-compliant sometime in the future. It is clone()-based, and so it has the advantages and disadvantages of a kernel-level implementation.

Don't Use Global Data

As previously noted, multi-threaded programming isn't really very difficult. However, there are ways to make life difficult for yourself.

Guides to good programming often say to avoid using global data. Maybe you've never seen the point of this—particularly if you're a careful programmer and you know to look after your global data. With multi-threading, there is another reason to avoid using global data. Consider the trivial code fragments in Listing 2.

When this function is called you would expect to see the numbers “0 1 2 3” printed. And indeed, this is what would happen. If you were to call this function from two different threads you might expect to see two sets of the numbers printed, one from each thread. If the two threads were to call this function at about the same moment you might expect to see the two sets of numbers interlaced, perhaps something like:

0 1 0 1 2 2 3
3

Here, thread A calls fn(), which starts to print the numbers. When thread A is only as far as “1”, thread B starts up and calls fn(), which manages to get as far as “2” before thread A gets control again. The call to fn() completes for thread A and thread B gets control again and finishes off.

However, this is not what would happen. Let me stress this is NOT what would happen. Instead, the output might be:

0 1 0 1 2 3
4 5 6 7 8 9 10 11 12 13 14...

What is happening is thread A calls fn() which prints the numbers up to “1”. Then thread B starts up and calls fn(). As the for loop is entered, the value of i is set to 0. Remember i is global, so it is shared by both threads—when thread B changes the value of i, thread A will see the same change. The counter reaches the value “2”, at which point thread A is given control again. When it attempts to increment i, it now reaches “3” and then “4”, at which point thread A does not print the value of i and is ready to exit the function. When thread B gets control again, it prints the current value of i (which is “4”, courtesy of thread A), increments it to “5”, then does the comparison i!=4. The comparison doesn't fail (and will not fail for a long time—not until i has reached the maximum int value and looped around again), so the loop continues printing numbers.

This sounds horrible. It is horrible! And it can be avoided by not using global data. If i was declared local to fn(), each thread would have its own copy and you would get the expected output.

How To Use Global Data

Sometimes you may find you need to use global data, either because you are adding multi-threading features to an existing program which cannot be rewritten, or because you believe global data is the correct thing to use. Notice, in this context, I'm saying “global data” when what I really mean is shared data--the data may not be global to the program, but it is global to (or shared between) one or more threads. In a conventional, single-threaded program static local variables can be a useful convenience; however, these are effectively global variables and if you don't take care with them, they will cause you problems with bugs that can be very hard to track down. So what do you do with global data? All you have to do is make sure no other thread can change the value of your global data between the time you set a value to that data and the time you use the value. POSIX threads provides a number of ways of performing this synchronisation between threads. The simplest are mutual exclusion locks, or mutexes. A thread locks a mutex at the start of a section of code, and unlocks it at the end of that section of code. A thread which holds a lock on a mutex is said to own that mutex. While one thread owns that mutex, no other thread will be able to execute that same section of code.

Consider our previous example with the runaway loop. We can rewrite this so that it still uses global data, but with mutexes to prevent two threads modifying the same loop counter at the same time.

In Listing 3, loopLock is a mutex variable. Mutex variables should always be initialised before use, either by a call to the initialisation function pthread\_mutex\_init(), which can be used to set values for attributes which are particular to that mutex variable, or by using the standard macro PTHREAD\_MUTEX\_INITIALIZER which uses default values. These attributes can affect the priority and scheduling of the thread which owns the mutex, or dictate which threads can operate on the mutex (either those within the same process or any threads which can access the memory where the mutex resides). We make use of only default mutex attributes here.

When a thread makes a call to pthread\_mutex\_lock(), the mutex variable it tries to own will be either free or owned by another thread. (If a thread makes an attempt to lock a mutex which it already owns, the result is undefined. Don't do it!) If the mutex is free, the thread will take ownership of that mutex and pthread\_mutex\_lock() will return. If another thread owns that mutex, the call will wait until the mutex comes free and can claim ownership for its own calling thread. The thread then owns that mutex until it makes a call to pthread\_mutex\_unlock(). (Again, if a thread makes an attempt to unlock a mutex which it doesn't already own, the result is undefined.)

As a result, if more than one thread tries to execute the code in the function at the same time, then only the first will own the lock and enter the loop. Any other threads will sit at the pthread\_mutex\_lock() call until the first thread has exited the loop and unlocked the mutex. Then ownership will be given to just one of the waiting threads which will be able to enter the loop safely.

So if there are two threads which call this function at the same time, the output will be:

0 1 2 3
0 1 2 3

As expected, each pass through the loop completes before the next starts and the output is not interlaced.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Nice Article.. very usefull

Anonymous's picture

Nice Article.. very usefull for basic understanding of multithreading.

Thanks.

What is multithreading

Anonymous's picture

to know more about multithreading, visit : http://thekiransblog.blogspot.com/2010/02/multithreading.html

Thanks, very useful guide.

Jyaan's picture

Thanks, very useful guide.

Re: What Is Multi-Threading?

Anonymous's picture

I teach English HighSchool Computer students, the exam board sneakily put multi-threading on the exam paper this January. This article clarified the whole thing. THANKS. regards: Mel Walker, Durham Sixth Form Centre, England.

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix