Go Straight To The Source with CTrace

Tracing multithreaded applications using CTrace.

It is easy to become dependent on debuggers. They allow us to examine variables and to control flow execution as our applications run. But what can you do when your application isn't working and the debugger can't help you? Some applications, by nature, make using a debugger impractical; multithreaded applications fall into this category. Leroy Xavier, the author of LinuxThreads says, "Debuggers are not really effective for subtle concurrency problems, because they disrupt the program too much."

Consider a computer telephony application with a separate thread running for each telephone call. Telephone protocols have timeout conditions associated with them. When you pick up your telephone receiver, you get a dial tone and then you start dialing the number. But, if you fail to dial within a specified timeframe, the call is made null and void by the protocol, and your dial tone is history. This could happen easily if you set a debugger breakpoint directly after the dial tone begins. In this situation, your use of the debugger has caused an error condition.

In addition, you likely want to isolate and examine a particular telephone call, in this case a separately running thread, from the others. A trace/debug library can be a useful tool in cases such as these, allowing variable contents and program flow to be output at runtime. It also would be nice for the trace library to incur minimal overhead at times when the application is not being traced.

My article details the use of an open-source multithreaded trace/debug library called CTrace. It also presents a method of remotely tracing a running application by using the SSH protocol.

What CTrace Does

CTrace provides a set of trace output macros to help with debugging C applications. A brief description of the trace output macros follows:

  • TRC_ENTER - display procedure/function entry along with argument values

  • TRC_VOID_RETURN - display and perform procedure return

  • TRC_RETURN - display and perform function return

  • TRC_PRINT - equivalent to printf

  • TRC_ERROR - printf error message

The first three macros provide a primitive simple stack trace; the fourth is equivalent to printf; and the fifth is equivalent to printf with ERROR pre-pended. So why not use printf then?

CTrace also provides a number of layers of abstraction that can be configured dynamically to control which traces are written to the trace output stream. This enables you to :

  • distinguish program threads.

  • distinguish user-defined software units.

  • distinguish user-defined levels.

  • enable and disable tracing dynamically, in effect removing the runtime overhead of CTrace.

  • change the output stream the trace calls are written to.

All of the above can be done at runtime.

How CTrace Works

When you call one of the trace output macros from your code, you provide it with both explicit and implicit information about the trace. Let's look at an example. The following function foo is given an integer argument n and prints "Hello world" n times:


1  int foo(int n)
2  {
3      TRC_ENTER(UNIT_FOO, TRC1, ("n=%d", n));
4
5      if(n <= 0){
6          TRC_ERROR(UNIT_FOO, ("Invalid argument: %d", n));
7          TRC_RETURN(UNIT_FOO, TRC1, ("%d", 1), 1);
8      }
9
10     int i;
11     for(i=n; i>0; i--){
12         sleep(1);
13         TRC_PRINT(UNIT_FOO, TRC0, ("Hello world[%d]", i));
14     }
15
16     TRC_RETURN(UNIT_FOO, TRC1, ("%d", 0), 0);
17 }

Look at the TRC_ENTER trace on line 3. The TRC_ENTER macro's job is to display entry into a function along with the function arguments. This particular trace belongs to the logical software unit UNIT_FOO. In fact, all of the trace calls in this function belong to UNIT_FOO. If tracing is turned on within CTrace for UNIT_FOO, all of these traces are written to the trace output stream.

The TRC_ENTER trace on line 3 also belongs to trace level TRC1, as do both TRC_RETURN traces (lines 7 and 16). The TRC_RETURN macro's job is to display and perform function return. Combining TRC_ENTER and TRC_RETURN gives you a primitive stack trace. I am using a level scheme whereby TRC1 is reserved for the stack trace. Here is the output from running foo with n=5 and trace level TRC1 set:


[calsa@trapper foo]$ ./foo 5
07/09/04-16:24:31:main.c:10:./foo:          enter foo(n=5)
07/09/04-16:24:32:main.c:23:./foo:          return foo(0)
[calsa@trapper foo]$

I have reserved trace level TRC0 for TRC_PRINT traces, as you can see from line 13. Here is the output from running foo with n=5 and trace level TRC0 set:


[calsa@trapper foo]$ ./foo 5
07/09/04-16:28:17:main.c:20:./foo:          Hello world[5]
07/09/04-16:28:18:main.c:20:./foo:          Hello world[4]
07/09/04-16:28:19:main.c:20:./foo:          Hello world[3]
07/09/04-16:28:20:main.c:20:./foo:          Hello world[2]
07/09/04-16:28:21:main.c:20:./foo:          Hello world[1]
[calsa@trapper foo]$

Finally, here is the output of running foo with all trace levels set (level=TRC_ALL):


[calsa@trapper foo]$ ./foo 5
07/09/04-16:30:10:main.c:10:./foo:          enter foo(n=5)
07/09/04-16:30:11:main.c:20:./foo:          Hello world[5]
07/09/04-16:30:12:main.c:20:./foo:          Hello world[4]
07/09/04-16:30:13:main.c:20:./foo:          Hello world[3]
07/09/04-16:30:14:main.c:20:./foo:          Hello world[2]
07/09/04-16:30:15:main.c:20:./foo:          Hello world[1]
07/09/04-16:30:15:main.c:23:./foo:          return foo(0)
[calsa@trapper foo]$

The TRC_ERROR macro on line 6 implicitly belongs to level TRC_ERR, so it doesn't require you to pass a level argument to it. Once your program starts to become stable, you might set TRC_ERR as the only trace level, so you are reporting errors only. To be complete, I set TRC_ERR as the sole trace level and run the program with a negative argument:


[calsa@trapper foo]$ ./foo -5
07/09/04-16:40:12:main.c:13:./foo:          ERROR in fn foo: Invalid
argument: -5
[calsa@trapper foo]$

The trace output in the above examples mostly is self-explanatory. Here is the field by field breakdown:


<date>-<time>:<source>:<line>:<thread>      <indented trace output>

Where did CTrace come up with the thread name foo? I have added the main thread of this single-threaded program to CTrace and named it after the executable foo(argv[0]). CTrace manages a collection of data structures representing each program thread. CTrace can identify the calling thread of the trace by calling pthread_self().

Let's convert foo into a multithreaded program by adding a new thread, bar. The bar thread mimics foo but prints "Goodbye cruel world" instead of "Hello world". I want bar to belong to a new logical software unit UNIT_BAR, which I will define. The two threads run concurrently:


#include <pthread.h>
#include <ctrace.h>

#define UNIT_MAX 100
#define UNIT_FOO 1
#define UNIT_BAR 2

int foo(int n)
{
    TRC_ENTER(UNIT_FOO, TRC1, ("n=%d", n));

    if(n <= 0){
        TRC_ERROR(UNIT_FOO, ("Invalid argument: %d", n));
        TRC_RETURN(UNIT_FOO, TRC1, ("%d", 1), 1);
    }

    int i;
    for(i=n; i>0; i--){
        sleep(1);
        TRC_PRINT(UNIT_FOO, TRC0, ("Hello world[%d]", i));
    }

    TRC_RETURN(UNIT_FOO, TRC1, ("%d", 0), 0);
}

void *bar(void *arg)
{
    int n = *(int *)arg;

    TRC_ADD_THREAD("bar", 0);
    TRC_ENTER(UNIT_BAR, TRC1, ("arg=0x%x", arg));

    if(n <= 0){
        TRC_ERROR(UNIT_BAR, ("Invalid argument: %d", n));
        TRC_RETURN(UNIT_BAR, TRC1, ("0x%x", NULL), NULL);
    }

    int i;
    for(i=n; i>0; i--){
        sleep(1);
        TRC_PRINT(UNIT_BAR, TRC0, ("Goodbye cruel world[%d]", i));
    }

    TRC_RETURN(UNIT_BAR, TRC1, ("0x%x", NULL), NULL);
}

int main(int argc, char **argv)
{
    int n;
    pthread_t tid;

    if(argc < 2){
        printf("usage: foo <num_msgs>\n");
        exit(1);
    }

    TRC_INIT(NULL, TRC_ENABLED, TRC_ON, TRC_ALL, UNIT_MAX, 0);
    TRC_ADD_THREAD(argv[0], 0);

    n = atoi(argv[1]);
    pthread_create(&tid, NULL, bar, &n);
    foo(n);
    pthread_join(tid, 0);

    TRC_REMOVE_THREAD(tid);
    TRC_REMOVE_THREAD(pthread_self());
    TRC_END();
}

As you can see, you need to initialize CTrace with TRC_INIT before using it. You also need to add each thread you want traced with the TRC_ADD_THREAD macro, including the main() thread of the program. If you don't add a thread to CTrace, it never will output traces for that thread. Don't forget to remove any thread you add after you have finished with it. Use TRC_REMOVE_THREAD to do this. Here is the output of foo now that we have added bar:


[calsa@trapper foobar]$ ./foo 5
07/09/04-17:05:49:main.c:31:bar:            enter bar(arg=0xbfffe794)
07/09/04-17:05:49:main.c:10:./foo:          enter foo(n=5)
07/09/04-17:05:50:main.c:20:./foo:          Hello world[5]
07/09/04-17:05:50:main.c:41:bar:            Goodbye cruel world[5]
07/09/04-17:05:51:main.c:20:./foo:          Hello world[4]
07/09/04-17:05:51:main.c:41:bar:            Goodbye cruel world[4]
07/09/04-17:05:52:main.c:20:./foo:          Hello world[3]
07/09/04-17:05:52:main.c:41:bar:            Goodbye cruel world[3]
07/09/04-17:05:53:main.c:20:./foo:          Hello world[2]
07/09/04-17:05:53:main.c:41:bar:            Goodbye cruel world[2]
07/09/04-17:05:54:main.c:20:./foo:          Hello world[1]
07/09/04-17:05:54:main.c:23:./foo:          return foo(0)
07/09/04-17:05:54:main.c:41:bar:            Goodbye cruel world[1]
07/09/04-17:05:54:main.c:44:bar:            return bar(0x0)
[calsa@trapper foobar]$

The output shows how easy it is to distinguish between threads.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

GNU nana

Anonymous's picture

What are the key difference(s) with GNU
nana?
(GNU nana may not work anymore with recent gcc compilers.)

Xavier

Anonymous's picture

It's Xavier Leroy, not the other way around.

Re: Xavier

calsa's picture

Sorry, will try and get that corrected.