Go Straight To The Source with CTrace

Software

by Cal McPherson

on July 20, 2004

It is easy to become dependent on debuggers. They allow us to examine variables and to control flow execution as our applications run. But what can you do when your application isn't working and the debugger can't help you? Some applications, by nature, make using a debugger impractical; multithreaded applications fall into this category. Leroy Xavier, the author of LinuxThreads says, "Debuggers are not really effective for subtle concurrency problems, because they disrupt the program too much."

Consider a computer telephony application with a separate thread running for each telephone call. Telephone protocols have timeout conditions associated with them. When you pick up your telephone receiver, you get a dial tone and then you start dialing the number. But, if you fail to dial within a specified timeframe, the call is made null and void by the protocol, and your dial tone is history. This could happen easily if you set a debugger breakpoint directly after the dial tone begins. In this situation, your use of the debugger has caused an error condition.

In addition, you likely want to isolate and examine a particular telephone call, in this case a separately running thread, from the others. A trace/debug library can be a useful tool in cases such as these, allowing variable contents and program flow to be output at runtime. It also would be nice for the trace library to incur minimal overhead at times when the application is not being traced.

My article details the use of an open-source multithreaded trace/debug library called CTrace. It also presents a method of remotely tracing a running application by using the SSH protocol.

What CTrace Does

CTrace provides a set of trace output macros to help with debugging C applications. A brief description of the trace output macros follows:

TRC_ENTER - display procedure/function entry along with argument values
TRC_VOID_RETURN - display and perform procedure return
TRC_RETURN - display and perform function return
TRC_PRINT - equivalent to printf
TRC_ERROR - printf error message

The first three macros provide a primitive simple stack trace; the fourth is equivalent to printf; and the fifth is equivalent to printf with ERROR pre-pended. So why not use printf then?

CTrace also provides a number of layers of abstraction that can be configured dynamically to control which traces are written to the trace output stream. This enables you to :

distinguish program threads.
distinguish user-defined software units.
distinguish user-defined levels.
enable and disable tracing dynamically, in effect removing the runtime overhead of CTrace.
change the output stream the trace calls are written to.

All of the above can be done at runtime.

How CTrace Works

When you call one of the trace output macros from your code, you provide it with both explicit and implicit information about the trace. Let's look at an example. The following function foo is given an integer argument n and prints "Hello world" n times:


1  int foo(int n)
2  {
3      TRC_ENTER(UNIT_FOO, TRC1, ("n=%d", n));
4
5      if(n <= 0){
6          TRC_ERROR(UNIT_FOO, ("Invalid argument: %d", n));
7          TRC_RETURN(UNIT_FOO, TRC1, ("%d", 1), 1);
8      }
9
10     int i;
11     for(i=n; i>0; i--){
12         sleep(1);
13         TRC_PRINT(UNIT_FOO, TRC0, ("Hello world[%d]", i));
14     }
15
16     TRC_RETURN(UNIT_FOO, TRC1, ("%d", 0), 0);
17 }

Look at the TRC_ENTER trace on line 3. The TRC_ENTER macro's job is to display entry into a function along with the function arguments. This particular trace belongs to the logical software unit UNIT_FOO. In fact, all of the trace calls in this function belong to UNIT_FOO. If tracing is turned on within CTrace for UNIT_FOO, all of these traces are written to the trace output stream.

The TRC_ENTER trace on line 3 also belongs to trace level TRC1, as do both TRC_RETURN traces (lines 7 and 16). The TRC_RETURN macro's job is to display and perform function return. Combining TRC_ENTER and TRC_RETURN gives you a primitive stack trace. I am using a level scheme whereby TRC1 is reserved for the stack trace. Here is the output from running foo with n=5 and trace level TRC1 set:


[calsa@trapper foo]$ ./foo 5
07/09/04-16:24:31:main.c:10:./foo:          enter foo(n=5)
07/09/04-16:24:32:main.c:23:./foo:          return foo(0)
[calsa@trapper foo]$

I have reserved trace level TRC0 for TRC_PRINT traces, as you can see from line 13. Here is the output from running foo with n=5 and trace level TRC0 set:


[calsa@trapper foo]$ ./foo 5
07/09/04-16:28:17:main.c:20:./foo:          Hello world[5]
07/09/04-16:28:18:main.c:20:./foo:          Hello world[4]
07/09/04-16:28:19:main.c:20:./foo:          Hello world[3]
07/09/04-16:28:20:main.c:20:./foo:          Hello world[2]
07/09/04-16:28:21:main.c:20:./foo:          Hello world[1]
[calsa@trapper foo]$

Finally, here is the output of running foo with all trace levels set (level=TRC_ALL):


[calsa@trapper foo]$ ./foo 5
07/09/04-16:30:10:main.c:10:./foo:          enter foo(n=5)
07/09/04-16:30:11:main.c:20:./foo:          Hello world[5]
07/09/04-16:30:12:main.c:20:./foo:          Hello world[4]
07/09/04-16:30:13:main.c:20:./foo:          Hello world[3]
07/09/04-16:30:14:main.c:20:./foo:          Hello world[2]
07/09/04-16:30:15:main.c:20:./foo:          Hello world[1]
07/09/04-16:30:15:main.c:23:./foo:          return foo(0)
[calsa@trapper foo]$

The TRC_ERROR macro on line 6 implicitly belongs to level TRC_ERR, so it doesn't require you to pass a level argument to it. Once your program starts to become stable, you might set TRC_ERR as the only trace level, so you are reporting errors only. To be complete, I set TRC_ERR as the sole trace level and run the program with a negative argument:


[calsa@trapper foo]$ ./foo -5
07/09/04-16:40:12:main.c:13:./foo:          ERROR in fn foo: Invalid
argument: -5
[calsa@trapper foo]$

The trace output in the above examples mostly is self-explanatory. Here is the field by field breakdown:


<date>-<time>:<source>:<line>:<thread>      <indented trace output>

Where did CTrace come up with the thread name foo? I have added the main thread of this single-threaded program to CTrace and named it after the executable foo(argv[0]). CTrace manages a collection of data structures representing each program thread. CTrace can identify the calling thread of the trace by calling pthread_self().

Let's convert foo into a multithreaded program by adding a new thread, bar. The bar thread mimics foo but prints "Goodbye cruel world" instead of "Hello world". I want bar to belong to a new logical software unit UNIT_BAR, which I will define. The two threads run concurrently:


#include <pthread.h>
#include <ctrace.h>

#define UNIT_MAX 100
#define UNIT_FOO 1
#define UNIT_BAR 2

int foo(int n)
{
    TRC_ENTER(UNIT_FOO, TRC1, ("n=%d", n));

    if(n <= 0){
        TRC_ERROR(UNIT_FOO, ("Invalid argument: %d", n));
        TRC_RETURN(UNIT_FOO, TRC1, ("%d", 1), 1);
    }

    int i;
    for(i=n; i>0; i--){
        sleep(1);
        TRC_PRINT(UNIT_FOO, TRC0, ("Hello world[%d]", i));
    }

    TRC_RETURN(UNIT_FOO, TRC1, ("%d", 0), 0);
}

void *bar(void *arg)
{
    int n = *(int *)arg;

    TRC_ADD_THREAD("bar", 0);
    TRC_ENTER(UNIT_BAR, TRC1, ("arg=0x%x", arg));

    if(n <= 0){
        TRC_ERROR(UNIT_BAR, ("Invalid argument: %d", n));
        TRC_RETURN(UNIT_BAR, TRC1, ("0x%x", NULL), NULL);
    }

    int i;
    for(i=n; i>0; i--){
        sleep(1);
        TRC_PRINT(UNIT_BAR, TRC0, ("Goodbye cruel world[%d]", i));
    }

    TRC_RETURN(UNIT_BAR, TRC1, ("0x%x", NULL), NULL);
}

int main(int argc, char **argv)
{
    int n;
    pthread_t tid;

    if(argc < 2){
        printf("usage: foo <num_msgs>\n");
        exit(1);
    }

    TRC_INIT(NULL, TRC_ENABLED, TRC_ON, TRC_ALL, UNIT_MAX, 0);
    TRC_ADD_THREAD(argv[0], 0);

    n = atoi(argv[1]);
    pthread_create(&tid, NULL, bar, &n);
    foo(n);
    pthread_join(tid, 0);

    TRC_REMOVE_THREAD(tid);
    TRC_REMOVE_THREAD(pthread_self());
    TRC_END();
}

As you can see, you need to initialize CTrace with TRC_INIT before using it. You also need to add each thread you want traced with the TRC_ADD_THREAD macro, including the main() thread of the program. If you don't add a thread to CTrace, it never will output traces for that thread. Don't forget to remove any thread you add after you have finished with it. Use TRC_REMOVE_THREAD to do this. Here is the output of foo now that we have added bar:


[calsa@trapper foobar]$ ./foo 5
07/09/04-17:05:49:main.c:31:bar:            enter bar(arg=0xbfffe794)
07/09/04-17:05:49:main.c:10:./foo:          enter foo(n=5)
07/09/04-17:05:50:main.c:20:./foo:          Hello world[5]
07/09/04-17:05:50:main.c:41:bar:            Goodbye cruel world[5]
07/09/04-17:05:51:main.c:20:./foo:          Hello world[4]
07/09/04-17:05:51:main.c:41:bar:            Goodbye cruel world[4]
07/09/04-17:05:52:main.c:20:./foo:          Hello world[3]
07/09/04-17:05:52:main.c:41:bar:            Goodbye cruel world[3]
07/09/04-17:05:53:main.c:20:./foo:          Hello world[2]
07/09/04-17:05:53:main.c:41:bar:            Goodbye cruel world[2]
07/09/04-17:05:54:main.c:20:./foo:          Hello world[1]
07/09/04-17:05:54:main.c:23:./foo:          return foo(0)
07/09/04-17:05:54:main.c:41:bar:            Goodbye cruel world[1]
07/09/04-17:05:54:main.c:44:bar:            return bar(0x0)
[calsa@trapper foobar]$

The output shows how easy it is to distinguish between threads.

Dynamically Configuring CTrace

CTrace keeps flags indicating if tracing is turned on or not and which level(s) are set. There is a separate set of flags for each of the following scopes: global, thread and software unit. When a trace output macro is called, CTrace checks to see if any of these flags are set. If the answer is yes, the trace is written to the output stream. There are a number of macros to facilitate the toggling of these flags at runtime to finely tune the trace output. Figure 1 shows the macros and the corresponding data structures on which they act.

Remotely Tracing foo with SSH

Here I demonstrate a method of remotely tracing foo by using the Secure Shell (SSH) protocol. I add a server to foo that runs in the background and accepts requests to call CTrace macros. Tracing is disabled and turned off globally:


int main(int argc, char **argv)
{
    int n;
    pthread_t tid;

    if(argc < 2){
        printf("usage: foo <num_msgs>\n");
        exit(1);
    }

    TRC_INIT(NULL, TRC_DISABLED, TRC_OFF, TRC_ALL, UNIT_MAX, 0); [format
request: bold type]
    TRC_ADD_THREAD(argv[0], 0);

    server_start(); [format request: bold type]

    n = atoi(argv[1]);
    pthread_create(&tid, NULL, bar, &n);
    foo(n);
    pthread_join(tid, 0);

    server_stop(); [format request: bold type]

    TRC_REMOVE_THREAD(tid);
    TRC_REMOVE_THREAD(pthread_self());
    TRC_END();
}

The server accepts strings that mimic CTrace macro calls but without brackets surrounding the arguments. For example, the following string is a valid request to change the trace level globally to TRC4 with TRC_SET_LEVEL TRC4.

I also have written a client that connects to the server on the localhost. The client obtains each line of user input and sends it to the server for processing. The client is named ctrace.

Next, foo is started with an n value of 1000:


[calsa@trapper foobar_server]$ ./foo 1000

The host it is running on has been set up to provide the secure shell (SSH) service. After logging in to a secure shell, I start the ctrace client. I turn tracing on for the software unit UNIT_FOO and output it to the file called foo.txt:


[calsa@trapper foobar_client]$ ./ctrace
Connecting to trapper
Connected.
TRC_FILE /home/calsa/foo.txt
Done.
TRC_ENABLE
Done.
TRC_TURN_UNIT_ON UNIT_FOO
Done.
TRC_TURN_UNIT_OFF UNIT_FOO
Done.

Here is the trace output in "foo.txt":


07/12/04-11:42:57:main.c:18:./foo:          Hello world[965]
07/12/04-11:42:58:main.c:18:./foo:          Hello world[964]
07/12/04-11:42:59:main.c:18:./foo:          Hello world[963]
07/12/04-11:43:00:main.c:18:./foo:          Hello world[962]
07/12/04-11:43:01:main.c:18:./foo:          Hello world[961]
07/12/04-11:43:02:main.c:18:./foo:          Hello world[960]

Now, I turn tracing on for the software unit UNIT_BAR and output it to the file called bar.txt:


TRC_FILE /home/calsa/bar.txt
Done.
TRC_TURN_UNIT_ON UNIT_BAR
Done.
TRC_TURN_UNIT_OFF UNIT_BAR
Done.
TRC_FILE stdout
Done.

Here is the trace output in bar.txt:


07/12/04-11:43:15:main.c:39:bar:            Goodbye cruel world[947]
07/12/04-11:43:16:main.c:39:bar:            Goodbye cruel world[946]
07/12/04-11:43:17:main.c:39:bar:            Goodbye cruel world[945]
07/12/04-11:43:18:main.c:39:bar:            Goodbye cruel world[944]
07/12/04-11:43:19:main.c:39:bar:            Goodbye cruel world[943]
07/12/04-11:43:20:main.c:39:bar:            Goodbye cruel world[942]
07/12/04-11:43:21:main.c:39:bar:            Goodbye cruel world[941]
07/12/04-11:43:22:main.c:39:bar:            Goodbye cruel world[940]

The files foo.txt and bar.txt can be downloaded by using a secure FTP service running on the host.

I have decided I no longer need to trace foo, so I disable tracing and log out of the secure shell:


TRC_DISABLE
Done.
QUIT
Goodbye.
[calsa@trapper foobar_client]$exit

What is the cost to foo of using CTrace with tracing disabled? The trace output macros are wrappers around a trace function call. An if statement checks to see if tracing is enabled. If tracing is disabled, there is no function call. The cost of running CTrace while disabled, therefore, is one-conditional if tested per trace call.

Resources

The CTrace Home Page

The CTrace SourceForge Project Site

The various versions of foo described above are included in the download of CTrace.

Load Disqus comments