eCrash: Debugging without Core Dumps

How to use backtrace and a custom library to debug your embedded applications.
A Simple Backtrace Handler

So, what does the crash handler look like? To get a backtrace, the first thing we need to do is grab our signals. Some of the common ones are SIGSEGV, SIGILL and SIGBUS. Additionally, abort() is usually called in the case of an assertion and generates a SIGABRT.

Then, when a signal occurs, we need to save our backtrace. The following snippet details a simple backtrace function that displays the backtrace to standard output when a crash happens:


#include <execinfo.h>
#include <signal.h>
#include <stdio.h>

void signal_handler(int signo)
{
    void *stack[20];
    int count, i;

    // Shouldn't use printf . . . oh well
    printf("Caught signal %d\n", signo);

    // Collect up to 20 return addresses from the current call stack
    count = backtrace(stack, 20);
    for (i = 0; i < count; i++) {
        printf("Frame %2d: %p\n", i + 1, stack[i]);
    }
}

int main(...)
{
    ...
    // Route the common fatal signals to our handler
    signal(SIGBUS, signal_handler);
    signal(SIGILL, signal_handler);
    signal(SIGSEGV, signal_handler);
    signal(SIGABRT, signal_handler);
}

Caught signal 11
Frame  1:  0x401a84
Frame  2:  0x401d88
...

And, here is a similar signal handler, but one that uses backtrace_symbols to print out a prettier backtrace:


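#include <stdlib.h>   // for free()

void signal_handler(int signo)
{
    void *stack[20];
    char **functions;
    int count, i;

    // Shouldn't use printf . . . oh well
    printf("Caught signal %d\n", signo);

    count = backtrace(stack, 20);
    // backtrace_symbols() allocates the string array with malloc();
    // the caller must free it
    functions = backtrace_symbols(stack, count);
    if (functions == NULL) {
        return;
    }
    for (i = 0; i < count; i++) {
        printf("Frame %2d: %s\n", i + 1, functions[i]);
    }
    free(functions);
}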

Caught signal 11
Frame  1:  ./a.out [0x401a84]
Frame  2:  ./a.out [0x401bfa]
...

eCrash—A Generic Crash Handler

As I was writing this article, I realized that this was about the fifth time I had written a crash handler. (Why can't all software be open source?) So, I decided to write a quick library to handle crash dumps and provide it for this article. I liked the little library I started with, but I found myself needing more and more features. As I kept extending it, I realized that it was a very useful library, one I wanted to be able to leverage on any future project.

I named the new library eCrash and created a SourceForge site for it (see Resources). Since then, I have been extending it, and it now supports dumping multiple threads—using only backtrace, using backtrace and backtrace_symbols, and using backtrace with a user-supplied symbol table to avoid the malloc() inside of backtrace_symbols. The rest of the examples in this article are going to be leveraging eCrash.
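
The article does not show what eCrash's user-supplied symbol table looks like, so the following is only a sketch of the general idea under an assumed layout: an array of function start addresses and names, sorted by address (it could be generated at build time, for instance). The type sym_entry and the function sym_lookup are hypothetical names. Because the lookup is a plain binary search over a static table, it needs no malloc().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a function's start address and its name */
struct sym_entry {
    uintptr_t   addr;
    const char *name;
};

/*
 * Return the entry with the greatest start address that is <= pc,
 * or NULL if pc lies below the first entry.
 * The table must be sorted by ascending address.
 */
const struct sym_entry *sym_lookup(const struct sym_entry *table,
                                   size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;          /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].addr <= pc) {
            lo = mid + 1;           /* candidate found; look further right */
        } else {
            hi = mid;
        }
    }
    /* lo is now the index of the first entry that starts after pc */
    return lo == 0 ? NULL : &table[lo - 1];
}
```

A handler would pass each address returned by backtrace() through sym_lookup() and print the name plus the offset (pc minus the entry's addr). Note that this simple version attributes any address past the last entry to the last function, since the table carries no function sizes.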

eCrash is relatively simple to use. You first call eCrash_Init() from your parent thread. If you have a single-threaded program, you are already finished. A backtrace will be delivered based on your settings in the parameters structure.

If you have a multithreaded program, any thread that wants to be backtraced in a crash (other than the crashing thread) must also call eCrash_RegisterThread(). It is sometimes useful to dump the stacks of all threads when a crash occurs, not only the crashing thread's stack.

With eCrash, you specify where the output should go by setting file descriptors (async-safe writes), FILE * streams (not async safe), and/or the name of a file to write when a crash occurs. eCrash will write to all destinations supplied.

eCrash—Gathering Stacks from Other Threads

Obtaining the stack from a thread that did not crash is a bit trickier. When a thread registers, it specifies a signal that the thread does not catch or block. eCrash registers a handler for that signal (called the Backtrace Signal).

When eCrash needs to dump a thread (when some other thread has caused an exception), it sends the thread the Backtrace Signal via pthread_kill(). When that signal is caught, the thread saves its backtrace to a global area and continues on. The main exception handler can then read the stack and display it.

What we end up with is a very nice-looking crash dump, showing exactly what was happening in the system when the failure occurred.

eCrash—A Real-World Example

Enough talking—time for some meat. Now, let's take what we have discussed and put it to work. We are going to use the ecrash_test program included in eCrash. That program was designed to break in any one of its threads (generate a segmentation violation by attempting to write to a NULL pointer).

We execute the test program with the following flags:

ecrash_test --num_threads=5 --thread_to_crash=3

This causes the test program to generate five threads. All but thread number 3 will call a few functions, then go to sleep. Thread 3 will call a few functions (to make the backtrace interesting) and crash.

The crash file generated using backtrace alone is shown in Listing 1, and the one generated with backtrace_symbols() is shown in Listing 2. Due to space constraints, all listings for this article are available on the Linux Journal FTP site (ftp.linuxjournal.com/pub/lj/listings/issue149/8724.tgz).

The crash file has the backtrace of our offending thread (the one that caused the segmentation violation) and the backtraces of all threads on the system.

Now it's time to debug the crash. We will debug this as if the crash happened at a remote site and the system administrator e-mailed you this crash file.

One last thing: in the real world, the executables always are stripped of debugging information. But, that is okay. As long as you keep a copy of the program with its debugging information, you can ship a stripped copy of the code, and everything will still work!

So, in the lab, you have your crash file and your program with debugging information. Run gdb on the debug version of your program. We know that we have a segmentation violation. So, starting from frame zero of the offending thread, start listing the code as shown in Listing 3 (see the LJ FTP site).

  • Frame 0 is inside of our crash handler—nothing to see here.

  • Frame 1 is also inside of the crash handler.

  • Frame 2 is still inside of our crash handler.

  • Frame 3 shows no source file (it is inside of libc).

  • Frame 4 shows the actual crash (inside of crashC).

  • Frame 5 shows crashB.

  • Frame 6 shows crashA.

  • Frame 7 shows ecrash_test_thread.

  • And, frames 8 and 9 are where the thread gets created in libc.

As you can see, there is a trick to displaying function pointers with gdb. Simply give it an address and dereference it in a list:

(gdb) list *0xWHATEVER

This also works with symbolic names and offsets:

(gdb) list *main+100

Okay, that was our crashed thread, but what about one of the sleepers? Examine the backtrace from Thread 5, Listing 4 (see the LJ FTP site):

  • Frame 0 is inside of our backtrace handler.

  • Frame 1 is still inside of the handler.

  • Frame 2 is in libc.

  • Frame 3 is in libc.

  • Frame 4 is in libc.

  • Frame 5 is inside of sleepFuncC—it is showing the for statement as the program counter, because we are outside of the sleep() function. This is notable because the async signal sent to tell the thread to dump its stack caused sleep() to exit prematurely.

  • Frame 6 shows sleepFuncB.

  • Frame 7 shows sleepFuncA.

  • Frame 8 shows ecrash_test_thread.

  • Frame 9 is where the thread gets created in libc (or libpthread).

So, this thread is one of the sleeping threads. Not much to see, but in some cases, this thread's information could be vital to discovering the cause of a crash.

______________________

Comments

This library is very

Andrea

This library is very interesting, but it seems to print only addresses in main.c.
My program is statically linked with a library that contains a thread that calls assert().
The program creates the thread and I register it in eCrash. I launch the program, it crashes and prints the stack trace of the offending thread. I analyzed the printed addresses with addr2line, but it returns only addresses that are in my main.c. The program and the library are compiled with the -g3 and -ggdb flags.
I'd appreciate any help.

uclibc

chengg11

uclibc does not seem to have backtrace support? Any alternative there? thanks!

Where is eCrash?

pcrow

The link in the article for the .tgz file (ftp.ssc.com/pub/lj/issue149/8724.tgz) doesn't work. The Sourceforge project doesn't have anything to download.

Re: Where is eCrash?

Anonymous

ecrash - ftp.ssc.com/pub/lj/issue149/8724.tgz

Anonymous

Page says that page is under construction. No source code download.

8724.tgz

Keith Daniels

It's fixed now, and the correct link is displayed. Sorry for the problem.

Webmaster

