eCrash: Debugging without Core Dumps
So, what does the crash handler look like? To get a backtrace, the first thing we need to do is grab our signals. Some of the common ones are SIGSEGV, SIGILL and SIGBUS. Additionally, abort() is usually called in the case of an assertion and generates a SIGABRT.
Then, when a signal occurs, we need to save our backtrace. The following snippet details a simple backtrace function that displays the backtrace to standard output when a crash happens:
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

void signal_handler(int signo)
{
    void *stack[20];
    int count, i;

    // Shouldn't use printf . . . oh well
    printf("Caught signal %d\n", signo);
    count = backtrace(stack, 20);
    for (i = 0; i < count; i++) {
        printf("Frame %2d: %p\n", i + 1, stack[i]);
    }
    // Returning from a SIGSEGV handler would re-run the faulting instruction
    exit(1);
}

int main(...)
{
    ...
    signal(SIGBUS, signal_handler);
    signal(SIGILL, signal_handler);
    signal(SIGSEGV, signal_handler);
    signal(SIGABRT, signal_handler);
}
Caught signal 11
Frame 1: 0x401a84
Frame 2: 0x401d88
...
And, here is a similar signal handler, but one that uses backtrace_symbols to print out a prettier backtrace:
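void signal_handler(int signo)
{
    void *stack[20];
    char **functions;
    int count, i;

    // Shouldn't use printf . . . oh well
    printf("Caught signal %d\n", signo);
    count = backtrace(stack, 20);
    // backtrace_symbols() calls malloc() and can return NULL
    functions = backtrace_symbols(stack, count);
    if (functions != NULL) {
        for (i = 0; i < count; i++) {
            printf("Frame %2d: %s\n", i + 1, functions[i]);
        }
        free(functions);
    }
    // Returning from a SIGSEGV handler would re-run the faulting instruction
    exit(1);
}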
Caught signal 11
Frame 1: ./a.out [0x401a84]
Frame 2: ./a.out [0x401bfa]
...
As I was writing this article, I realized that this was about the fifth time I had written a crash handler. (Why can't all software be open source?) So, I decided to write a quick library to handle crash dumps and provide it for this article. I liked the little library I started with, but I found myself needing more and more features. As I kept extending it, I realized that it was a very useful library, one that I wanted to be able to use on any future project!
I named the new library eCrash and created a SourceForge site for it (see Resources). Since then, I have been extending it, and it now supports dumping multiple threads—using only backtrace, using backtrace and backtrace_symbols, and using backtrace with a user-supplied symbol table to avoid the malloc() inside of backtrace_symbols. The rest of the examples in this article are going to be leveraging eCrash.
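To illustrate the idea of a user-supplied symbol table, here is a small sketch of how an address could be mapped to a function name without calling malloc(). It is an assumption about the general technique, not eCrash's actual table format or API: the struct symbol_entry type and the lookup_symbol function are hypothetical, and the table is assumed to be sorted by start address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: where a function starts and what it is called */
struct symbol_entry {
    uintptr_t start;
    const char *name;
};

/* Binary search for the entry with the greatest start address that is
 * less than or equal to addr. The table must be sorted by ascending
 * start address. Returns NULL if addr lies below the first entry. */
const struct symbol_entry *lookup_symbol(const struct symbol_entry *table,
                                         size_t count, uintptr_t addr)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].start <= addr)
            lo = mid + 1;   /* candidate; look for a later one */
        else
            hi = mid;       /* starts after addr; look earlier */
    }
    return lo == 0 ? NULL : &table[lo - 1];
}
```

The lookup touches only memory that already exists, so it can run inside a signal handler. Subtracting the entry's start from the address gives the offset into the function.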
eCrash is relatively simple to use. You first call eCrash_Init() from your parent thread. If you have a single-threaded program, you are already finished. A backtrace will be delivered based on your settings in the parameters structure.
If you have a multithreaded program, any thread that wants to be backtraced in a crash (other than the crashing thread) must also call eCrash_RegisterThread(). It is sometimes useful to dump the stacks of all threads when a crash occurs, not only the crashing thread's stack.
With eCrash, you specify where the output should go by setting file descriptors (async-signal-safe writes), FILE * streams (not async-signal-safe), and/or the name of a file to write when a crash occurs. eCrash will write to all destinations supplied.
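The difference between the two kinds of destination comes down to which functions are safe inside a signal handler: write() is, printf() and fprintf() are not. The sketch below shows one way to print frame addresses to a file descriptor using only write(). It is not taken from eCrash; format_hex and dump_backtrace_fd are hypothetical names. Note that backtrace() itself is not formally guaranteed to be async-signal-safe, so treat this as an illustration of the output side only.

```c
#include <execinfo.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Format value as "0x<hex>" into buf without stdio. buf must hold at
 * least 2 * sizeof(uintptr_t) + 3 bytes. Returns the string length. */
size_t format_hex(uintptr_t value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0, len = 0;

    /* Collect the digits least-significant first */
    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* Write the caller's backtrace to fd, one address per line, using only
 * write(). Returns the number of frames, or -1 on a write error. */
int dump_backtrace_fd(int fd)
{
    void *stack[20];
    char line[2 * sizeof(uintptr_t) + 4];
    int count = backtrace(stack, 20);
    int i;

    for (i = 0; i < count; i++) {
        size_t len = format_hex((uintptr_t)stack[i], line);

        line[len++] = '\n';
        if (write(fd, line, len) < 0)
            return -1;
    }
    return count;
}
```

Calling dump_backtrace_fd(STDERR_FILENO) from a handler prints the raw addresses, which can later be resolved against a copy of the program that still has its debugging information.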
Obtaining the stack from a thread that did not crash is a bit trickier. When a thread registers, it specifies a signal that the thread does not catch or block. eCrash registers a handler for that signal (called the Backtrace Signal).
When eCrash needs to dump a thread (when some other thread has caused an exception), it sends the thread the Backtrace Signal via pthread_kill(). When that signal is caught, the thread saves its backtrace to a global area and continues on. The main exception handler can then read the stack and display it.
What we end up with is a very nice-looking crash dump, showing exactly what was happening in the system when the failure occurred.
Enough talking—time for some meat. Now, let's take what we have discussed and put it to work. We are going to use the ecrash_test program included in eCrash. That program was designed to break in any one of its threads (generate a segmentation violation by attempting to write to a NULL pointer).
We execute the test program with the following flags:
ecrash_test --num_threads=5 --thread_to_crash=3
This causes the test program to generate five threads. All but thread number 3 will call a few functions, then go to sleep. Thread 3 will call a few functions (to make the backtrace interesting) and crash.
The crash file generated is shown in Listing 1, and the crash file generated with backtrace_symbols() is shown in Listing 2. Due to space constraints, all listings for this article are available on the Linux Journal FTP site (ftp.linuxjournal.com/pub/lj/listings/issue149/8724.tgz).
The crash file has the backtrace of our offending thread (the one that caused the segmentation violation) and the backtraces of all registered threads in the program.
Now it's time to debug the crash. We will debug this as if the crash happened at a remote site and the system administrator e-mailed you this crash file.
One last thing: in the real world, executables usually are stripped of debugging information. But, that is okay. As long as you keep a copy of the program with its debugging information, you can ship a stripped copy of the program, and the addresses in the crash file will still match your debug copy.
So, in the lab, you have your crash file and your program with debugging information. Run gdb on the debug version of your program. We know that we have a segmentation violation. So, starting from frame zero of the offending thread, start listing the code as shown in Listing 3 (see the LJ FTP site).
Frame 0 is inside of our crash handler—nothing to see here.
Frame 1 is also inside of the crash handler.
Frame 2 is still inside of our crash handler.
Frame 3 shows no source file (it is inside of libc).
Frame 4 shows the actual crash (inside of crashC).
Frame 5 shows crashB.
Frame 6 shows crashA.
Frame 7 shows ecrash_test_thread.
And, frames 8 and 9 are where the thread gets created in libc.
As you can see, there is a trick to displaying function pointers with gdb. Simply give it an address and dereference it in a list:
(gdb) list *0xWHATEVER
This also works with symbolic names and offsets:
(gdb) list *main+100
Okay, that was our crashed thread, but what about one of the sleepers? Examine the backtrace from Thread 5, Listing 4 (see the LJ FTP site):
Frame 0 is inside of our backtrace handler.
Frame 1 is still inside of the handler.
Frame 2 is in libc.
Frame 3 is in libc.
Frame 4 is in libc.
Frame 5 is inside of sleepFuncC—it is showing the for statement as the program counter, because we are outside of the sleep() function. This is notable because the async signal sent to tell the thread to dump its stack caused sleep() to exit prematurely.
Frame 6 shows sleepFuncB.
Frame 7 shows sleepFuncA.
Frame 8 shows ecrash_test_thread.
Frame 9 is where the thread gets created in libc (or libpthread).
So, this thread is one of the sleeping threads. Not much to see, but in some cases, this thread's information could be vital to discovering the cause of a crash.
Comments
This library is very
This library is very interesting, but it seems that it prints only addresses from main.c.
My program is statically linked with a library that contains a thread that calls assert().
The program creates the thread and I register it in eCrash. I launch the program, it crashes and prints the stack trace of the offending thread. I analyzed the printed addresses with addr2line, but it returns only addresses that are in my main.c. The program and library are compiled with the -g3 and -ggdb flags.
I'd appreciate any help.
uclibc
uclibc does not seem to have backtrace support? Any alternative there? thanks!
Where is eCrash?
The link in the article for the .tgz file (ftp.ssc.com/pub/lj/issue149/8724.tgz) doesn't work. The Sourceforge project doesn't have anything to download.
Re: Where is eCrash?
ftp://ftp.ssc.com/pub/lj/listings/issue149/8724.tgz
ecrash - ftp.ssc.com/pub/lj/issue149/8724.tgz
Page says that page is under construction. No source code download.
8724.tgz
It's fixed now, and the correct link is displayed. Sorry for the problem.
Webmaster
"I have always wished that my computer would be as easy to use as my telephone.
My wish has come true. I no longer know how to use my telephone."
-- Bjarne Stroustrup