eCrash: Debugging without Core Dumps
So, what does the crash handler look like? To get a backtrace, the first thing we need to do is catch the signals that indicate a crash. Some of the common ones are SIGSEGV, SIGILL and SIGBUS. Additionally, abort() is usually called when an assertion fails, and it generates a SIGABRT.
Then, when a signal occurs, we need to save our backtrace. The following snippet details a simple backtrace function that displays the backtrace to standard output when a crash happens:
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>

void signal_handler(int signo)
{
    void *stack[20];
    int count, i;

    // Shouldn't use printf . . . oh well
    printf("Caught signal %d\n", signo);
    // backtrace() fills stack[] with up to 20 return addresses
    count = backtrace(stack, 20);
    for (i = 0; i < count; i++) {
        printf("Frame %2d: %p\n", i+1, stack[i]);
    }
}

int main(...)
{
    ...
    // Route the common crash signals to our handler
    signal(SIGBUS, signal_handler);
    signal(SIGILL, signal_handler);
    signal(SIGSEGV, signal_handler);
    signal(SIGABRT, signal_handler);
}
Caught signal 11
Frame 1: 0x401a84
Frame 2: 0x401d88
...
And, here is a similar signal handler, but one that uses backtrace_symbols to print out a prettier backtrace:
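#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

void signal_handler(int signo)
{
    void *stack[20];
    char **functions;
    int count, i;

    // Shouldn't use printf . . . oh well
    printf("Caught signal %d\n", signo);
    count = backtrace(stack, 20);
    // backtrace_symbols() malloc()s one array holding all of the strings
    functions = backtrace_symbols(stack, count);
    if (functions == NULL)
        return;
    for (i = 0; i < count; i++) {
        printf("Frame %2d: %s\n", i+1, functions[i]);
    }
    // Only the array itself needs to be freed
    free(functions);
}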
Caught signal 11
Frame 1: ./a.out [0x401a84]
Frame 2: ./a.out [0x401bfa]
...
As I was writing this article, I realized that this was about the fifth time I had written a crash handler. (Why can't all software be open source?) So, I decided to write a quick library to handle crash dumps and provide it for this article. I liked the little library I started with, but I found myself needing more and more features. As I kept extending it, I realized that it was a very useful library that I wanted to be able to leverage on any future project!
I named the new library eCrash and created a SourceForge site for it (see Resources). Since then, I have been extending it, and it now supports dumping multiple threads, with three ways of producing the trace: using only backtrace, using backtrace and backtrace_symbols, and using backtrace with a user-supplied symbol table to avoid the malloc() inside of backtrace_symbols. The rest of the examples in this article use eCrash.
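To give a feel for the symbol-table idea, here is a minimal sketch of resolving an address against a table the program supplies itself, with no memory allocation at lookup time. This is only an illustration of the technique, not eCrash's actual data structure or API: the struct symbol_entry type, the symbol_lookup function, and the addresses in example_symbols are all assumptions invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical symbol table, sorted by ascending start address. */
struct symbol_entry {
    uintptr_t   start;  /* address where the function begins */
    const char *name;   /* function name to print */
};

/*
 * Return the entry with the greatest start address that is <= addr,
 * or NULL if addr lies below every entry. Uses a binary search and
 * never allocates, so it can run inside a crash handler.
 */
static const struct symbol_entry *
symbol_lookup(const struct symbol_entry *table, size_t count, uintptr_t addr)
{
    const struct symbol_entry *best = NULL;
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].start <= addr) {
            best = &table[mid];  /* candidate; look for a closer one above */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}

/* Example table; these addresses and names are made up for illustration. */
static const struct symbol_entry example_symbols[] = {
    { 0x401000, "main"   },
    { 0x401a00, "crashA" },
    { 0x401c00, "crashB" },
};
```

With this table, looking up 0x401a84 lands on the crashA entry, because 0x401a00 is the closest start address at or below it. A real table would be generated at build time, for example from the output of a tool such as nm.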
eCrash is relatively simple to use. You first call eCrash_Init() from your parent thread. If you have a single-threaded program, you are already finished. A backtrace will be delivered based on your settings in the parameters structure.
If you have a multithreaded program, any thread that wants to be backtraced in a crash (other than the crashing thread) must also call eCrash_RegisterThread(). It is sometimes useful to dump the stacks of all threads when a crash occurs, not only the crashing thread's stack.
With eCrash, you specify where the output should go by setting file descriptors (async safe writes), FILE * streams (not async safe), and/or a filename of the file to output when a crash occurs. eCrash will write to all destinations supplied.
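The difference between the file descriptor and FILE * destinations matters because fprintf may take locks or allocate memory, while write() is on the list of async-signal-safe functions. As a sketch of what an async-safe write of a frame address can look like (this is my own illustration, not code taken from eCrash, and format_hex and write_address are hypothetical names), the address is formatted by hand and then written with a single write() call:

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Format value as "0x" followed by lowercase hex digits.
 * buf must hold at least 2 + 2*sizeof(uintptr_t) + 1 bytes.
 * Returns the string length, not counting the terminating NUL.
 */
static size_t format_hex(uintptr_t value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0, len = 0;

    /* Collect digits from least to most significant. */
    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];  /* reverse into the output buffer */
    buf[len] = '\0';
    return len;
}

/* Write one address followed by a newline to fd using only write(). */
static int write_address(int fd, const void *addr)
{
    char line[2 + 2 * sizeof(uintptr_t) + 2];
    size_t len = format_hex((uintptr_t)addr, line);

    line[len++] = '\n';  /* overwrite the NUL; write() takes a length */
    return write(fd, line, len) == (ssize_t)len ? 0 : -1;
}
```

Calling write_address(fd, stack[i]) for each frame produces the same kind of raw address listing as the first printf-based handler, but without going through stdio.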
Obtaining the stack from a thread that did not crash is a bit trickier. When a thread registers, it specifies a signal that the thread does not catch or block. eCrash registers a handler for that signal (called the Backtrace Signal).
When eCrash needs to dump a thread (when some other thread has caused an exception), it sends the thread the Backtrace Signal via pthread_kill(). When that signal is caught, the thread saves its backtrace to a global area and continues on. The main exception handler can then read the stack and display it.
What we end up with is a very nice-looking crash dump, showing exactly what was happening in the system when the failure occurred.
Enough talking—time for some meat. Now, let's take what we have discussed and put it to work. We are going to use the ecrash_test program included in eCrash. That program was designed to break in any one of its threads (generate a segmentation violation by attempting to write to a NULL pointer).
We execute the test program with the following flags:
ecrash_test --num_threads=5 --thread_to_crash=3
This causes the test program to generate five threads. All but thread number 3 will call a few functions, then go to sleep. Thread 3 will call a few functions (to make the backtrace interesting) and crash.
The crash file generated is shown in Listing 1, and the version produced with backtrace_symbols() is shown in Listing 2. Due to space constraints, all listings for this article are available on the Linux Journal FTP site (ftp.linuxjournal.com/pub/lj/listings/issue149/8724.tgz).
The crash file has the backtrace of our offending thread (the one that caused the segmentation violation) and the backtraces of all threads on the system.
Now it's time to debug the crash. We will debug this as if the crash happened at a remote site and the system administrator e-mailed you this crash file.
One last thing: in the real world, the executables always are stripped of debugging information. But, that is okay. As long as you keep a copy of the program with its debugging information, you can ship a stripped copy of the code, and everything will still work!
So, in the lab, you have your crash file and your program with debugging information. Run gdb on the debug version of your program. We know that we have a segmentation violation. So, starting from frame zero of the offending thread, start listing the code as shown in Listing 3 (see the LJ FTP site).
Frame 0 is inside of our crash handler—nothing to see here.
Frame 1 is also inside of the crash handler.
Frame 2 is still inside of our crash handler.
Frame 3 shows no source file (it is inside of libc).
Frame 4 shows the actual crash (inside of crashC).
Frame 5 shows crashB.
Frame 6 shows crashA.
Frame 7 shows ecrash_test_thread.
And, frames 8 and 9 are where the thread gets created in libc.
As you can see, there is a trick to displaying function pointers with gdb. Simply give it an address and dereference it in a list:
(gdb) list *0xWHATEVER
This also works with symbolic names and offsets:
(gdb) list *main+100
Okay, that was our crashed thread, but what about one of the sleepers? Examine the backtrace from Thread 5, Listing 4 (see the LJ FTP site):
Frame 0 is inside of our backtrace handler.
Frame 1 is still inside of the handler.
Frame 2 is in libc.
Frame 3 is in libc.
Frame 4 is in libc.
Frame 5 is inside of sleepFuncC—it is showing the for statement as the program counter, because we are outside of the sleep() function. This is notable because the async signal sent to tell the thread to dump its stack caused sleep() to exit prematurely.
Frame 6 shows sleepFuncB.
Frame 7 shows sleepFuncA.
Frame 8 shows crash_test_thread.
Frame 9 is where the thread gets created in libc (or libpthread).
So, this thread is one of the sleeping threads. Not much to see, but in some cases, this thread's information could be vital to discovering the cause of a crash.
Comments
This library is very
This library is very interesting, but it seems to print only addresses in main.c.
My program is statically linked with a library that contains a thread that calls assert().
The program creates the thread and I register it with eCrash. I launch the program, it crashes and prints the stack trace of the offending thread. I have analyzed the printed addresses with the addr2line program, but it returns only addresses that are in my main.c. The program and library are compiled with the -g3 and -ggdb flags.
I'd appreciate any help.
uclibc
uclibc does not seem to have backtrace support? Any alternative there? thanks!
Where is eCrash?
The link in the article for the .tgz file (ftp.ssc.com/pub/lj/issue149/8724.tgz) doesn't work. The Sourceforge project doesn't have anything to download.
Re: Where is eCrash?
ftp://ftp.ssc.com/pub/lj/listings/issue149/8724.tgz
ecrash - ftp.ssc.com/pub/lj/issue149/8724.tgz
Page says that page is under construction. No source code download.
8724.tgz
It's fixed now, and the correct link is displayed. Sorry for the problem.
Webmaster
"I have always wished that my computer would be as easy to use as my telephone.
My wish has come true. I no longer know how to use my telephone."
-- Bjarne Stroustrup