eCrash: Debugging without Core Dumps
Embedded Linux does a good job of bridging the gap between embedded programming and high-level UNIX programming. It has a full TCP stack, good debugging tools and great library support. This makes for a feature-rich development environment. There is one downside, however. How can you debug problems that occur out in the field?
On full-featured operating systems, it's easy to use a core dump to debug a problem that occurs in the field.
On non-embedded UNIX systems, when a program encounters an exception, it outputs all of its current state to a file on the filesystem. This file is usually called core. This core file contains all the memory the program was using at the time the failure occurred. This allows for post-mortem investigation to diagnose the exception.
Typically, on embedded Linux systems, there is no (or very little) persistent disk storage. On all of the systems on which I have worked, there is more RAM than persistent storage. So, getting a core dump is impossible. This article describes some alternatives to core dumps that will allow you to perform post-mortem debugging.
Programs can fail for reasons other than exceptions. Programs can deadlock, or they can have run-away threads that use up all system resources (memory, CPU or other fixed resources). It also would be beneficial to generate some kind of persistent crash file under these situations.
So, first we need to come up with the information we want to save. Because of memory constraints, saving all of the process' memory is not an option. If it were, you simply could use core dumps! But, there is other very useful information we can save. At the top of the list is the backtrace of the failed thread.
A backtrace is a list of the functions that were called to get to the current position in the program. Even with the absence of system memory and data, a backtrace can shed light onto what was happening at the time of failure.
Many embedded systems also have logs: lists of errors, warnings and metrics to let you know what happened. Having a post-mortem dump of the last few logs before failure is an invaluable asset in finding the root cause of a failure.
In complex, multithreaded systems, you usually have many mutexes. It could be useful, in the case of a deadlock, to show the state of all the processes' mutexes and semaphores.
Showing memory usage statistics also could help diagnose the problem.
Once we have determined the information we want to save, we still need to come up with where to save it. This will vary greatly from system to system. If your system has no persistent storage at all, perhaps you can output the crash information to a serial terminal or display it on an LCD readout. (We have serious space constraints there!) If your system has CompactFlash, you can save it to a filesystem. Or, if it has raw Flash (an MTD device), you can either save it to a jffs2 filesystem, or maybe to a raw sector or two.
If the crash was not too severe, perhaps the crash could be uploaded to a tftp server or sent to a remote syslog facility.
Now that we have a firm grasp on what we want to save, and locations to which we can save it, let's talk about how we are going to do it!
In general, getting a backtrace is not as simple as it sounds. Accessing system registers (like the stack pointer) varies from architecture to architecture. Thankfully, the FSF comes to our rescue in GNU's C Standard Library (see the on-line Resources). Libc has three functions that will aid us in retrieving backtraces: backtrace(), backtrace_symbols() and backtrace_symbols_fd().
The backtrace() function populates an array of pointers with a backtrace of the current thread. This, in general, is enough information for debugging, but it is not very pretty.
The backtrace_symbols() function takes the information populated by backtrace() and returns symbolic names (function names). The only problem with backtrace_symbols is that it is not async-signal safe. backtrace_symbols() uses malloc(). Because malloc() uses spinlocks, it is not safe to be called from a signal handler (it could cause a deadlock).
The backtrace_symbols_fd() function attempts to solve the signal issues associated with malloc and output the symbolic information directly to a file descriptor.
Some functions inside of libc rely on signals themselves: some IO operations, memory allocation and so on. So, we are very limited in what we should do inside of a handler. In our case, we can cheat a little. Because our program already is crashing, a deadlock is not that big of a concern. The code in my examples makes use of several not-allowed functions, such as fwrite(), printf() and sprintf(). But, we can work to avoid some of the functions that are prone to deadlock, such as malloc() and backtrace_symbols().
In my opinion, the biggest loss we have is the loss of backtrace_symbols. But, here is where things get easier. You always can implement your own symbol table and look up the functions from the pointers themselves.
In my examples, I sometimes use backtrace_symbols(). I have not seen a deadlock yet, but it is possible.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
- RSS Feeds
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- Dynamic DNS—an Object Lesson in Problem Solving
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Download the Free Red Hat White Paper "Using an Open Source Framework to Catch the Bad Guy"
- Tech Tip: Really Simple HTTP Server with Python
- Keeping track of IP address
1 hour 20 min ago - Roll your own dynamic dns
6 hours 33 min ago - Please correct the URL for Salt Stack's web site
9 hours 45 min ago - Android is Linux -- why no better inter-operation
12 hours 42 sec ago - Connecting Android device to desktop Linux via USB
12 hours 29 min ago - Find new cell phone and tablet pc
13 hours 27 min ago - Epistle
14 hours 56 min ago - Automatically updating Guest Additions
16 hours 4 min ago - I like your topic on android
16 hours 51 min ago - This is the easiest tutorial
23 hours 26 min ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




Comments
This library is verry
This library is verry interesting, but seem that it print only address of main.c.
My program is linked staticaly with a library that contain a thread that call assert().
The program create the thread and I register it in eCrash. I launch the program, it crash and print the stack trace of the offended thread. I have analyzed the address printed with the program add2line but it return only address that are in my main.c. Program and library are compiled witch -g3 and -ggdb flags.
I'll appreciate any help.
uclibc
uclibc does not seem to have backtrace support? Any alternative there? thanks!
Where is eCrash?
The link in the article for the .tgz file (ftp.ssc.com/pub/lj/issue149/8724.tgz) doesn't work. The Sourceforge project doesn't have anything to download.
Re: Where is eCrash?
ftp://ftp.ssc.com/pub/lj/listings/issue149/8724.tgz
ecrash - ftp.ssc.com/pub/lj/issue149/8724.tgz
Page says that page is under construction. No source code download.
8724.tgz
It's fixed now, and the correct link is displayed. Sorry for the problem.
Webmaster
"I have always wished that my computer would be as easy to use as my telephone.
My wish has come true. I no longer know how to use my telephone."
-- Bjarne Stroustrup