eCrash: Debugging without Core Dumps

How to use backtrace and a custom library to debug your embedded applications.
Other Useful Crash Information

Now that we have covered in detail what a backtrace is, how to produce one, the different ways to display one and how to debug a crash with one, it's time to change gears. A crash file can include much more information:

  • States of mutexes (who is holding the locks—useful for deadlock diagnosis).

  • Current error logs.

  • Program statistics.

  • Memory usage.

  • Most recent network packets.

Some of the above items could be useful information for post-mortem debugging. There is one caveat, however. Because we have encountered an exception, something has gone terribly wrong. Our data structures could be corrupt. We could be low on (or out of) memory.

Also, some threads could be deadlocked waiting on mutexes that our crashed thread was holding.
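One way to make lock state reportable is to record the holder next to each mutex and to probe the mutex with a non-blocking call from the crash path. The following is only a sketch under stated assumptions: `tracked_mutex`, `tracked_lock`, `tracked_unlock` and `report_mutex` are hypothetical names for illustration and are not part of eCrash. The only library calls used are the standard POSIX `pthread_mutex_lock`, `pthread_mutex_trylock` and `pthread_mutex_unlock`.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical wrapper: a mutex plus a label naming whoever holds it. */
struct tracked_mutex {
    pthread_mutex_t lock;
    const char *name;   /* name printed in the crash report */
    const char *owner;  /* label of the current holder, NULL when free */
};

/* Take the lock, then record who took it. */
static void tracked_lock(struct tracked_mutex *m, const char *who)
{
    pthread_mutex_lock(&m->lock);
    m->owner = who;
}

/* Clear the owner label, then release the lock. */
static void tracked_unlock(struct tracked_mutex *m)
{
    m->owner = NULL;
    pthread_mutex_unlock(&m->lock);
}

/*
 * Report the state of one mutex without ever blocking.
 * Returns 0 if the mutex was free, 1 if it was held, -1 on any other error.
 * The owner label is read without holding the lock, so it is only a hint;
 * that is an accepted risk in a crash path.
 */
static int report_mutex(FILE *out, struct tracked_mutex *m)
{
    int rc = pthread_mutex_trylock(&m->lock);

    if (rc == 0) {
        fprintf(out, "%s: free\n", m->name);
        pthread_mutex_unlock(&m->lock);
        fflush(out);
        return 0;
    }
    if (rc == EBUSY) {
        const char *owner = m->owner;
        fprintf(out, "%s: held by %s\n", m->name, owner ? owner : "(unknown)");
        fflush(out);
        return 1;
    }
    fprintf(out, "%s: state unknown (error %d)\n", m->name, rc);
    fflush(out);
    return -1;
}
```

Using `pthread_mutex_trylock` rather than `pthread_mutex_lock` matters here: if the crashed thread (or a deadlocked one) already holds the lock, a blocking call in the crash path would hang and no report would be written at all.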

Because some of the data we want to display might be corrupted and could generate another exception, we want to display the most important information first, followed by progressively less safe information. Also, to prevent information loss, always flush the buffers of FILE* streams.
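The ordering and flushing advice can be sketched as a small table of report sections written from safest to riskiest, with a flush after each one. This is an illustration only, not eCrash's actual implementation: `crash_section`, `write_crash_report` and the two sample sections are hypothetical, and the sketch assumes that stdio is still usable in the crash path (stdio calls are not async-signal-safe, so a stricter handler would use `write(2)` instead).

```c
#include <stdio.h>

/* Hypothetical: one section of the crash report. */
typedef void (*crash_section_fn)(FILE *out);

struct crash_section {
    const char *title;
    crash_section_fn write;
};

/* Sample section that touches no program data structures (safest). */
static void section_signal(FILE *out)
{
    fputs("caught fatal signal\n", out);
}

/* Sample section standing in for data that could be corrupt (riskier). */
static void section_statistics(FILE *out)
{
    fputs("packets handled: (application counter would go here)\n", out);
}

/* Ordered from most important and safest to least safe. */
static const struct crash_section example_sections[] = {
    { "signal",     section_signal },
    { "statistics", section_statistics },
};

/*
 * Write each section in order. The stream is flushed after the header and
 * again after the body, so a second fault inside a later section cannot
 * discard output that earlier sections already produced.
 * Returns the number of sections written.
 */
static int write_crash_report(FILE *out,
                              const struct crash_section *sections,
                              size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        fprintf(out, "=== %s ===\n", sections[i].title);
        fflush(out);
        sections[i].write(out);
        fflush(out);
    }
    return (int)count;
}
```

A caller would pass the crash file's stream, for example `write_crash_report(crash_fp, example_sections, 2)`, where `crash_fp` is whatever FILE* the application opened for its crash output.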

Conclusion

Diagnosing a problem on a deployed embedded system can be a difficult task. But, choosing the right data to save or display in the case of an exception can make the task much easier.

With a relatively small amount of storage, or a remote server, you can save enough post-mortem information to be able to find a failure in your system.

Resources for this article: /article/9139.

David Frascone (dave@frascone.com) works for Cisco Systems, Inc., in the Wireless Business Unit. He is currently working on Next Generation controller design.

______________________

Comments

This library is very

Andrea's picture

This library is very interesting, but it seems to print only addresses from main.c.
My program is statically linked with a library that contains a thread that calls assert().
The program creates the thread, and I register it in eCrash. I launch the program, it crashes and prints the stack trace of the offending thread. I analyzed the printed addresses with addr2line, but it returns only addresses that are in my main.c. The program and the library are compiled with the -g3 and -ggdb flags.
I'd appreciate any help.

uclibc

chengg11's picture

uclibc does not seem to have backtrace support? Any alternative there? thanks!

Where is eCrash?

pcrow's picture

The link in the article for the .tgz file (ftp.ssc.com/pub/lj/issue149/8724.tgz) doesn't work. The Sourceforge project doesn't have anything to download.

Re: Where is eCrash?

Anonymous's picture

ecrash - ftp.ssc.com/pub/lj/issue149/8724.tgz

Anonymous's picture

Page says that page is under construction. No source code download.

8724.tgz

Keith Daniels's picture

It's fixed now, and the correct link is displayed. Sorry for the problem.

Webmaster

All the new OSs and windowing systems are oriented towards content consumption instead of content production.

--Steve Daniels 2013
