Finding Stubborn Bugs with Meaningful Debug Info
Make sure the input you are receiving is valid. For instance, if you are expecting something on the command line, check to make sure you have the appropriate number of arguments before trying to use them (or trap the resulting exception). This gives users a better error message. Here's a sample Python program that demonstrates this:
    #!/usr/bin/env python
    import sys

    try:
        # sys.argv[0] is the program name; sys.argv[1] is the first argument.
        print("You supplied: %s" % sys.argv[1])
    except IndexError:
        print("You forgot an argument.")
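The other approach mentioned above, checking the number of arguments before using them, might look something like the following minimal sketch. It assumes Python 3; the helper name get_argument, the usage text and the exit status 2 are illustrative choices, not part of the original program.

```python
import sys


def get_argument(argv):
    """Return the first command-line argument, or None if it is missing.

    get_argument is a hypothetical helper used only for this sketch.
    """
    # argv[0] is the program name, so one real argument means len(argv) >= 2.
    if len(argv) < 2:
        return None
    return argv[1]


def main(argv):
    arg = get_argument(argv)
    if arg is None:
        # Report the problem on stderr and hand back a nonzero status.
        sys.stderr.write("Usage: %s ARGUMENT\n" % argv[0])
        return 2
    print("You supplied: %s" % arg)
    return 0


# Exercise both cases without touching the real command line.
main(["prog"])
main(["prog", "hello"])
# In a real script, the last line would instead be: sys.exit(main(sys.argv))
```

Checking up front keeps the error handling next to the input it concerns, whereas trapping IndexError also would catch an indexing mistake made anywhere else inside the try block.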
Several programming languages, such as Java, Python and OCaml, include support for exceptions. With exceptions, you can catch errors at the place you choose, rather than having to check and handle errors with each call that may produce a problem. Sometimes, it might be correct to let exceptions go unhandled, but usually that is not the case. Exceptions should be caught and handled. Although it may be appropriate to terminate the program if you can't open the file a user asks for, it is still better to do so with an error message giving the filename and problem rather than let the user receive an ugly exception message.
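As a rough illustration of the file case, the sketch below catches the failure to open a file and reports the filename and the problem instead of letting a traceback reach the user. It assumes Python 3; read_file is a hypothetical helper, and the path used at the end is simply one assumed not to exist.

```python
import sys


def read_file(filename):
    """Return the contents of filename, or None after reporting a failure.

    read_file is a hypothetical helper used only for this sketch.
    """
    try:
        with open(filename) as f:
            return f.read()
    except OSError as e:
        # e.strerror holds the system's description of what went wrong.
        sys.stderr.write("Cannot open %s: %s\n" % (filename, e.strerror))
        return None


# A path assumed not to exist, chosen purely for demonstration.
if read_file("/nonexistent/example.txt") is None:
    sys.stderr.write("Giving up.\n")
```

A caller that cannot continue without the file could follow the error message with sys.exit(1) rather than carrying on.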
For exceptions that really are fatal to your program, you still may want to capture them. This would allow you, for instance, to log them to a file or display the exception in a pop-up box in a GUI application, which makes it easier for users to send the stack trace back to you. You also can use a generic exception catcher to perform other activities, such as writing out the contents of various buffers to help you figure out what was going on at the time.
The following is an example that logs any exceptions along with some information about the program currently running. It then re-raises the exception and exits:
    #!/usr/bin/env python
    import io
    import logging
    import os
    import sys
    import traceback

    # Send log records to stderr, each prefixed with "LOG: ".
    l = logging.getLogger('testlog')
    handler = logging.StreamHandler(sys.stderr)
    l.addHandler(handler)
    formatter = logging.Formatter("LOG: %(message)s")
    handler.setFormatter(formatter)
    l.setLevel(logging.INFO)

    def logexception():
        # Capture the current traceback as text so it can be logged line by line.
        sbuf = io.StringIO()
        traceback.print_exc(file=sbuf)
        excval = sbuf.getvalue()
        l.critical(" *** Exception Detected ***")
        l.critical("Current PID: %d" % os.getpid())
        l.critical("Program name: %s" % sys.argv[0])
        l.critical("Command line: %s" % str(sys.argv[1:]))
        for line in excval.split("\n"):
            l.critical(line)

    def main():
        print("Hello, I'm running.")
        raise RuntimeError("Oops! I've had a problem!")

    try:
        main()
    except:
        # Log the exception, then re-raise it so the program still aborts.
        logexception()
        raise
When you run this program, you should see something like this on your screen:
    Hello, I'm running.
    LOG:  *** Exception Detected ***
    LOG: Current PID: 28441
    LOG: Program name: /tmp/logerror.py
    LOG: Command line: []
    LOG: Traceback (most recent call last):
    LOG:   File "/tmp/logerror.py", line 30, in ?
    LOG:     main()
    LOG:   File "/tmp/logerror.py", line 27, in main
    LOG:     raise RuntimeError("Oops! I've had a problem!")
    LOG: RuntimeError: Oops! I've had a problem!
    LOG:
Here, the exception handler caught the exception, gathered information about it and logged it. The raise statement at the end of the program then re-raises the exception so that it also is handled in the normal fashion. This means the program aborts and the interpreter prints the traceback a second time, after the log output. Depending on your requirements, you may opt to use sys.exit() to terminate instead.
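If you prefer the sys.exit() route, one possible variation is sketched below. It assumes Python 3; the run wrapper, the logger name and the exit status of 1 are illustrative assumptions rather than part of the program above.

```python
import logging
import sys
import traceback

log = logging.getLogger("exitlog")  # logger name is an arbitrary choice
log.addHandler(logging.StreamHandler(sys.stderr))
log.setLevel(logging.INFO)


def main():
    raise RuntimeError("Oops! I've had a problem!")


def run(func):
    """Call func; log any exception and return an exit status instead of re-raising.

    run is a hypothetical wrapper used only for this sketch.
    """
    try:
        func()
    except Exception:
        # format_exc() returns the same text that print_exc() would write.
        for line in traceback.format_exc().splitlines():
            log.critical(line)
        return 1
    return 0


status = run(main)
# In a real script, finish with: sys.exit(status)
```

Because the sketch catches Exception rather than using a bare except, SystemExit and KeyboardInterrupt still pass through untouched, so a deliberate exit or a Ctrl-C is not logged as a crash.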
Now that you have some ways to help users submit good bug reports, let's look at ways to use those bug reports to track down problems. Armed with a log and perhaps traceback information, here are some questions to ask yourself:
Can I duplicate the bug in my environment? If you can duplicate the problem on your own machine, you're a long way toward being able to resolve it easily. Use a debugger or other tool to track it down now that you can trigger it at will.
Were the input and output what I expected? Perhaps the user supplied a value you didn't contemplate when you wrote the program. Or, perhaps a network client or server treats a protocol slightly differently from what you expected. Maybe the input or output is itself malformed, and the bug isn't even in your program. A debug log showing all I/O can be very helpful here.
Was the program flow as expected? If you log calls to various functions or methods, you should be able to trace the flow of execution in a program. Perhaps certain conditions cause vital code to be skipped, leading to trouble later on.
Where was the last point of correct execution? This may have been right before the error, or perhaps incorrect data was passed around for some time prior to a crash. Pinpointing the most recent time in the program's history where it was functioning normally can help track down the precise place where things went awry.
If a traceback is on hand, does the stack look normal? Check to make sure the function calls are as expected and that the data passed to them looks legitimate.
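Several of these questions are easier to answer if the program already logs its function calls and the data passed to them. One way to get such a log, sketched here under the assumption of Python 3, is a small decorator. The names traced and parse_port and the logger name are made up for this example.

```python
import functools
import logging
import sys

log = logging.getLogger("tracelog")  # logger name is an arbitrary choice
log.addHandler(logging.StreamHandler(sys.stderr))
log.setLevel(logging.DEBUG)


def traced(func):
    """Log each call to func with its arguments and its result or exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("CALL %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception as e:
            # Record the failure, then let the exception continue upward.
            log.debug("RAISE %s %r", func.__name__, e)
            raise
        log.debug("RETURN %s %r", func.__name__, result)
        return result
    return wrapper


@traced
def parse_port(text):
    # parse_port is a made-up function standing in for real program logic.
    return int(text)


parse_port("8080")
```

Reading such a log from the bottom up shows the last call that returned normally, which is often close to the last point of correct execution.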