Finding Stubborn Bugs with Meaningful Debug Info

When a user reports a bug you can't duplicate, let your application help you find the problem. Add logging now and be a debugging master after the software gets deployed.
Check Input

Make sure the input you are receiving is valid. For instance, if you are expecting something on the command line, check to make sure you have the appropriate number of arguments before trying to use them (or trap the resulting exception). This gives users a better error message. Here's a sample Python program that demonstrates this:

#!/usr/bin/env python
import sys

try:
    print "You supplied: %s" % sys.argv[1]
except IndexError:
    print "You forgot an argument."

Handle Exceptions

Several programming languages, such as Java, Python and OCaml, include support for exceptions. With exceptions, you can catch errors at the place you choose, rather than having to check and handle errors with each call that may produce a problem. Sometimes, it might be correct to let exceptions go unhandled, but usually that is not the case. Exceptions should be caught and handled. Although it may be appropriate to terminate the program if you can't open the file a user asks for, it is still better to do so with an error message giving the filename and problem rather than let the user receive an ugly exception message.

Capture Exceptions

For exceptions that really are fatal to your program, you still may want to capture them. This would allow you, for instance, to log them to a file or display the exception in a pop-up box in the GUI application. This makes it easier for users to send the stack trace back to you. You also can use a generic exception catcher to perform other activities, perhaps output contents of various buffers to help you figure out what was going on at the time.

The following is an example that logs any exceptions along with some information about the program currently running. It then re-raises the exception and exits:

#!/usr/bin/env python
import logging, sys, StringIO, traceback, os

l = logging.getLogger('testlog')

handler = logging.StreamHandler(sys.stderr)
l.addHandler(handler)
formatter = logging.Formatter("LOG: %(message)s")
handler.setFormatter(formatter)

l.setLevel(logging.INFO)

def logexception():
    sbuf = StringIO.StringIO()
    traceback.print_exc(file = sbuf)
    excval = sbuf.getvalue()
    l.critical(" *** Exception Detected ***")
    l.critical("Current PID: %d" % os.getpid())
    l.critical("Program name: %s" % sys.argv[0])
    l.critical("Command line: %s" % \
               str(sys.argv[1:]))
    for line in excval.split("\n"):
        l.critical(line)

def main():
    print "Hello, I'm running."
    raise RuntimeError("Oops! I've had a problem!")

try:
    main()
except:
    logexception()
    raise

When you run this program, you should see something like this on your screen:

Hello, I'm running.
LOG:  *** Exception Detected ***
LOG: Current PID: 28441
LOG: Program name: /tmp/logerror.py
LOG: Command line: []
LOG: Traceback (most recent call last):
LOG:   File "/tmp/logerror.py", line 30, in ?
LOG:     main()
LOG:   File "/tmp/logerror.py", line 27, in main
LOG:     raise RuntimeError("Oops! I've had a problem!")
LOG: RuntimeError: Oops! I've had a problem!
LOG:

Here, the exception handler found the exception, grabbed the information about it and was able to log it. You also can see the traceback a second time. The raise statement at the end of the program causes the exception to be raised and handled in the normal fashion also. This means it aborts your program with a traceback. Depending on your requirements, you may opt to use sys.exit() to terminate instead.

Finding Reported Bugs

Now that you have some ways to help users submit good bug reports, let's look at ways to use those bug reports to track down problems. Armed with a log and perhaps traceback information, here are some questions to ask yourself:

  • Can I duplicate the bug in my environment? If you can duplicate the problem on your own machine, you're a long way toward being able to resolve it easily. Use a debugger or other tool to track it down now that you can trigger it at will.

  • Was the input and output what I expected? Perhaps the user supplied a value you didn't contemplate when you wrote the program. Or, perhaps a network client or server treats a protocol slightly differently from what you expected. Maybe the input or output is itself malformed, and the bug isn't even in your program. A debug log showing all I/O can be very helpful here.

  • Was the program flow as expected? If your log calls to various functions or methods, you should be able to trace the flow of execution in a program. Perhaps certain conditions cause vital code to be skipped, leading to trouble later on.

  • Where was the last point of correct execution? This may have been right before the error, or perhaps incorrect data was passed around for some time prior to a crash. Pinpointing the most recent time in the program's history where it was functioning normally can help track down the precise place where things went awry.

  • If a traceback is on-hand, does the stack look normal? Check to make sure the function calls are as expected and that the data passed to them looks legitimate.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState