Debugging Embedded Linux Platforms with GDB and Python

Give your debugging sessions go-faster stripes with the power of Python.

Initial Debugging

Now that we have GDB cross-compiled and installed, let's take a look at debugging the deadlock on the embedded target.

First, run gdbserver on the target and attach to the deadlocked process:


gdbserver --attach :5555 <pid of process>

Now, fire up GDB on the host PC:

mipsel-linux-uclibc-gdb

Once GDB is running, point it at the target's root filesystem and at the file to debug:

(gdb) set solib-absolute-prefix /export/shared/rootfs
(gdb) file hello_world
(gdb)

Finally, tell GDB to attach to the process running on the target via gdbserver:

(gdb) target remote 10.0.0.6:5555
(gdb)

If all goes well, now you should be able to explore the running process a little to see what is going on. Given that the process has deadlocked, examining the state of the threads in the process is a good first port of call:

(gdb) info threads
Id  Target Id   Frame
33  Thread 737  0x2aac1068 in __lll_lock_wait from libpthread.so.0
32  Thread 738  0x2aac1068 in __lll_lock_wait from libpthread.so.0
31  Thread 739  0x2aac1068 in __lll_lock_wait from libpthread.so.0
....
3   Thread 767  0x2aac1068 in __lll_lock_wait from libpthread.so.0
2   Thread 768  0x2aac1068 in __lll_lock_wait from libpthread.so.0
1   Thread 736  0x2aab953c in pthread_join from libpthread.so.0
(gdb)

The omitted threads in the GDB output are all similarly blocking in __lll_lock_wait(), somewhere in the depths of libpthread.so. Clearly, some of these threads must be waiting for a mutex that another thread has not given up—but which threads, and which mutex?

Some examination of the libpthread source in the uClibc tree shows us that __lll_lock_wait() is a low-level wrapper around the Linux futex system call. The prototype for this function is:

void __lll_lock_wait (int *futex, int private);

On MIPS, the a0 register is typically used for the first argument to a function. So if we examine a0 for each thread that is blocked in __lll_lock_wait(), we should get a good idea of which threads are waiting on which mutexes. That's a good start, but ideally we want to find out which thread currently owns each mutex. How can we manage that?
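
Checking this by hand for a single thread looks something like the following (the thread number and pointer value shown here are illustrative):

(gdb) thread 33
[Switching to thread 33 (Thread 737)]
#0  0x2aac1068 in __lll_lock_wait from libpthread.so.0
(gdb) print/x $a0
$1 = 0x401cf0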

Going back to the uClibc sources, we can see that __lll_lock_wait() is called from pthread_mutex_lock(). The integer pointer provided to __lll_lock_wait() is actually a pointer to the pthread_mutex_t structure:

typedef union
{
  struct __pthread_mutex_s
  {
    int __lock;
    unsigned int __count;
    int __owner;
    int __kind;
    unsigned int __nusers;
    __extension__ union
    {
      int __spins;
      __pthread_slist_t __list;
    };
  } __data;
  char __size[__SIZEOF_PTHREAD_MUTEX_T];
  long int __align;
} pthread_mutex_t;

The __owner field looks interesting, and on further investigation it seems that __owner is set to the thread ID (TID) of the thread that is currently holding the mutex.

By combining these two pieces of information (namely, the mutex pointer passed to __lll_lock_wait() and the __owner field two integers further into that structure), we should be able to find out which threads are blocking on which mutexes.
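
Checked by hand, the memory at that pointer might look like this (the address and values are again illustrative):

(gdb) x/4d $a0
0x401cf0:       2       0       740     0

The third integer printed is the __owner field; here, thread 740 holds the mutex that this thread is waiting for.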

The trouble is that this would be very tedious to iterate through by hand. Each thread that is blocking in __lll_lock_wait() will need to be selected. Then the contents of register a0 must be queried for the appropriate stack frame of each thread, and the memory at the location pointed to by a0 examined to discover which thread owns the mutex that the thread is waiting for. Even for this trivial program, we have some 32 threads to look at, which is a lot of manual work.

Putting Python into Practice

Rather than driving the debugger by hand, let's instead look at how we can automate the task described above using the GDB Python API. First, we need to be able to iterate over each thread in the process being debugged (the “inferior”, in GDB terminology). To do this, we can use the threads() method of the gdb.Inferior class:

for process in gdb.inferiors():
    for thread in process.threads():
        print(thread)

That was easy. Now we need to look at the currently executing stack frame for each thread and figure out whether it is waiting on a mutex. We can do this using the gdb module function selected_frame() and the name() method of the gdb.Frame class:

for process in gdb.inferiors():
    for thread in process.threads():
        thread.switch()
        frame = gdb.selected_frame()
        if frame.name() == "__lll_lock_wait":
            print "Thread is blocking in __lll_lock_wait"

So far, so good. Now that we have a method for programmatically finding each thread that is waiting on a mutex, we need to examine the contents of the a0 register for each of those threads. This should extract a pointer to the mutex structure that the thread is waiting on. Happily, GDB provides a convenience variable, $a0, which we can use to access the a0 register. The gdb module function parse_and_eval() provides API access to convenience variables, amongst other things:

for process in gdb.inferiors():
    for thread in process.threads():
        thread.switch()
        frame = gdb.selected_frame()
        if frame.name() == "__lll_lock_wait":
            print "Thread is blocking in __lll_lock_wait"
            a0 = gdb.parse_and_eval("$a0")

The last piece of information we need to extract from GDB is the contents of memory at the pointer in the a0 register so that we can determine the __owner field for each mutex in play. Although it's probably cheating to do so, we can fall back on the gdb module function execute() to pass the x command to the GDB command-line interface. With to_string=True, this returns the memory contents as a string that we can parse for the required information:

for process in gdb.inferiors():
    for thread in process.threads():
        thread.switch()
        frame = gdb.selected_frame()
        if frame.name() == "__lll_lock_wait":
            print "Thread is blocking in __lll_lock_wait"
            a0 = gdb.parse_and_eval("$a0")
            s = gdb.execute("x/4d $a0", to_string=True).split()
            s.reverse()
            owner = int(s[1])

It's not particularly pretty to look at, but it works. This code splits the string returned from the x command into a whitespace-delimited list. Because GDB may alter the labels used at the start of the output depending on what symbolic information it can extract from the application binary, we then reverse the list and pull out the second-to-last value. This yields the third integer value in the structure, which in this case is the __owner field of pthread_mutex_t.
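
As an aside, we could skip the string parsing entirely by letting GDB perform the pointer arithmetic for us. This one-liner is a sketch rather than part of Listing 2, and it assumes, as before, that $a0 still holds the mutex pointer:

owner = int(gdb.parse_and_eval("((int *) $a0)[2]"))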

All that remains now is to plug these pieces of data together into something useful. Listing 2 shows the full Python code to do this.
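
The following is a minimal sketch of how those pieces might combine; the waiters and owners bookkeeping dictionaries are our own additions rather than taken verbatim from Listing 2, and the output formatting only approximates the session shown below:

import gdb
from collections import defaultdict

waiters = defaultdict(list)   # mutex address -> TIDs blocked on it
owners = {}                   # mutex address -> TID currently holding it

print("Process threads : ")
gdb.execute("info threads")

for process in gdb.inferiors():
    for thread in process.threads():
        thread.switch()
        frame = gdb.selected_frame()
        if frame.name() == "__lll_lock_wait":
            mutex = int(gdb.parse_and_eval("$a0"))
            s = gdb.execute("x/4d $a0", to_string=True).split()
            s.reverse()
            owners[mutex] = int(s[1])
            # ptid is (pid, lwpid, tid); on this target the lwpid
            # is the kernel TID reported by info threads
            waiters[mutex].append(thread.ptid[1])

print("Analysing mutexes...")
for mutex in sorted(waiters):
    print("  Mutex 0x%x :" % mutex)
    print("     -> held by thread : %d" % owners[mutex])
    print("     -> blocks threads : %s" %
          " ".join(str(tid) for tid in sorted(waiters[mutex])))

Putting it all together: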

(gdb) source mutex_check.py 
Process threads : 
Id  Target Id   Frame 
33  Thread 737  0x2aac1068 in __lll_lock_wait from libpthread.so.0
32  Thread 738  0x2aac1068 in __lll_lock_wait from libpthread.so.0
....
3   Thread 767  0x2aac1068 in __lll_lock_wait from libpthread.so.0
2   Thread 768  0x2aac1068 in __lll_lock_wait from libpthread.so.0
1   Thread 736  0x2aab953c in pthread_join from libpthread.so.0
Analysing mutexes...
  Mutex 0x401cf0 :
     -> held by thread : 740
     -> blocks threads : 737 738 739 741 742 743 744 745 746 747
                         748 749 750 751 752 753 754 755 756 757
                         758 759 760 761 762 763 764 765 766 767
                         768
  Mutex 0x401d08 :
     -> held by thread : 768
     -> blocks threads : 740
(gdb) 
