Debugging Kernel Modules with User Mode Linux

Programming in kernel space has always been left to the gurus. Few people have the courage, knowledge and patience to work in the realm of interrupts, devices and the always painful kernel panic.

When you write programs in user space, the worst thing that can happen to your program is a core dump. Your program did something very wrong, so the operating system decided to give you all of its memory and state information back to you in the form of a core file. Core files can then be used to debug your program and fix the problem.

When you program in the kernel, there is no operating system to step in and safely stop your code from running and tell you that you have a problem. The Linux kernel is pretty nice to its own code. Sometimes it can survive a panic, if you are doing something wrong that is relatively benign (these panics are typically called oopses). But, there is nothing to stop your code from overwriting or accessing memory locations from anywhere in the kernel's address space. Also, if your module hangs, the kernel hangs (technically, your current kernel thread hangs, but the result is usually the same).

These problems may sound benign to the naïve, but they are serious issues. If the kernel panics, you rarely know exactly what caused the panic. The typical solution is to put printks everywhere and hope that you stumble across the problem before the messages are lost to the reboot. All of this is assuming that you do not corrupt your filesystem. I have lost an entire filesystem before due to a poorly timed panic (and due to the fact that a badly initialized pointer was overwriting some of ext2's internal structures).

The first thing you learn when kernel programming is to keep all your code on NFS. Files remain safe on another machine. But, that does not save you the time of having e2fsck run every time you panic. Plus, you still can lose your filesystem, even if your source code is safe on another machine.

So, with all of these issues, it is not surprising how few have entered the realm of kernel programming. Now, all that can change.

Virtual Machines and UML

Back in the mainframe days, when timesharing machines were the norm, the idea of a virtual machine was born. A virtual machine is an encapsulated computer completely at your disposal. A program on a virtual machine has no real access to the physical hardware. All hardware access is controlled by the machine or emulator.

VMware (www.vmware.com) has a very powerful virtual machine that allows you to run any x86-based operating system under Windows NT, 2000, XP or Linux. SoftPC (an 8086 emulator allowing you to run Windows and DOS programs) has been available on Motorola 68k-based computers (i.e., the Macintosh) since 1988.

True virtual machines are sometimes too expensive for the learner's budget. (VMware Workstation for Linux costs $299 US from their web site.) Thankfully, there is now a free alternative for those only wanting to run Linux: User-Mode Linux (UML).

User-Mode Linux (user-mode-linux.sourceforge.net) is not a complete virtual machine. It does not emulate different hardware or give you the ability to run other operating systems. But, it does allow you to run a kernel in user space. This gives you several benefits when it comes to development: the host filesystem is safe from corruption, the virtual filesystem is undoable (which makes it safe from corruption), you can run multiple machines on one machine (this is useful for testing intermachine communication, i.e., network messages, without having to use multiple machines) and it is very easy to run the kernel in a debugger.

Setting up UML

Running UML is easy. You can download one of the binary packages (kernel binaries, plus a couple of tools), or you can download the kernel patch. You also need to download a filesystem. I'd recommend playing with the binaries first, then building a custom kernel to suit your needs. The HOWTO covers all of these topics and more.

One useful benefit of UML is Copy-on-Write files. These files allow you to modify a virtual filesystem, without modifying the base filesystem. All writes or modifications to a filesystem are stored in these files, typically ending with the extension .cow.

So, when you are working, and you panic the filesystem, all you do is remove the .cow file (which will be recreated), and your corrupted filesystem is restored to its pristine version. (There are also tools to incorporate the changes in a .cow file back into the original filesystem, if you want to keep your changes.)

Debugging Modules

Once you have UML up and running, it's time to play. I've written a very simple kernel module for testing. It uses four devices, /dev/gentest[0-3]. The module treats each device a little differently. Device 1 is a sink (just like /dev/null). Device 2 stores a string for later retrieval. You can read the status of the module from device 3, and device 0 could be any of the other three devices, depending on how it is configured. (You can change the configuration with ioctl calls.) The kernel module is available from www.frascone.com/kHacking/gentest-0.1.tar.gz.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix