Dedication Trumps Talent

by Larry Colen

There are many sources of information about the Linux kernel, but not many on how to learn about it. Someone who wants to learn about the Linux kernel will find themselves faced with a huge array of incomplete and slightly outdated HOWTOs, tutorials and books. Combine this with the fact that most of the comments in the code seem to be optimized for people who already know their way around, and the task of coming up to speed on the Linux kernel can be a daunting one.

The first thing one usually does when trying to learn about a piece of software is to check out the documentation available. There are three types of software documentation: tutorial, mnemonic and misleading. Tutorial documentation explains the software to someone new; mnemonic documentation reminds someone who already knows it; misleading documentation is just wrong, and is usually ignored by those who know better.

There are four basic ways that software can be documented: function or file headers, inline comments, function and variable names, and external documents.

It is my experience that most of the documentation directly associated with the code is more mnemonic than tutorial. A cynical individual might suspect that this was done deliberately to force newcomers to delve more deeply into the code before making any modifications. I often found that it would require a day or two of research to find which three lines of code I needed to change, and several more days of checking and cross checking to make sure. Most of the tutorial documentation was external to the code.

There are also several books on the Linux kernel. In many ways it is fortunate that the Linux kernel evolves so quickly as new concepts and techniques are incorporated into it, but this has the unfortunate effect that most books on the kernel are at least one stable release out of date. If I were to comment on the various specific books, these comments would themselves quickly become out of date. I will say that I have found it very useful to buy several books, as they each have their own strengths and weaknesses, and while portions of the kernel are always changing, most of it remains fairly stable from release to release.

I have also noticed that many variables were given their names for historical reasons. Over the course of years the uses of these variables slowly changed, until the name of the variable had little, if anything, to do with how it is really used. This does not happen a lot, but it does happen. You should always double check that a variable or function really does what you think it does.

The on-line support for Linux is truly outstanding. Some documentation on just about any part of it can be found on-line. There are also many forums for discussing it. The premier forum for discussing the kernel is arguably the Linux Kernel Mailing List (LKML). The LKML has the advantage that just about everyone who is anyone in the Linux kernel development community reads it and posts to it. It has the disadvantage that just about everyone who is anyone in the Linux kernel development community reads it and posts to it. To use an old netism, it is like trying to sip from a firehose. There are a lot of very informative posts, very enlightening discussion, quite a few clueless questions by people who obviously have not read what documentation there is, and a lot of out and out flameage on it. Many people find it far more worthwhile to read the weekly summary on the kernel traffic web site.

Reading books, articles, HOWTOs and the development discussion are all very well and good, but the only way to learn the kernel is to "use the source, Luke". It works best if you have some particular goal in mind, for example a system call that would make it impossible for any new processes to gain root privileges. Trying to learn everything about the kernel is overwhelming, but concentrating on one thing at a time makes it much easier. Not easy, mind you, just easier.

Often, the best documentation on concepts used in the kernel, such as slabs, does not come from the Linux community, but from the originators of the concept. Do not hesitate to do net searches on the various concepts described or mentioned.

One of the first things that I did was to write a script that would grep every source file in the source tree for a particular string. This would make it a lot easier to find where things were taking place. I found it helpful to delete alternate architectures from my development copy of the source tree, to reduce the number of hits on any string.

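My brutally simple script to do this looked like:

#!/bin/sh
# grep through the linux source for the requested string
# The file pattern is quoted so that find, not the shell, expands it,
# and the semicolon is escaped so that it reaches find as the end of -exec.
find /usr/src/linux -name '*.[chS]' -exec grep --with-filename $1 $2 $3 $4 $5 {} \;
#end script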

This script is evidence that not only am I not a shell programmer, I don't even play one on TV. If anyone has better scripts for doing this, I'd love to see them.
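One possible tidier variant is sketched below, assuming a grep that supports recursive search with -r and --include (GNU grep does). The names kgrep and KSRC are invented for this sketch and do not come from any kernel tooling. It is a starting point rather than a definitive answer.

```shell
#!/bin/sh
# kgrep: search the kernel sources (*.c, *.h, *.S) for a pattern.
# KSRC and kgrep are hypothetical names chosen for this sketch.
KSRC="${KSRC:-/usr/src/linux}"

kgrep() {
    # -r recurses, -n prints line numbers, -H always prints the file name.
    # "$@" passes the pattern and any extra grep options through intact,
    # even when they contain spaces.
    grep -rnH --include='*.[chS]' "$@" "$KSRC"
}
```

After sourcing this file, something like kgrep -i 'kmem_cache' searches the whole tree, and setting KSRC points it at a trimmed development copy instead.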

Along with this is the venerable tag function found in vi, emacs and many other text editors. A tags file is basically a database of the location of every function declaration (and often macro and variable declarations as well) in each of the indexed files. When you run across a reference to a function that you don't recognize, tag can have your editor jump to that function so that you can see for yourself what it does.

In conjunction with this script, I highly recommend adding your own comments to your development/learning version of the source. Mark each line or block of your comments with some easily grepped tag such as:

 /* LRC NOTE */

I will also often use strings such as !!! to denote lines that I should look at or ??? to denote lines that puzzle me, or that I think may be in error.
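Finding those annotations again can be sketched as a small function. The markers are the ones described above; the function name list_notes and the default directory are assumptions made for illustration, and the --include option again assumes GNU grep or a compatible one.

```shell
#!/bin/sh
# list_notes: show every personal annotation in a source tree.
# The function name and default directory are hypothetical.
list_notes() {
    dir="${1:-/usr/src/linux}"
    # -F treats each marker as a fixed string, so ??? and !!! are
    # matched literally rather than as regular-expression syntax.
    grep -rnF --include='*.[chS]' \
        -e 'LRC NOTE' -e '!!!' -e '???' "$dir"
}
```

Each hit is printed with its file name and line number, which makes it easy to jump straight to the note in an editor.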

I have found that going through and annotating code is one of the best ways of learning it. One of my favorite techniques is to add function headers that resemble man pages, and to fill in all the pertinent information about each function. My philosophy on code documentation is that function headers should be tutorial and in-line comments should be mnemonic. Here is an example of such a function header:

 Name:          A plain-English version of the name.
 Synopsis:      A one or two line explanation of the
                function.
 Input:         What it requires (parameters and
                globals) as input.
 Output:        What is returned, or changed, by the function.
 Description:   A detailed description of the
                function, including explanations of
                algorithms, intent and any tricks.
 Testing:       Things to keep in mind while testing the program.
                Special cases to test for.
 Warnings:      Anything that might be dangerous, or
                a source of bugs.
 Updates:       Note who modified the function, when,
                why and how.

It starts and ends with strings that make it easy to write an automated script to extract these headers and create a manual of the code base.
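As a rough sketch of such an extraction script: the marker strings HEADER BEGIN and HEADER END below are placeholders invented for this example, not the strings I actually use, and extract_headers is likewise a made-up name. Substitute whatever start and end strings your own headers carry.

```shell
#!/bin/sh
# extract_headers: print every annotated function header in a tree.
# HEADER BEGIN / HEADER END are hypothetical marker strings; the
# function name is also an assumption of this sketch.
extract_headers() {
    dir="${1:-.}"
    # sed -n suppresses normal output; the range address prints only
    # the lines from a start marker through the next end marker.
    # Running sed once per file keeps a range from spilling across files.
    find "$dir" -name '*.[chS]' -exec \
        sed -n '/HEADER BEGIN/,/HEADER END/p' {} \;
}
```

Redirecting the output, for example extract_headers /usr/src/linux > manual.txt, gives a crude but searchable manual of everything annotated so far.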

I also found it extremely valuable to keep a notebook. There is some advantage to just writing all of your notes down into a file, which can later be searched electronically. I prefer a bound notebook. Most people just scribble their notes and questions down on random scraps of paper, which then either get lost or thrown away. I find that keeping them in chronological order is very useful, because when I forget the answer I can go back and track down what I had learned earlier. It is also very helpful, when you find the answer to a question you wrote down, to write the answer down next to the question, no matter how obvious it seems once you've learned it.

It is especially helpful if every so often you go through your notebook and transfer the particularly useful gems of information into a text file that you can then search. Likewise these notes can also be added to your personal annotation of the code.

For experimenting with changes to the kernel, I found using a virtual machine to be invaluable. Trying new kernels, by definition, involves crashing the machine, often in ways that are very difficult to recover from. If nothing else, when you debug on a virtual target you can still do other things while the virtual target is rebooting.
