Contributing to the Linux Kernel

To make changes in the kernel, you need to know all about the diff and patch commands.
diff Concepts

When talking about different patch formats, a number of concepts deserve some thought. The point is probably moot: nearly all open-source projects that use diff and friends have long since settled on a patch format. However, these are some of the qualities in a patch that makes one format more useful than another.

The first thing admired in a patch format is context. Context consists of the extra lines (often three) before and after the difference blocks in a patch. While context adds greatly to the size of the patch file, it allows patches to be not entirely dependent on the exact file on which the patch was based. This quality is very important in a revision-control system, where it is expected that your working files will be slightly different from the master copy. Patching programs can use these context lines to guess where the offending lines can go—and usually get it right.

Second, patches should be reversible. Reversing is useful when you want to go back to a previous version of the source, or when you mistakenly flipped the options to diff and generated an inherently reversed patch (don't laugh—we've all done it). Not all patch formats are reversible, however.

Patch files should be efficient: small and easily readable, but not so large as to be unwieldy for projects with large numbers of changes (such as the Linux kernel). There is a tradeoff here, of course. Patches without context are more efficient, but definitely not very useful in source maintenance.

Finally, readability is a very important aspect for this style of revision control. Patches should be obvious when it comes to figuring out what changed and should not require the user to flip his or her perspective from one block of text to another to figure out the differences. Making a format human-readable is much more difficult than creating a computer-readable one, especially when you are trying to balance all the other format components.

Each of the various diff formats ranks differently in each of these metrics, and different projects may choose different formats. The Linux kernel, for example, uses the unified difference format.

The diff Formats

The POSIX, or “normal”, diff is the default format used by the diff utilities. It's a terse format without any lines of context, but is reversible. I've often seen it used as a sort of “generic” patch format, since non-GNU versions of the diff utilities can parse it.

The context diff format is another reversible format similar to the POSIX one, but which supports context around changes. Some projects prefer this format, especially if they include developers outside the GNU sphere. Using GNU diff, this format can be specified with the --context option.

The unified diff format is another contextual format that is generally more readable than the context variety. Unlike with context diff, this format displays all the changes in a single block, thus eliminating many of the redundant lines with context diffs. Because of the relative merits of this format for revision control and easy reading of patches, this is the preferred format for the Linux kernel and many GNU projects. On the down side, many non-GNU patch programs are unable to recognize this format. With GNU diff, this format can be specified using the --unified option.

The side-by-side format is great for human reading of patches, but is not readily usable for revision control. It displays the originating file and the changes side by side. This format is mostly just for human patch browsing, and the patch program doesn't actually support it. GNU diff users can enable this with the --side-by-side option.

The ed format is an old format that outputs a script for the ed text editor rather than a special patch format. This option was needed before the modern patch utility was created. Since it outputs a script, it contains no context information and no reversal information. GNU provides this option only for compatibility, and it may be invoked with the --ed switch.

The forward ed format is similar to the ed format, but is even more useless. patch can't process files in this format; neither can ed. If you insist, however, GNU diff still can generate it with the --forward-ed option.

The RCS format is the one used by the revision-control system RCS and its derivatives. It's generally not used standalone, and patch can't actually read the format.

The preprocessor format is not quite a patch format and not quite a script. Instead, it is an output file that contains the contents of both files separated by C preprocessor directives such as #ifdef, #endif, #elsif, etc. It is possible to compile either version of the file by setting preprocessor variables (#define) in the source file or by using the -D switch of the GNU compiler. Obviously, this format isn't of much use for revision control, but can occasionally be useful when you want to test changes.


Geek Guide
The DevOps Toolbox

Tools and Technologies for Scale and Reliability
by Linux Journal Editor Bill Childers

Get your free copy today

Sponsored by IBM

Upcoming Webinar
8 Signs You're Beyond Cron

Scheduling Crontabs With an Enterprise Scheduler
11am CDT, April 29th
Moderated by Linux Journal Contributor Mike Diehl

Sign up now

Sponsored by Skybot