Contributing to the Linux Kernel

To make changes in the kernel, you need to know all about the diff and patch commands.

Everyone who knows about Linux also knows about the ways Linux is “different” from other, more commercial operating systems. Because the Linux kernel is open source, it is possible for each and every user to become a contributor. Certainly, nearly everyone reading this knows so; it's sort of like preaching to the choir. However, the fact is that most Linux users, even those skilled in the programming arts, have never contributed to the code of the Linux kernel, even though most of us have had days where we thought to ourselves, “gee, I wish Linux could do this...” Through this article and others, I hope to convince some of you to take a look at the Linux kernel in a new, more proactive light.

What are some valid reasons for not contributing to the kernel efforts? First, maybe you can't legally. Many programmers sign contracts that limit their ability to code outside of work, even on non-commercial projects. This is the main reason I chose a profession that has relatively little to do with programming, other than the occasional Perl script. Second, it is possible you don't know how. Many Linux users are relatively new programmers trained in traditional computer science. I know from my own CS education that many schools tend to teach “modern” programming skills—I was one of the few in my particular school who chose (or knew how) to program without an IDE (integrated development environment). Sad, but true. Third, many professional programmers now tend to work with revision control systems in the workplace, and may be hesitant to contribute to projects (such as the kernel development effort) which still use the “bare metal” approach. Last and most likely, many programmers with the skills to hack Linux don't have the time to do so. These are all valid reasons why perfectly qualified programmers with good ideas, a fresh outlook and a desire to contribute have chosen not to. Nothing I can say can help them get past some of those issues, but I hope I can make kernel programming more accessible to at least a small percentage of people.

This is the first article in a series, and I will attempt to dispel some of the mystery behind revision control. Many open-source projects, including the Linux kernel, still use the diff and patch method of content control for a variety of reasons. Most open-source projects still accept patches in this format, even if they distribute code via CVS or some other revision-control system. First, diff and patch provide a project maintainer with an immense amount of control. Patches can be submitted and distributed via e-mail and in plain text; maintainers can read and judge the patch before it ever gets near a tree. Second, there's never a worry about access control or the CVS server going down. Third, it's readily available, generally doesn't require any special tools that aren't distributed as part of every GNU system, and has been used for years. However, bare-bones revision control makes it difficult to track changes, maintain multiple branches or do any other “advanced” things provided by Perforce, CVS or other revision control systems.

diff and patch are a set of command-line programs designed to generate and integrate changes into a source tree. There are multiple “diff” formats supported by the GNU utilities. One major advantage of diff and patch over newer revision-control systems is that diff, especially the unified diff format, allows kernel maintainers to look at changes easily without blindly integrating them.

The diff Family

For the uninitiated, diff and patch are just two of the commands in a complete set of GNU utilities. While they are the most commonly used in practice, other tools are often employed in specific situations. For the purposes of this document, I won't concentrate on these utilities, but will treat them only briefly. For a more complete look, check out your local set of man and info pages.

diff is the first command in the set. It has a simple purpose: to create a file (often confusingly called a patch or a diff) which contains the differences between two text files or two groups of text files. These files are constructed to make it easy to merge the differences into the older file. diff can write in multiple formats, although the unified difference format is generally preferred. The patches this command generates are much easier to distribute than whole files, and they allow maintainers to see quickly and easily what changed and to make a judgment.

patch is diff's logical complement, although oddly, it didn't come along until well after diff was in relatively common use. patch takes a patch file generated by diff and applies it against a file or group of files. patch will notify the user if there is a conflict, although it is often smart enough to resolve simple conflicts. Additionally, patch can act in the reverse; with an updated file and the original patch, this command can revert a file back to its pristine form.

cmp is diff's counterpart for binary files. As applications for binary files in source control are limited, this command is often not used in that environment. Usually, projects that include binary files (for example, a logo) have some other mechanism for updating these components. (Keep in mind that the XPM image format common in Linux applications can actually be text-based and can be controlled using the above commands.)

diff3 is a variant on diff that allows for computing and merging the differences among three files. Personally, I tend to use diff for these purposes, but there are likely reasons why this command is useful in specific situations with which I have not yet dealt.

And finally, sdiff is an interactive version of patch that allows for smarter patching using your very own brain.

These tools have many uses other than content control. I do not want to slight them by implying they are not useful. But, like many tools, they shine only in certain circumstances. (Like that annoying fine screwdriver you get with the set for which you've never seen a screw small enough to use it on, tools are only as good as the situations they are applied against.)

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix