Contributing to the Linux Kernel

To make changes in the kernel, you need to know all about the diff and patch commands.

Everyone who knows about Linux also knows about the ways Linux is “different” from other, more commercial operating systems. Because the Linux kernel is open source, it is possible for each and every user to become a contributor. Certainly, nearly everyone reading this knows so; it's sort of like preaching to the choir. However, the fact is that most Linux users, even those skilled in the programming arts, have never contributed to the code of the Linux kernel, even though most of us have had days where we thought to ourselves, “gee, I wish Linux could do this...” Through this article and others, I hope to convince some of you to take a look at the Linux kernel in a new, more proactive light.

What are some valid reasons for not contributing to the kernel efforts? First, maybe you can't legally. Many programmers sign contracts that limit their ability to code outside of work, even on non-commercial projects. This is the main reason I chose a profession that has relatively little to do with programming, other than the occasional Perl script. Second, it is possible you don't know how. Many Linux users are relatively new programmers trained in traditional computer science. I know from my own CS education that many schools tend to teach “modern” programming skills—I was one of the few in my particular school who chose (or knew how) to program without an IDE (integrated development environment). Sad, but true. Third, many professional programmers now tend to work with revision control systems in the workplace, and may be hesitant to contribute to projects (such as the kernel development effort) which still use the “bare metal” approach. Last and most likely, many programmers with the skills to hack Linux don't have the time to do so. These are all valid reasons why perfectly qualified programmers with good ideas, a fresh outlook and a desire to contribute have chosen not to. Nothing I can say can help them get past some of those issues, but I hope I can make kernel programming more accessible to at least a small percentage of people.

This is the first article in a series, and I will attempt to dispel some of the mystery behind revision control. Many open-source projects, including the Linux kernel, still use the diff and patch method of content control for a variety of reasons. Most open-source projects still accept patches in this format, even if they distribute code via CVS or some other revision-control system. First, diff and patch provide a project maintainer with an immense amount of control. Patches can be submitted and distributed via e-mail and in plain text; maintainers can read and judge the patch before it ever gets near a tree. Second, there's never a worry about access control or the CVS server going down. Third, it's readily available, generally doesn't require any special tools that aren't distributed as part of every GNU system, and has been used for years. However, bare-bones revision control makes it difficult to track changes, maintain multiple branches or do any other “advanced” things provided by Perforce, CVS or other revision control systems.

diff and patch are a set of command-line programs designed to generate and integrate changes into a source tree. There are multiple “diff” formats supported by the GNU utilities. One major advantage of diff and patch over newer revision-control systems is that diff, especially the unified diff format, allows kernel maintainers to look at changes easily without blindly integrating them.

The diff Family

For the uninitiated, diff and patch are just two of the commands in a complete set of GNU utilities. While they are the most commonly used in practice, other tools are often employed in specific situations. For the purposes of this document, I won't concentrate on these utilities, but will treat them only briefly. For a more complete look, check out your local set of man and info pages.

diff is the first command in the set. It has a simple purpose: to create a file (often confusingly called a patch or a diff) which contains the differences between two text files or two groups of text files. These files are constructed to make it easy to merge the differences into the older file. diff can write in multiple formats, although the unified difference format is generally preferred. The patches this command generates are much easier to distribute than whole files, and they allow maintainers to see quickly and easily what changed and to make a judgment.

patch is diff's logical complement, although oddly, it didn't come along until well after diff was in relatively common use. patch takes a patch file generated by diff and applies it against a file or group of files. patch will notify the user if there is a conflict, although it is often smart enough to resolve simple conflicts. Additionally, patch can act in the reverse; with an updated file and the original patch, this command can revert a file back to its pristine form.

cmp is diff's counterpart for binary files. As applications for binary files in source control are limited, this command is often not used in that environment. Usually, projects that include binary files (for example, a logo) have some other mechanism for updating these components. (Keep in mind that the XPM image format common in Linux applications can actually be text-based and can be controlled using the above commands.)

diff3 is a variant on diff that allows for computing and merging the differences among three files. Personally, I tend to use diff for these purposes, but there are likely reasons why this command is useful in specific situations with which I have not yet dealt.

And finally, sdiff is an interactive version of patch that allows for smarter patching using your very own brain.

These tools have many uses other than content control. I do not want to slight them by implying they are not useful. But, like many tools, they shine only in certain circumstances. (Like that annoying fine screwdriver you get with the set for which you've never seen a screw small enough to use it on, tools are only as good as the situations they are applied against.)

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState