Literate Programming Using Noweb

Noweb is a tool designed to enable a programmer to write documentation and code at the same time, with the goal of producing code that is easy to understand and maintain.

In essence, the purpose of literate programming (LP) can be found in the following quote:

“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.”—Donald E. Knuth, 1984.

Such an environment reverses the notion of including documentation, in the form of comments within the code, to one where the code is embedded within a program's description. In doing so, literate programming facilitates the development and presentation of computer programs that more closely follow the conceptual map from the problem space to the solution space. This, in turn, leads to programs that are easier to debug and maintain.

When writing literate programs, one specifies the program description and the program code in a single source file, in the order best suited to human understanding. The program code can be extracted and assembled from this file into a form which the compiler or interpreter can understand—a process called “tangling”. Documentation is produced by “weaving” the description and code into a form ready to be typeset (most often by TeX or LATeX).

Many different tools have been created for literate programming over the years. Most of the more popular are based, either directly or conceptually, on the WEB system created by D. E. Knuth (“Literate Programming”, The Computer Journal (27)2:97-111, 1984). This article focuses on Norman Ramsey's noweb—a simple to use, extensible literate programming tool that is independent of the target programming language.

Overview of the Noweb System

When you write a literate program using noweb, you create a simple text file (which by convention has a .nw extension) in which you provide all of the technical documentation for the various parts of the program, along with the actual source code for each part of the program.

Figure 1. Typeset Version of Perl Script

This file ( Listing 1 ), which we call the nw source file, is then processed by noweave to create the documentation in a form ready for typesetting (the typeset version of the program is shown in Figure 1), or by notangle to extract the code chunks and assemble them in their proper order for the compiler or interpreter (the executable version of the program is in Listing 2 ). These two processes are not stand-alone programs, but a set of filters through which the nw source file is piped. It is this pipeline system that makes noweb both flexible and extensible, since the pipelines can be modified and new filters can be created and inserted in the pipelines to change the behavior of noweb.

Like most literate programming tools, noweb depends on TeX or LATeX—(LA)TeX to refer to either—for typesetting the documentation (although it has options for producing HTML output as well). However, one need not be a (LA)TeX guru to produce good results. All of the hard work of cross-referencing, indexing and typesetting the code is handled automatically by noweave.

The Typeset Documentation

The best way to get a feel for the capabilities of noweb is by reference to the finished product: the typeset version of a program. Figure 1 represents the typeset version of a Perl script that actually extends noweb's functionality by providing a limited “autodefs” filter. This filter will recognize and mark package and subroutine names for automatic cross-referencing and indexing.

When looking at this example, one can quickly see how chunks of actual code are interspersed throughout the descriptive text. Each code chunk is uniquely identified by page number and an alphabetic sub-page reference. For example, in Figure 1, there are four code chunks on the first page labeled in the left margin as 1a, 1b, 1c and 1d.

Besides the marginal tag, the first line of each code chunk also has its name and a chunk reference enclosed in angle brackets at the left margin and perhaps cross-reference information at the right margin. Lets examine chunk 1b more closely—a reasonable facsimile of its first line is:

1b <

This line tells us that we are now in chunk 1b. The <Global variables 1a>+= construct tells us we are working on the chunk named Global variables whose definition begins in chunk 1a. The += indicates that we are adding to the definition of Global variables. At the right margin we encounter (1d) <1a 1c>, which means that the chunk we are defining is used in chunk 1d, and that the current chunk is continued from chunk 1a and will be further continued in chunk 1c. It should be noted that all of these visual cross-referencing clues—with the exception of the chunk name itself—are provided automatically by noweb.

At the end of any chunk there are two optional footnotes—a “Defines” footnote and a “Uses” footnote. A user can manually specify, in the nw source file, a list of identifiers (i.e., variables or subroutines) which are defined in the current chunk. In addition, some identifiers may be automatically recognized, if an “autodefs” filter for the programming language is used. There are autodefs filters available for many languages including C, Icon, TeX, yacc and Pascal).

These identifiers are listed in the “Defines” footnote below the chunk where their definition occurs, along with a reference to any chunks which use them. Any occurrence of a defined identifier is referenced in a “Uses” footnote below the chunk that uses that identifier.

For example, in Figure 1, we see that chunk 1c defines the term $index_prefix which is used in chunk 2b. A quick peek at chunk 2b verifies that, indeed, this term is used and appears in the “Uses” footnote for that chunk.

Chunk 1d, autodefs.perl, represents the top level description of our entire program. This chunk is referred to as a “root” chunk in noweb and is not used in any other chunk. Our example has but one root chunk, although as many as you wish can be defined in your nw source file, and notangle can extract each of them into separate files.

The first line of code in chunk 1d is the obligatory #!/usr/bin/perl line which must begin all Perl scripts intended to be invoked as an executable program. However, the next two lines are not lines of Perl code at all but instead are references to other named chunk definitions. The code from those referenced chunks will be inserted at this point in the executable program extracted by notangle. Thus, we have a broad overview of our program, uncluttered by the specific global variable initializations and subroutine definitions.

Looking at chunk 2a, which is included in our root chunk, we see that it also includes another chunk, chunk 2b. This demonstrates that the inclusion of chunks can be nested to practically any level and can occur in any order in the documentation (definitions need not precede uses).

Our documentation ends with two optional indices provided by noweb—an index of code chunks and an index of identifiers.