Flexible Formatting with Linuxdoc-SGML

Have your cake and eat it too with this simple but powerful text processing facility assembled by a well-known Linux guru.

As Linux becomes more and more popular, a lot of documentation is required, not only for newcomers, but for all users. Just think of all the FAQs, HOWTOs, manual pages, and books everyone needs for their daily work. Some people want to read these documents as plain ASCII text, while others want to read them over the World Wide Web or print them on their PostScript printer. It is possible to make an HTML version of an ASCII document for the Web and a nicely-formated PostScript version for people to print, but all the different formats have to be maintained separately. This is theoretically possible, but doesn't happen in real life.

We need a documentation system that can produce different formats from a single source. The Linux Documentation Project faced this exact dilemma when the HOWTO project was started, so Matt Welsh wrote the Linuxdoc-SGML package to solve it. With this package, all documentation is formatted in a similar way. But SGML is very flexible, so you can use the system to write many different kinds of documentation; as an example, the XFree86 project uses Linuxdoc-SGML for all of its documentation.

A Linuxdoc-SGML Example
<!doctype linuxdoc system>
<title>The Very Short Story
<author>A. Author
<date>1 Jan 1970
Once upon a time, they lived happily ever after.

As you can see here, the Linuxdoc-SGML syntax is very simple. Commands are written in angle brackets: <command>. When they apply to a block of the text they appear in a pair surrounding that block, so </article> before the block is balanced by </article> after the block. There is also an abbreviation for the latter case if the block is short: <tt/typewriter font/.

The first line of the document specifies the document type. Here you will always specify linuxdoc system, since this refers to the main macros of the Linuxdoc-SGML package. Then you start your document with the <article> command and close it at the end with the corresponding “article off” command </article>. The article itself starts with the title, the author, and the date (which is optional). After that you can start writing the body text. The <p> command indicates the beginning of the first paragraph. You don't have to worry about spaces or line breaks when writing the text, since multiple spaces between words are ignored and line breaks are automatically inserted at the appropriate positions. To begin a new paragraph, insert a blank line, which is a “synonym” for <p>.

Running Linuxdoc-SGML

Linuxdoc-SGML is actually a collection of programs that work together to provide the final output. You need to know how to use each of them; several examples will help. The format program creates files designed for LaTeX, groff, or makeinfo, and is part of the process of creating HTML files, which is explained more fully below. The -T argument tells format which program it is writing files for.

There is one utility for running each of the formatting programs (groff, etc). Each has a name starting with “q”, like “qtex”.

To get a PostScript file via LaTeX, just type

format -T latex example.sgml > example.tex
qtex example

and Linuxdoc-SGML will create a LaTeX-format file, use LaTeX to process that file, then use dvips to turn that into the PostScript file example.ps. Note that you need to have LaTeX and dvips installed, along with Linuxdoc-SGML, for this to work.

If you prefer a DVI file, you may use a -d switch with qtex:

format -T latex example.sgml | qtex -d > example.dvi

The plain ASCII output is created with a similar procedure. Just run:

format -T nroff example.sgml | qroff > example.txt

To get texinfo output that can be read with the GNU info program, use:

format -T info example.sgml

This will create the necessary files in the current directory automatically. Of course, you need the GNU texinfo package installed on your system to make texinfo files.

The HTML output needs a little bit more care, since two compilation stages are necessary to get all cross references built. First, you have to have the LINUXDOC environment variable set up correctly; you will want to put a line such as:

export LINUXDOC=~/linuxdoc-sgml-1.2

in your bash startup file, or:

setenv LINUXDOC=~/linuxdoc-sgml-1.2

in your csh or tcsh startup file.

Once that is working, you have to run several commands to get finished HTML:

format -T html example.sgml | prehtml | \
   fixref > tmp.html
format -T html example.sgml | prehtml >> tmp.html
cat tmp.html | html2html example > example.html
rm tmp.html

It's a good idea to put these commands in a shell script since you will call these commands often. Here's a simple version you can use:

# sgml2html
[ -z -$1- ] && { echo -What file?-; exit 1 }
BASE=`basename $1 .sgml'
[ ! -f $BASE.sgml ] && { echo -No file $BASE.sgml-; exit 2 }
format -T html $1 | prehtml | fixref > $TMP
format -T html $1 | prehtml >> $TMP
cat $TMP | html2html $BASE > $BASE.html
rm $TMP

This script requires that your input file has the extension .sgml.

This script must be given the full file name, and it requires that the file have the extension .sgml to work correctly.

All of this is documented more completely in the excellent, short manual provided with Linuxdoc-SGML.


Geek Guide
The DevOps Toolbox

Tools and Technologies for Scale and Reliability
by Linux Journal Editor Bill Childers

Get your free copy today

Sponsored by IBM

8 Signs You're Beyond Cron

Scheduling Crontabs With an Enterprise Scheduler
On Demand
Moderated by Linux Journal Contributor Mike Diehl

Sign up now

Sponsored by Skybot