Flexible Formatting with Linuxdoc-SGML

Have your cake and eat it too with this simple but powerful text processing facility assembled by a well-known Linux guru.

As Linux becomes more and more popular, more and more documentation is needed, not only for newcomers but for all users. Just think of all the FAQs, HOWTOs, manual pages, and books everyone needs for their daily work. Some people want to read these documents as plain ASCII text, while others want to read them over the World Wide Web or print them on their PostScript printer. It is possible to make an HTML version of an ASCII document for the Web and a nicely formatted PostScript version for people to print, but then all the different formats have to be maintained separately. In theory that can be done; in practice, it rarely happens.

We need a documentation system that can produce different formats from a single source. The Linux Documentation Project faced this exact dilemma when the HOWTO project was started, so Matt Welsh wrote the Linuxdoc-SGML package to solve it. With this package, all documentation is formatted in a similar way. But SGML is very flexible, so you can use the system to write many different kinds of documentation; as an example, the XFree86 project uses Linuxdoc-SGML for all of its documentation.

A Linuxdoc-SGML Example
<!doctype linuxdoc system>
<article>
<title>The Very Short Story
<author>A. Author
<date>1 Jan 1970
<p>
Once upon a time, they lived happily ever after.
</article>

As you can see here, the Linuxdoc-SGML syntax is very simple. Commands are written in angle brackets: <command>. When a command applies to a block of text, it appears as a pair surrounding that block, so <article> before the block is balanced by </article> after it. For short blocks there is also an abbreviated form: <tt/typewriter font/.

The first line of the document specifies the document type. Here you will always specify linuxdoc system, since this refers to the main macros of the Linuxdoc-SGML package. Then you start your document with the <article> command and close it at the end with the corresponding “article off” command </article>. The article itself starts with the title, the author, and the date (which is optional). After that you can start writing the body text. The <p> command indicates the beginning of the first paragraph. You don't have to worry about spaces or line breaks when writing the text, since multiple spaces between words are ignored and line breaks are automatically inserted at the appropriate positions. To begin a new paragraph, insert a blank line, which is a “synonym” for <p>.

Running Linuxdoc-SGML

Linuxdoc-SGML is actually a collection of programs that work together to produce the final output, so it helps to know what each one does; the examples below show them in use. The format program creates files for LaTeX, groff, or makeinfo, and is also part of the process of creating HTML files, which is explained more fully below. Its -T argument tells format which program it is writing files for.

There is also one utility for running each of the formatting programs (LaTeX, groff, etc.), and each has a name starting with “q”, such as qtex and qroff.

To get a PostScript file via LaTeX, just type

format -T latex example.sgml > example.tex
qtex example

and Linuxdoc-SGML will create a LaTeX-format file, use LaTeX to process that file, then use dvips to turn that into the PostScript file example.ps. Note that you need to have LaTeX and dvips installed, along with Linuxdoc-SGML, for this to work.
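Because the PostScript route depends on several external programs, it can save some head-scratching to confirm they are all on your PATH before running format. The following is a minimal Bash sketch; the function name require_tools is hypothetical (it is not part of Linuxdoc-SGML), and the program list is simply the set named above.

```shell
#!/bin/bash
# require_tools (hypothetical helper, not part of Linuxdoc-SGML):
# report every named program that cannot be found on the PATH,
# and return non-zero if any of them is missing.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Programs used by the LaTeX/PostScript route described above:
# require_tools format qtex latex dvips
```

For the texinfo route you would check for makeinfo instead of latex and dvips.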

If you prefer a DVI file, you may use a -d switch with qtex:

format -T latex example.sgml | qtex -d > example.dvi

The plain ASCII output is created with a similar procedure. Just run:

format -T nroff example.sgml | qroff > example.txt

To get texinfo output that can be read with the GNU info program, use:

format -T info example.sgml

This will create the necessary files in the current directory automatically. Of course, you need the GNU texinfo package installed on your system to make texinfo files.
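Since each output format needs a slightly different command line, you may prefer a single script that picks the right pipeline for you, in the same spirit as the sgml2html script later in this article. Here is a hedged Bash sketch; the function name sgml2 and its format keywords (ps, dvi, txt, info) are hypothetical, and the pipelines are just the commands shown above.

```shell
#!/bin/bash
# sgml2 (hypothetical helper): run the Linuxdoc-SGML pipeline for one
# output format. Usage: sgml2 ps|dvi|txt|info file.sgml
sgml2() {
    local fmt=${1:-} file=${2:-} base
    if [ -z "$fmt" ] || [ -z "$file" ]; then
        echo "usage: sgml2 ps|dvi|txt|info file.sgml" >&2
        return 1
    fi
    # Output files are written to the current directory, named after the input
    base=$(basename "$file" .sgml)
    case $fmt in
        ps)
            # Write the LaTeX source, then let qtex run LaTeX and dvips
            format -T latex "$file" > "$base.tex" && qtex "$base" ;;
        dvi)
            format -T latex "$file" | qtex -d > "$base.dvi" ;;
        txt)
            format -T nroff "$file" | qroff > "$base.txt" ;;
        info)
            # format writes the texinfo output into the current directory
            format -T info "$file" ;;
        *)
            echo "unknown format: $fmt" >&2
            return 2 ;;
    esac
}
```

For example, sgml2 txt example.sgml would produce example.txt exactly as the qroff command above does.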

The HTML output needs a little bit more care, since two compilation stages are necessary to get all cross references built. First, you have to have the LINUXDOC environment variable set up correctly; you will want to put a line such as:

export LINUXDOC=~/linuxdoc-sgml-1.2

in your bash startup file, or:

setenv LINUXDOC ~/linuxdoc-sgml-1.2

in your csh or tcsh startup file.
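A missing or mistyped LINUXDOC is an easy mistake to make, so a wrapper script can refuse to run until the variable looks sane. This Bash sketch assumes LINUXDOC should name the directory where the package was unpacked, as in the examples above; check_linuxdoc is a hypothetical name, not part of the package.

```shell
#!/bin/bash
# check_linuxdoc (hypothetical helper): succeed only if LINUXDOC is set
# and names an existing directory (assumed to be the package directory).
check_linuxdoc() {
    if [ -z "${LINUXDOC:-}" ]; then
        echo "LINUXDOC is not set" >&2
        return 1
    fi
    if [ ! -d "$LINUXDOC" ]; then
        echo "LINUXDOC=$LINUXDOC is not a directory" >&2
        return 1
    fi
}
```

A script could then start with check_linuxdoc || exit 1 before running the HTML commands.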

Once that is working, you have to run several commands to get finished HTML:

format -T html example.sgml | prehtml | \
   fixref > tmp.html
format -T html example.sgml | prehtml >> tmp.html
cat tmp.html | html2html example > example.html
rm tmp.html

It's a good idea to put these commands in a shell script since you will call these commands often. Here's a simple version you can use:

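#!/bin/bash
# sgml2html: convert a Linuxdoc-SGML file to HTML
# Usage: sgml2html file.sgml
[ -z "$1" ] && { echo "What file?"; exit 1; }
BASE=`basename "$1" .sgml`
[ ! -f "$BASE.sgml" ] && { echo "No file $BASE.sgml"; exit 2; }
# Temporary file named after this shell's process ID
TMP=$$tmp.html
# Two passes are needed to build all cross references
format -T html "$1" | prehtml | fixref > "$TMP"
format -T html "$1" | prehtml >> "$TMP"
cat "$TMP" | html2html "$BASE" > "$BASE.html"
rm "$TMP"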

This script must be given the full file name, and it requires that the file have the extension .sgml to work correctly.

All of this is documented more completely in the excellent, short manual provided with Linuxdoc-SGML.
