Writing HTML with m4

Ease your creation and maintenance of web pages using this handy pre-processor called m4.

m4 Gotchas

Unfortunately, m4 needs some taming. A little time spent on familiarisation will pay dividends. Definitive documentation is available (for example, in the Emacs info documentation system). This section is not a complete tutorial, but here are a few tips based on my experience.

Gotcha 1—Quotes

m4's quotation characters are the grave accent ` which starts the quote, and the acute accent ' which ends it. It may help to put all arguments to macros in quotes, for example:

_HEAD1(`This is a heading')

The main reason for using quotes is to prevent confusion if commas are contained in an argument to a macro, since m4 uses commas to separate macro parameters. For example, the line _CODE(foo, bar) would put the foo in the HTML output but not the bar. Use quotes in the line _CODE(`foo, bar'), and it works properly.
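
As a minimal sketch of the difference, assume that _CODE is a one-argument macro that wraps its argument in CODE tags (this definition is an assumption for illustration; the real one may differ) and that m4 is run with the -P option described under Gotcha 2:

```m4
m4_dnl Hypothetical definition of _CODE, for illustration only.
m4_define(`_CODE', `<CODE>$1</CODE>')m4_dnl
_CODE(foo, bar)
_CODE(`foo, bar')
```

With that definition, the first call produces <CODE>foo</CODE>, because bar is passed as a second argument that the macro never uses. The second call produces <CODE>foo, bar</CODE>.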

Gotcha 2—Word Swallowing

The biggest problem with m4 is that some versions of it swallow key words that it recognises, such as include, format, divert, file, gnu, line, regexp, shift, unix, builtin and define. You can protect these words by putting them in m4 quotes, for example:

Smart people `include' Linux in their list
of computer essentials.

The trouble is, this is both inconvenient and easy to forget.

A safer way to protect keywords (my preference) is to invoke m4 with the -P or --prefix-builtins option. All built-in macro names then begin with the prefix m4_, and ordinary words are left as is. For example, with this option one writes m4_define instead of define (as shown in the examples in this article). One hitch is that not all versions of m4 support this option, most notably some PC versions under MS-DOS.
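
A small sketch of the effect, assuming GNU m4 and a hypothetical input file named page.m4 (the macro name _OS is also made up for this example):

```m4
m4_dnl Sketch only: process with  m4 -P page.m4  (page.m4 is a hypothetical file name).
m4_define(`_OS', `Linux')m4_dnl
Smart people include _OS in their list
of computer essentials, and define their own macros.
```

Run with -P, the words include and define pass through untouched, since only the m4_ prefixed names are treated as built-ins, while _OS is still replaced by Linux.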

Gotcha 3—Comments

Comments in m4 begin with the # character: everything from the # to the end of the line is left unexpanded and copied to the output unchanged. If you want to use # in the HTML page, you must quote it like this: `#'. Another option (my preference) is to change the m4 comment character to something exotic with the m4_changecom macro, and not have to worry about # symbols in your text.
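
A minimal sketch of changing the comment character, assuming GNU m4 run with -P. The delimiter [[[[ and the macro _COLOUR are arbitrary choices for illustration, not necessarily the ones the article had in mind:

```m4
m4_dnl Sketch only: `[[[[' is an arbitrary, hypothetical choice of comment delimiter.
m4_changecom(`[[[[')m4_dnl
m4_define(`_COLOUR', `ff0000')m4_dnl
<FONT COLOR="#_COLOUR">Red text</FONT>
```

Once the comment delimiter has been changed, the # is ordinary text and _COLOUR after it is expanded, giving <FONT COLOR="#ff0000">Red text</FONT>. With the default comment character, everything from the # to the end of that line would be passed through unexpanded.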

If you want to use comments in the m4 file but not have them appear in the final HTML file, use the macro m4_dnl (dnl = Delete to New Line). This macro discards everything up to and including the next newline character.

m4_define(_NEWMACRO, `foo bar')
m4_dnl This is a comment

Yet another way to have source text ignored is the m4_divert macro. The main purpose of m4_divert is to save text in a temporary buffer for inclusion in the file later, for example, in building a table of contents or index. However, if you divert to -1, the text is simply discarded. This is useful for getting rid of the blank lines that a series of m4_define calls would otherwise leave in the output. For example:

m4_divert(-1) diversion on
m4_define(this ...)
m4_define(that ...)
m4_divert(0)m4_dnl diversion turned off
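
The buffering use of m4_divert mentioned above can be sketched as follows, assuming GNU m4 run with -P. Diversion number 1 is an arbitrary choice, and m4_undivert is the built-in that replays a saved diversion into the current output:

```m4
m4_dnl Sketch only: collect text in diversion 1, then replay it later.
m4_divert(1)m4_dnl
<P>Index entry collected early.</P>
m4_divert(0)m4_dnl
<H1>Main text</H1>
m4_undivert(1)m4_dnl
```

The output has the H1 line first and the P line second, even though the P line comes first in the source.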

Gotcha 4—Debugging

Another tip for when things go wrong is to increase the amount of diagnostic output that m4 produces. The easiest way to do this is to add debugging commands to your m4 file around the buggy lines:

buggy lines
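
One way to set this up is sketched below. It assumes GNU m4 run with -P and uses its m4_debugmode, m4_traceon and m4_traceoff built-ins; the flag selection and the _HEAD1 definition are assumptions for illustration and may not match the commands the article originally listed:

```m4
m4_dnl Sketch only: GNU m4 built-ins, written with the -P prefix.
m4_dnl _HEAD1 is defined here just to have something to trace.
m4_define(`_HEAD1', `<H1>$1</H1>')m4_dnl
m4_dnl Flags: a = arguments, e = expansion, q = quote them, l = line number.
m4_debugmode(`aeql')m4_dnl
m4_traceon
_HEAD1(`A suspect heading')
m4_traceoff
```

The trace is written to standard error, so it does not end up in the generated HTML. The bare m4_traceon and m4_traceoff lines do leave blank lines in the output, which is harmless while debugging.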

It should be noted that many web servers offer a server-side include (SSI) directive, written as an HTML comment, that looks like this:

<!--#include file="junk.html" -->

However, the server-side include has the following limitations:

  • The include is processed on the server side before the page is sent, which adds overhead because the server has to scan files for include statements.

  • Most servers (especially public ISPs) deactivate this feature because of the large overhead.

  • Include is all you get—no macro substitution, no parameters to macros, no ifdef, etc., as with m4.
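
By way of contrast, here is a small sketch of what m4 adds beyond plain inclusion, assuming GNU m4 run with -P. The names _LINK and _DRAFT are hypothetical:

```m4
m4_dnl Sketch only: _LINK and _DRAFT are hypothetical names.
m4_define(`_LINK', `<A HREF="$1">$2</A>')m4_dnl
m4_ifdef(`_DRAFT', `<P>Draft copy</P>', `<P>Final copy</P>')
_LINK(`index.html', `Home')
```

Run as m4 -P, this produces the "Final copy" paragraph followed by <A HREF="index.html">Home</A>. Adding -D_DRAFT to the command line selects the "Draft copy" paragraph instead. Plain file inclusion remains available through m4_include.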

There are several other features of m4 that I have not yet exploited in my HTML ramblings, such as regular expressions. It might be interesting to create a “standard” stdlib.m4 for general use with nice macros for general text processing and HTML functions. By all means download my version of stdlib.m4 as a base for your own hacking. I would be interested in hearing of useful macros, and if there is enough interest, maybe a Mini-HOWTO could evolve from this article.

There are many additional advantages to using Linux to develop HTML pages, far beyond the simple assistance given by the typical typing aids and WYSIWYG tools. Certainly, I will go on using m4 until HTML catches up—I will then do my last make and drop back to using pure HTML. I hope you enjoy these little tricks and encourage you to contribute your own.

Bob Hepple has been hacking at Unix since 1981 under a variety of excuses and has somehow been paid for it at least some of the time. It's allowed him to pursue another interest—living in warm, exotic countries including Hong Kong, Australia, Qatar, Saudi Arabia, Lesotho and (presently) Singapore. His initial aversion to the cold was learned in the UK. Ambition—to stop working for the credit card company and tax man and to get a real job. Bob can be reached at bhepple@pacific.net.sg.

