Industrializing Web Page Construction
December 1st, 1997 by Pieter Hintjens
When I started building my company's web site about a year ago, I looked for a good visual web editor and, having found one, quickly produced some nice web pages. A week later, I had thrown the web editor away and was working on a tool to solve some of the major difficulties I had found. In this article I'll look at the result, a free HTML preprocessor written in Perl that makes mass production of web pages a feasible and economical task.
htmlpp was one of the first Perl programs I wrote, and I've not regretted the choice of language. Perl allows me to add functions to the program as fast as I can think of them. The consequence is that htmlpp is a very rich tool, making the task of maintaining a web site with thousands of pages easy.
There are at least a dozen free HTML preprocessors available today; I know of three with the name htmlpp. Something is driving people to write these programs, but what? Some 95% of the web pages I produce are on-line documentation, and I dislike building these by hand. Each page needs a standard header, footer and appearance. When I change my mind, it takes a lot of mouse clicks to go through each web page again, and a lot of care to make sure that every page conforms to my preferred style.
Thus, I started htmlpp with the idea: “take a large text file and break it into smaller web pages, adding pretty headers and footers, building the table of contents, cross-references and hyperlinks.” It would also be nice to define symbols like $(version) and place them into the text. How about conditional blocks so that I can generate frame and non-frame web pages from the same document, a way to share definitions between projects, a for loop to build structured text, access to environment variables and Perl macros, some more hot coffee and a raisin bagel?
htmlpp uses the term “document” to refer to the text files it inputs. This is a “hello world” document:
.echo Hello, World.
Here's something more involved:
.define new-year 0101
.if "&date("mm-dd")" eq "$(new-year)"
. echo Happy New Year!
.else
. echo Hello, World.
.endif
If you've used C or C++, you'll find that htmlpp looks very much like the C preprocessor. You get commands like .define, .include and .if that work in a similar fashion to the C preprocessor equivalents. For instance, the .if command works at “compile time”, i.e., when you build the HTML pages, not when they are displayed by the browser. Some other htmlpp commands were borrowed from the Unix shells.
Note how I define a symbol, new-year, and then use it in the document as $(new-year). htmlpp provides many variations on this theme; for example, the $(*...) form creates a hyperlink:
.define lj http://www.ssc.com/lj/
$(*lj="Linux Journal") is the magazine of the Linux community.
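To make the two symbol forms concrete, here is a minimal Python sketch of how a preprocessor might expand $(name) and $(*name="label"). It is an illustration only, not htmlpp's actual Perl code: the `symbols` table, the `expand` function and the choice to leave unknown names untouched are all assumptions made for the example.

```python
import re

# Hypothetical symbol table; the article does not show how htmlpp stores symbols.
symbols = {"lj": "http://www.ssc.com/lj/", "version": "1.0"}

# $(*name="label") becomes a hyperlink; $(name) inserts the symbol's value.
LINK = re.compile(r'\$\(\*([\w-]+)="([^"]*)"\)')
PLAIN = re.compile(r'\$\(([\w-]+)\)')

def expand(text, table):
    """Expand symbol references in text; unknown names are left as written."""
    def link(match):
        name, label = match.group(1), match.group(2)
        if name not in table:
            return match.group(0)
        return '<A HREF="%s">%s</A>' % (table[name], label)

    def plain(match):
        return table.get(match.group(1), match.group(0))

    # Links are expanded first so that the plain form never sees a "*" name.
    return PLAIN.sub(plain, LINK.sub(link, text))

line = expand('$(*lj="Linux Journal") is the magazine of the Linux community.', symbols)
```

Leaving unknown references in place, rather than raising an error, is a design choice of the sketch: it lets a later pass fill them in once the symbol has been defined.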
To define a counter which runs from 0 upwards:
.define counter++ 0
A realistic htmlpp script uses the .page command to create HTML pages. Listing 11 shows the template file supplied by htmlpp for your new projects.
Each HTML page gets a header and a footer. htmlpp lets you construct very complex headers and footers. This footer, taken from the htmlpp documentation, builds hyperlinks to the first, previous, next and last pages in the document, plus an index that lets the user jump to any page in the document.
.block footer
<HR><P>
| $(*FIRST_PAGE=<<) | $(*PREV_PAGE=<) | $(*NEXT_PAGE=>) | $(*LAST_PAGE=>>)
.build index
<P><A HREF="/index.htm">
<IMG SRC="im0096c.gif" WIDTH=96 HEIGHT=36 ALT="iMatix"></A>
Designed by <.HREF "/html/pieter.htm" "Pieter Hintjens"> © 1997 iMatix
</BODY></HTML>
.endblock
The .build index command builds the index by making a list of all the pages in the document. With an .if command, we can show the current page in relationship to the other pages. This is how I define the index:
.block index_open
<BR>
.block index_entry
.if "$(INDEX_PAGE)" eq "$(PAGE)"
| <.EM $(INDEX_TITLE)>
.else
| $(*INDEX_PAGE="$(INDEX_TITLE)")
.endif
.endblock
This code is beginning to get a bit complex, but the results are well worth the effort. The symbols in capital letters (e.g., $(PAGE), the file name for the current HTML page) are supplied by htmlpp. Some of these symbols, such as $(NEXT_PAGE), require that htmlpp go over the document several times. In fact, htmlpp will run through the document three or more times, until all cross references have been resolved. This multi-pass approach can be a little slow, but it is powerful enough to handle the footer block shown above.
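The multi-pass idea can be pictured with a short Python sketch: keep re-reading the document, carrying forward everything learned on earlier passes, until the output stops changing. This is a simplified model and not htmlpp's real algorithm. The function names, the limit of ten passes and the use of plain .define lines to stand in for forward references are assumptions made for the example.

```python
import re

REF = re.compile(r'\$\(([\w-]+)\)')

def run_pass(lines, known):
    """One pass: record '.define name value' lines and expand $(name) with
    whatever was known before this pass plus what has been seen so far."""
    table = dict(known)
    output = []
    for line in lines:
        if line.startswith(".define "):
            _, name, value = line.split(None, 2)
            table[name] = value
        else:
            output.append(REF.sub(lambda m: table.get(m.group(1), m.group(0)), line))
    return output, table

def build(lines, max_passes=10):
    """Repeat passes until two successive passes produce identical output."""
    known, previous = {}, None
    for count in range(1, max_passes + 1):
        output, known = run_pass(lines, known)
        if output == previous:
            return output, count
        previous = output
    raise RuntimeError("references did not settle")

# The reference comes before the definition, so one pass is not enough.
document = ["See $(last) for the end.", ".define last page3.htm"]
pages, passes = build(document)
```

In this model the first pass learns the symbol, the second pass uses it, and a further pass is needed only to confirm that nothing else changed, which is where the extra cost of a multi-pass design comes from.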
The .build toc command builds a table of contents, a vital part of any large document. htmlpp comes with a small file, contents.def, that does this job. To build the table of contents, you do the following:
.include contents.def
The contents.def file first defines three blocks (toc_open, toc_entry and toc_close) and then does a .build toc:
.block toc_open
<MENU>
.block toc_entry
<LI><A HREF="$(TOC_HREF)">$(TOC_TITLE)</A></LI>
.block toc_close
</MENU>
.end
<P>
.build toc
<HR>
htmlpp uses such predefined blocks for headers, footers, indexes, tables of contents and other constructions. You can define your own blocks in order to pull standard chunks of HTML text into your pages. You can also use .include commands, but this practice can lead to the creation of many small files.
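The open/entry/close pattern used by contents.def can be sketched in Python as follows. The sketch assumes that the open block is emitted once, the entry block once per page with $(TOC_HREF) and $(TOC_TITLE) filled in, and the close block once at the end. The `build_toc` function and the sample page list are hypothetical; htmlpp gathers the page list itself.

```python
def build_toc(pages, toc_open, toc_entry, toc_close):
    """Emit the open block, one filled-in entry block per page, then the close block."""
    parts = [toc_open]
    for href, title in pages:
        entry = toc_entry.replace("$(TOC_HREF)", href).replace("$(TOC_TITLE)", title)
        parts.append(entry)
    parts.append(toc_close)
    return "\n".join(parts)

# Hypothetical page list, given as (file name, title) pairs.
pages = [("page1.htm", "Introduction"), ("page2.htm", "Installation")]

toc = build_toc(
    pages,
    "<MENU>",
    '<LI><A HREF="$(TOC_HREF)">$(TOC_TITLE)</A></LI>',
    "</MENU>",
)
```

Keeping the three blocks as separate templates means the look of the table of contents can change without touching the code that walks the pages.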
The key to unlocking htmlpp's real power is learning a little Perl. When you use the .if command, for instance, you use Perl. So, I can write something like this:
.if $ENV{"RELEASE"} eq "test"
It's also possible to run Perl programs and pipe the output into your HTML pages or to extend htmlpp's syntax with your own functions. Finally, since htmlpp comes with source code under the GNU General Public License, you can change the tool in any way you wish.
At the other extreme, you can use htmlpp in “guru mode” to turn a simple text file into structured HTML pages. All you need to do is mark the section headers. htmlpp inserts a table of contents, breaks the document into pages, adds headers and footers, detects numbered and bulleted lists, paragraphs, tables and so on. This is a quick and lazy way to produce useful HTML pages without tagging every paragraph.
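As a rough picture of the page-splitting step, the Python sketch below cuts a plain text file into pages at each section header. The header convention used here (a title line followed by a line made only of "=" characters) is an assumption chosen for the example; the article does not say how guru mode expects headers to be marked, and `split_pages` is a hypothetical name.

```python
def split_pages(text):
    """Split text into (title, body_lines) pages at each underlined header.

    Assumed convention: a header is a non-blank line followed by a line
    made only of '=' characters. Text before the first header is dropped.
    """
    lines = text.splitlines()
    pages = []
    i = 0
    while i < len(lines):
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if lines[i].strip() and following and set(following.strip()) == {"="}:
            pages.append((lines[i].strip(), []))
            i += 2  # skip the title and its underline
            continue
        if pages:
            pages[-1][1].append(lines[i])
        i += 1
    return pages

sample = "Intro\n=====\nHello.\n\nSetup\n=====\nUnpack the file.\n"
pages = split_pages(sample)
```

Each resulting page could then be wrapped in a header and footer and listed in a table of contents, which is the rest of the work the article attributes to guru mode.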
To use htmlpp, you have to be happy writing HTML by hand (unless you work in guru mode). In return, you get an economical way to maintain large web sites without losing any control over the quality of your work.
To install and use htmlpp, you need Perl version 4 or 5. Download htmlpp from http://www.imatix.com/ and unpack the .zip file. The package comes with HTML pages describing how to install and use it. If you have questions, comments or suggestions, don't hesitate to send me e-mail.