Relinking a Multi-Page Web Document

When you need some help getting your web pages back in order, have the computer do it for you.

Identifying Links

How does relink find the HTML links in a web page? It looks for particular patterns on lines containing a hypertext link. relink scans through an HTML file looking for the pattern /href\s*=/i, which matches the letters href followed by zero or more whitespace characters followed by an equal sign. The i at the end of the pattern allows matching without regard to upper and lower case. Lines matching this pattern contain a hypertext link and are possible candidates for updating.
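The scanning step can be sketched in a few lines of Python. This is only an illustration of the pattern described above, not relink's own source; the function name candidate_lines and the sample HTML are made up for the example.

```python
import re

# Same pattern as in the text: "href", optional whitespace, "=", case-insensitive.
HREF_RE = re.compile(r"href\s*=", re.IGNORECASE)

def candidate_lines(html_text):
    """Return (line_number, line) pairs for lines that appear to hold a link."""
    return [
        (number, line)
        for number, line in enumerate(html_text.splitlines(), start=1)
        if HREF_RE.search(line)
    ]

# Hypothetical page fragment, used only to exercise the function.
sample = "\n".join([
    "<p>Intro text</p>",
    '<A HREF = "page2.html"><img src="next.gif"></A>',
    "<p>No link here</p>",
    '<a href="page0.html"><img src="prev.gif"></a>',
])

for number, line in candidate_lines(sample):
    print(number, line)
```

Only the second and fourth lines are reported, since they are the only ones containing href followed by an equal sign.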

Once a line containing a link is found, a list of link-specific patterns is tested against that line. If a match is found, that hypertext link is updated with information obtained from the links file, and the scanning process continues on the rest of the file. For this process to work, it is important that each hypertext link fit alone on a single line of text. Also, link-specific patterns must be chosen that do not occur normally in the body of the document. If a link-specific pattern should accidentally appear on the same line as an unrelated link in the document body, relink will automatically (and incorrectly) update that unrelated link.

I use small GIF files for the next and previous icons, so the link-specific patterns next.gif and prev.gif are good choices for my pages (and since I wrote relink, these are the defaults). You can override these defaults in the links file if your links look significantly different. If there are no unique patterns identifying your links, you can add an HTML comment to the link line and use that as a pattern.
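To make the matching and updating step concrete, here is a minimal Python sketch. It is not relink's actual code. It assumes that a link-specific pattern is a plain substring of the line, that the href value is enclosed in double quotes, and that the new targets have already been worked out from the links file. The names relink_line, patterns, and targets are hypothetical.

```python
import re

HREF_RE = re.compile(r"href\s*=", re.IGNORECASE)
# Captures everything up to the opening quote, and the closing quote,
# so only the quoted value in between is replaced (assumes double quotes).
HREF_VALUE_RE = re.compile(r'(href\s*=\s*")[^"]*(")', re.IGNORECASE)

def relink_line(line, patterns, targets):
    """Rewrite the href on one line if a link-specific pattern appears on it.

    patterns maps a link name to the text that marks that kind of link;
    targets maps a link name to the file the link should now point to.
    """
    if not HREF_RE.search(line):
        return line                      # not a link line at all
    for linkname, pattern in patterns.items():
        if pattern in line and linkname in targets:
            return HREF_VALUE_RE.sub(
                lambda m: m.group(1) + targets[linkname] + m.group(2),
                line,
                count=1,
            )
    return line                          # a link, but not one we manage

patterns = {"next": "next.gif", "prev": "prev.gif"}
targets = {"next": "ch3.html", "prev": "ch1.html"}

print(relink_line('<a href="old.html"><img src="next.gif"></a>', patterns, targets))
print(relink_line('<a href="other.html">See also</a>', patterns, targets))
```

The first line is rewritten to point at ch3.html and the second is left alone. The HTML comment trick works the same way: with a pattern such as `<!-- nextlink -->`, a line like `<a href="old.html">Next</a> <!-- nextlink -->` would be picked up even though it contains no icon file name.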

The LINKS File

We have seen a few simple examples of a links file in the discussion above. In addition to page order, you can also specify user-defined link patterns using the following line:

link: linkname pattern

The linkname identifies the type of link (next, prev, index, or anything you can think of). The pattern is a string of characters that must appear on every link of that type. You may override the next, prev, toc (table of contents) and up links that relink normally works with, and you may define your own links here.

A table of contents file may be identified using the line:

tocfile: tocname

Links identified with the toc link pattern will generate a link to this file. Unfortunately, relink will not update the table of contents with new page orders, so you have to edit the table of contents manually to keep it up to date. Perhaps a future version of relink can address this problem.
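Reading the link: and tocfile: lines might look something like the following Python sketch. It is an illustration under stated assumptions rather than relink's real parser: it assumes the pattern is everything after the link name, seeds only the next.gif and prev.gif defaults mentioned earlier (the defaults for toc and up are not given here), and ignores every other line. The function name parse_directives is hypothetical.

```python
def parse_directives(lines):
    """Collect link patterns and the table of contents file from a links file."""
    patterns = {"next": "next.gif", "prev": "prev.gif"}   # defaults from the text
    tocfile = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("link:"):
            # "link: linkname pattern" -> split once so the pattern may contain spaces
            parts = line[len("link:"):].split(None, 1)
            if len(parts) == 2:
                linkname, pattern = parts
                patterns[linkname] = pattern
        elif line.startswith("tocfile:"):
            tocfile = line[len("tocfile:"):].strip() or None
    return patterns, tocfile

# Hypothetical links file contents.
example = [
    "link: next <!-- next -->",
    "link: index idx.gif",
    "tocfile: contents.html",
    "page1.html",
]
patterns, tocfile = parse_directives(example)
print(patterns)
print(tocfile)
```

Here the next pattern is overridden, prev keeps its default, and a user-defined index link is added alongside them.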

Nested pages can be specified by using a { on a line by itself to start a nested list and a } to end a nested list. The page immediately preceding the nested list is called the parent page. The first page of a nested list points to the parent page in its prev link, and the last page points to the parent page in its next link. In addition, each nested page will have an up link to the parent page. The next link of the parent page will skip over the nested list to the following page. (We assume that the parent page has explicit links into the nested list.)

And finally, separate lists of HTML files can be specified by using a line of dashes. The next and prev links will not cross a line of dashes.
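The nesting and separator rules can be sketched in Python as follows. This is a reading of the rules described above, not relink's implementation. It assumes one page file name per line, that the page following a nested list links back to the parent as its prev, and that the braces are balanced. The name compute_links and the sample file names are hypothetical.

```python
def compute_links(lines):
    """Map each page to its prev/next/up targets, following the nesting rules."""
    links = {}
    # Each stack frame is [parent, pages] for the list currently being read.
    stack = [[None, []]]

    def close(parent, pages):
        # Within one list, neighbours link to each other; the ends of a
        # nested list fall back to the parent page (None at the top level).
        for i, page in enumerate(pages):
            entry = links.setdefault(page, {})
            prev_page = pages[i - 1] if i > 0 else parent
            next_page = pages[i + 1] if i + 1 < len(pages) else parent
            if prev_page:
                entry["prev"] = prev_page
            if next_page:
                entry["next"] = next_page
            if parent:
                entry["up"] = parent

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("link:", "tocfile:")):
            continue                          # blank lines and directives
        if line == "{":
            pages = stack[-1][1]
            stack.append([pages[-1] if pages else None, []])
        elif line == "}":
            close(*stack.pop())
        elif set(line) == {"-"}:              # a line of dashes ends the list
            parent, pages = stack[-1]
            close(parent, pages)
            stack[-1] = [parent, []]
        else:
            stack[-1][1].append(line)
    while stack:
        close(*stack.pop())
    return links

example = [
    "intro.html", "chapter1.html",
    "{", "sec1a.html", "sec1b.html", "}",
    "chapter2.html",
    "-----",
    "appendix.html",
]
for page, entry in compute_links(example).items():
    print(page, entry)
```

In this example chapter1.html is the parent page: its next link goes straight to chapter2.html, while sec1a.html and sec1b.html both carry an up link back to it. Because of the line of dashes, chapter2.html has no next link and appendix.html has no prev link.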

Summing Up

I have found relink to be a very useful script in dealing with web documentation, making it very easy to update pages in long documents without worrying about the details of manually adjusting the page links.

Jim Weirich is a software consultant for Compuware specializing in Unix and C++. When he is not working on his web pages, you can find him playing guitar, playing with his kids, or playing with Linux. Comments are welcome at jweirich@one.net or visit him at http://w3.one.net/~jweirich/.
