Relinking a Multi-Page Web Document

When you need some help getting your web pages back in order, have the computer do it for you.
Identifying Links

How does relink find the HTML links in a web page? It does so by looking for particular patterns on lines containing a hypertext link. relink will scan through an HTML file looking for the pattern /href\s*=/i which matches the letters href followed by zero or more spaces followed by an equal sign. The i at the end of the pattern allows matching without regard to upper and lower case. Lines matching this pattern contain a hypertext link and are possible candidates for updating.

Once a line containing a link is found, a list of link-specific patterns is tested against that line. If a match is found, that hypertext link is updated with information obtained from the links file, and the scanning process continues on the rest of the file. For this process to work, it is important that each hypertext link fit alone on a single line of text. Also, link-specific patterns must be chosen that do not occur normally in the body of the document. If a link-specific pattern should accidently appear on the same line as an unrelated link in the document body, relink will automatically (and incorrectly) update that unrelated link.

I use small GIF files for the next and previous icons, so the link-specific patterns next.gif and prev.gif are good choices for my pages (and since I wrote relink, these are the defaults). You can override these defaults in the links, if your links look significantly different. If there are no unique patterns identifying your links, you can add an HTML comment to the link line and use that as a pattern.

The LINKS File

We have seen a few simple examples of a links file in the discussion above. In addition to page order, you can also specify user-defined link patterns using the following line:

link: linkname pattern

The linkname identifies the type of link (next, prev, index, or anything you can think of). The pattern is a string of characters that must appear on every link of that type. You may override the next, prev, toc (table of contents) and up links that relink normally works with, and you may define your own links here.

A table of contents file may be identified using the line:

tocfile: tocname

Links identified with the toc link pattern will generate a link to this file. Unfortunately, relink will not update the table of contents with new page orders, so you have to edit the table of contents manually to keep it up to date. Perhaps a future version of relink can address this problem.

Nested pages can be specified by using a { on a line by itself to start a nested list and a } to end a nested list. The page immediately preceding the nested list is called the parent page. The first and last page of a nested list point to the parent page in there prev and next links. In addition, each nested page will have an up link to the parent page. The next link of the parent page will skip over the nested list to the following page. (We assume that the parent page has explicit links into the nested list.)

And finally, separate lists of HTML files can be specified by using a line of dashes. next/prev links will not cross a line of dashes.

Summing Up

I have found relink to be a very useful script in dealing with web documentation, making it very easy to update pages in long documents without worrying about the details of manually adjusting the page links.

Jim Weirich is a software consultant for Compuware specializing in Unix and C++. When he is not working on his web pages, you can find him playing guitar, playing with his kids, or playing with Linux. Comments are welcome at jweirich@one.net or visit him at http://w3.one.net/~jweirich/.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState