HTML: A Gentle Introduction

In the May issue of Linux Journal, Eric explained how to set up and install CERN's World Wide Web server software. This month, he tells us how to use HTML to create hypertext documents for viewing by Web surfers.
An Example Document

Let's look at an example document that uses many of the text markups I have just explained. The HTML source for the example is listed below, and Figure 1 shows the formatted document as displayed by Mosaic with my configuration.

Figure 1.

<TITLE>Example Document 1</TITLE>
<H1>Example Document 1</H1>
<H2>A Few Physical Styles</H2>
<I>This is in italics</I><BR>
<B>This is in Bold face</B><BR>
<U>This is underlined</U><BR>
<H2>A Couple Logical Styles</H2>
<EM>This text is displayed with emphasis</EM><BR>
<STRONG>This text has strong emphasis</STRONG><P>
<H2>An Unnumbered List</H2>
<UL>
<LI>Apples can be red
<LI>Oranges can be orange
</UL>
<H2>A Definition List</H2>
<DL>
<DT>Term One
<DD>This is a short definition.
<DT>Term Two
<DD>This is a much longer definition, which
demonstrates what happens when a definition is
carried over to more than one line.
</DL>

Note a few interesting things about how the browser displayed this document. Text marked up to be underlined is not displayed as underlined. This demonstrates one of the dangers of physical styles: some browsers may not support a physical style, or may not display it as expected. Also note that italic text looks like emphasized text, and bold text looks like strong text. Finally, notice the effect of the BR and P break tags, and how the multiline definition is displayed.


Uniform Resource Locators

Uniform Resource Locators, or URLs, provide a standard format for pointing to a file. The file may reside on any machine the browser can reach over the network, and it may be retrieved using a variety of protocols. The general form of a URL is:

protocol://host.domain[:port]/path/filename
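To make the general form concrete, here is a minimal sketch in Python that breaks a URL into the pieces named above, using the standard library's urllib.parse module. The function name describe_url and the host www.example.org are my own placeholders for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Break a URL into the parts of protocol://host.domain[:port]/path/filename."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,       # e.g. "http"
        "host.domain": parts.hostname,  # None if the URL names no host
        "port": parts.port,             # None if the optional [:port] is omitted
        "path/filename": parts.path,    # e.g. "/docs/foo.html"
    }


# www.example.org is a placeholder host, not a real server from the article.
for name, value in describe_url("http://www.example.org:8080/docs/foo.html").items():
    print(name, "=", value)
```

The dictionary keys simply mirror the labels used in the general form, so the printed result can be read side by side with it.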

The protocol indicates how the browser should communicate with the host from which it is requesting a file. Probably the most common protocols are http, file, gopher, and ftp. The http protocol indicates that the browser should contact the server using the Hypertext Transfer Protocol, which is used by servers designed to serve HTML documents. The file protocol is used to retrieve a file from a local directory. Many browsers also support an ftp protocol for retrieving non-local files using anonymous ftp. The gopher protocol is used to retrieve documents from a gopher server.

The host.domain is the host and domain name of the remote server to contact in order to retrieve a document. If the document is on the local system, you can create a partial URL by omitting the //host.domain. Following the host.domain is the optional port to connect to in order to retrieve the document. (The “[ ]” characters in the general form only mark the port as optional and should not be entered.) The port is often omitted, since most remote services are provided at a well-known port on the remote system. For instance, http servers are commonly found on port 80, while gopher servers are found on port 70. When omitting the port, omit the “:” character as well.
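The fallback to a well-known port can be sketched as a small lookup. This is only an illustration of the idea, not how any particular browser does it. The table and the function name effective_port are hypothetical, the hosts are placeholders, and the ftp entry (port 21) is my addition to the two ports mentioned above.

```python
from urllib.parse import urlsplit

# Well-known ports. 80 and 70 come from the text; 21 for ftp is added here.
WELL_KNOWN_PORTS = {"http": 80, "gopher": 70, "ftp": 21}


def effective_port(url):
    """Return the port a client would connect to for this URL, or None."""
    parts = urlsplit(url)
    if parts.port is not None:
        # An explicit :port in the URL always wins.
        return parts.port
    # Otherwise fall back to the protocol's well-known port, if we know one.
    return WELL_KNOWN_PORTS.get(parts.scheme)


print(effective_port("http://www.example.org/docs/foo.html"))       # 80
print(effective_port("http://www.example.org:8080/docs/foo.html"))  # 8080
print(effective_port("gopher://gopher.example.org/"))               # 70
print(effective_port("file:/docs/foo.html"))                        # None
```

The last call uses a partial URL with no //host.domain, so there is no remote port to look up at all.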

The path is used to indicate the directory location of the desired document. The filename specification indicates the name of the file on the server where the document is stored.
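As a rough sketch, the directory and filename can be separated by splitting the path at its last “/” character. The function name split_path and the host www.example.org are placeholders of my own.

```python
import posixpath
from urllib.parse import urlsplit


def split_path(url):
    """Return (directory, filename) for the document a URL names."""
    # URL paths always use "/" separators, so posixpath is the right tool
    # regardless of the operating system running this code.
    directory, filename = posixpath.split(urlsplit(url).path)
    return directory, filename


print(split_path("http://www.example.org/docs/foo.html"))  # ('/docs', 'foo.html')
```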

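As an example, if you wanted to view the document foo.html, which you know is located in directory /docs/ on the server, you could use a URL of the form:

http://host.domain/docs/foo.html

Here host.domain stands for the server's host and domain name.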

Your browser would then display the foo.html document. It is worth noting that many browsers use the file extension to help determine how to display a document. For instance, .html is commonly used for HTML documents and .text is used for text documents. For this reason, it is usually a good idea to give your documents a standard extension to help ensure that they are displayed properly. You may want to check your browser's documentation to find out which file extensions it supports, although many browsers now refer to the mailcap file to help determine how to interpret a particular extension.
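The idea of choosing a display method from the file extension can be sketched as a simple table lookup. The table below is hypothetical and holds only the two extensions mentioned above. A real browser consults its own configuration, and the name guess_display is my own.

```python
import posixpath

# Hypothetical table with just the two extensions mentioned in the text.
EXTENSION_TYPES = {
    ".html": "HTML document",
    ".text": "plain text",
}


def guess_display(filename, default="unknown"):
    """Guess how to display a file from its extension alone."""
    _, extension = posixpath.splitext(filename)
    # Compare in lower case so FOO.HTML is treated like foo.html.
    return EXTENSION_TYPES.get(extension.lower(), default)


print(guess_display("foo.html"))  # HTML document
print(guess_display("notes"))     # unknown
```

A file with no extension falls through to the default, which is why a standard extension helps the browser do the right thing.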