Fixing HTML with the WDG HTML Validator

January 19th, 2003 by Don Marti

All hail KHTML. Now validate your site.

Apple's decision to promote KDE's KHTML rendering engine and, by extension, the KHTML-based Konqueror to Major Browser status makes web standards important again. Webmasters who test only in Microsoft Internet Explorer are going to have Macintosh users, as well as Linux users, complaining about broken HTML.

Fortunately, the new Apple browser generates a confusing User-Agent header, which helps discourage "browser sniffing". Better to make your site correct, anyway.

If your site obeys the standards and a browser messes it up, you can count on the now-competitive browser developers to fix it. If your site is incorrect, expect complaints.

So, how do you make sure your site is valid HTML and not simply cut-and-pasted, "looks fine to me" HTML?

Liam Quinn's WDG HTML Validator is written in the nearly ubiquitous Perl and runs as a CGI script, so you can install it on one system and use it from anywhere. You don't need to install it on your production web server; any system on the Net will do.

You can try out the Validator on the WDG site, but if you have a lot of pages to fix, it's faster and more polite to install it at your own site. WDG also has a nifty set of HTML tag reference pages, linked to and from the Validator results, that help you understand and fix your mistakes. I installed it in minutes from the Debian packages; RPMs also are available.

How Well Does It Work?

I dropped in the URL of my fresh, clean personal home page, created with stylesheets in my best attempt at HTML 4.01 Strict, and foolishly expected it to validate cleanly. No way. The Validator started complaining at the <body> tag.

Error: there is no attribute BGCOLOR for this element.

What? I've been putting bgcolor in body tags almost as long as I've been writing HTML! Time to hit the book, Dynamic HTML: The Definitive Reference, 2nd Edition, and see what's up. Aha! This attribute is deprecated in HTML 4.01, and I'm using the "strict" DTD, so it's time to move bgcolor to the stylesheet where it belongs. It's not a big thing, but it makes the actual page a little smaller and lets me change all the colors in one place.

body {
    background-color: #aaaaaa;
}

Now, next time Talk Like a Pirate Day rolls around, I can change everything to white text on a black background with a single edit to the stylesheet, then concentrate on me prose, mateys.

But what's this? My page has a link to http://news.google.com/news?q=linux&scoring=d to easily catch up on the Linux-related news. But the Validator says:

Error: general entity scoring not defined and no default entity

Fortunately, that's in the Common Problems section. Time to replace that ampersand in the link with an &amp; entity. Here's another one:
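If the links come out of a script rather than a text editor, the same fix can be applied at generation time. Here is a minimal sketch, assuming the page is built in Python; the helper name `link` is hypothetical, and `html.escape` from the standard library does the actual work of turning `&` into `&amp;`.

```python
from html import escape

def link(url, text):
    """Return an <a> element with the URL and link text escaped for HTML."""
    # escape() rewrites & as &amp;, and with quote=True it also escapes
    # double and single quotes so the value is safe inside href="...".
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))

print(link("http://news.google.com/news?q=linux&scoring=d", "Linux news"))
```

The browser still requests the URL with a plain ampersand; only the markup carries the entity.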

Error: element NOBR undefined

I hit the book, and it turns out the <nobr> tag was never standardized at all; it's "folk HTML" that browsers happen to recognize. In this case, I'll delete the tags, chill and let the browser flow the text the way it wants.

The next item on the list of HTML mistakes occurred on line 129:

Error: end tag for TT omitted, but its declaration does not permit this

followed by this on line 131:

Error: end tag for element TT which is not open

Aha! This plainly is sloppy HTML. I had a <tt> started inside a <p>, but the </tt> was after the </p>. It looks fine in the browser I use, but this is exactly the kind of mistake that makes different browsers react differently. Remember, most of the significant differences among browsers are in how they react to mistakes, not in how they deal with correct HTML. Before you start sniffing User-Agent and such ugliness, make sure your pages are standard.
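A mis-nested pair like this is easy to catch mechanically before a page ever reaches the Validator. The following is a rough sketch, not a replacement for real validation: it uses Python's standard `html.parser` module to keep a stack of open elements and reports any end tag that does not match the innermost one. The names `NestingChecker` and `check_nesting` are made up for illustration. It is also stricter than the HTML 4.01 DTD, which lets some end tags (such as </p>) be omitted, and it does not report elements left open at the end of the input.

```python
from html.parser import HTMLParser

# Elements that have no end tag in HTML 4.01; they never go on the stack.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

class NestingChecker(HTMLParser):
    """Collect end tags that do not match the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, innermost last
        self.problems = []   # (line number, message) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        line = self.getpos()[0]
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif tag in self.stack:
            # Something opened inside this element was never closed.
            self.problems.append(
                (line, f"</{tag}> found while <{self.stack[-1]}> is still open"))
            # Discard the element being closed and everything opened inside it.
            while self.stack.pop() != tag:
                pass
        else:
            self.problems.append((line, f"</{tag}> found but no <{tag}> is open"))

def check_nesting(html_text):
    """Return a list of (line, message) nesting problems found in html_text."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems

sample = "<p>Run <tt>make install\n</p>\n</tt>"
for line, message in check_nesting(sample):
    print(f"line {line}: {message}")
```

On the sample, the checker reports two problems, one for each misplaced end tag, much as the Validator did.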

Next, in a line with a <blockquote> tag, there's

Error: character data is not allowed here

Checking the reference page linked to from the Validator results, here's the problem: "The content of the BLOCKQUOTE element should be contained within other block-level elements, typically P." Time to make sure that instead of using <blockquote>, I'm using <blockquote><p>.
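For pages produced by a script, one way to avoid this error is to have the script wrap every quoted paragraph in <p> itself. This is only a sketch under that assumption; the function name `blockquote` and its list-of-strings argument are invented for the example.

```python
from html import escape

def blockquote(paragraphs):
    """Wrap each paragraph in <p> so no bare text sits inside <blockquote>."""
    body = "\n".join(
        "  <p>{}</p>".format(escape(text, quote=False)) for text in paragraphs
    )
    return "<blockquote>\n{}\n</blockquote>".format(body)

print(blockquote([
    "The content of the BLOCKQUOTE element should be contained "
    "within other block-level elements, typically P."
]))
```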

After a few more errors, the process gets tedious. Why didn't I validate this thing to start with and fix errors one at a time? Why did I write a quick-and-dirty conversion script that wasn't careful about matching <p> and </p>? A quick detour to a friend's page shows his first error on line 1. Ha! It's not only me.

All along, though, the Validator output makes it easy to track down the problems. Using Mozilla tabs, I can pop between the page in the browser and the Validator results.

Finally, the "Congratulations, no errors!" message appeared. Fixing a personal home page is a tiny amount of work compared to repairing damage in a deep, automatically generated site. When writing software to crank out HTML automatically, it's worth the extra time to feed the output through the Validator to make sure it's right from the ground up, instead of having to troubleshoot when a new browser, or a new version, comes along.

Resources

Buy Dynamic HTML: The Definitive Reference, 2nd Edition from Powell's, our partner bookstore.

Don Marti is editor in chief of Linux Journal.

Re: Fixing HTML with the WDG HTML Validator

On January 23rd, 2003 Anonymous says:

Then you check your stylesheet and find you should not have:

body {
    background-color: #aaaaaa;
}

but something like:

body {
    background: #aaaaaa; color: white;
}

just never ends....

Re: Fixing HTML with the WDG HTML Validator

On January 19th, 2003 Anonymous says:

What about Mozilla? Isn't Mozilla Standardized enough for you? Let's all dump our GNOME installs, and only use KDE and Konqueror because DonMarti says we need to "All Hail KHTML".

What's that? Mozilla is too big? Use Phoenix then. Or Galeon.

DonMarti, stop polluting the world with KDE Propaganda, and let us have our freedoms of choice. If I want to use KHTML based browsers, then I have the choice to, except I don't choose to use it.

Re: Fixing HTML with the WDG HTML Validator

On January 23rd, 2003 Anonymous says:

Dude... Chill.

Re: Fixing HTML with the WDG HTML Validator

On January 21st, 2003 Anonymous says:

> DonMarti, stop polluting the world with KDE Propaganda,
> and let us have our freedoms of choice. If I want to use KHTML
> based browsers, then I have the choice to, except
> I don't choose to use it.

Okay, then don't use it. You still have the choice, for goodness' sake.

completely missing the point.

On January 20th, 2003 xtifr (not verified) says:

I don't use KDE (or GNOME), I have little or no direct interest in KHTML, but nevertheless, I think it was a good headline, and I think you're missing the point when you complain about it. Don isn't saying that everyone should switch to KDE. He's saying that because Apple is now using KHTML, instead of IE, the decision to support only IE on your website is even more of a bad idea than it once was. The fact that KHTML is about to become very popular is good for all of us -- I'm a Mozilla user myself, and as a Mozilla user, I'd like to join Don in saying "all hail KHTML!" :)

Re: Fixing HTML with the WDG HTML Validator

On January 19th, 2003 Anonymous says:

> I hit the book, and it turns out the <nobr> tag was never
> standardized at all; it's "folk HTML" that browsers happen to
> recognize. In this case, I'll delete the tags, chill and let the
> browser flow the text the way it wants.

In fact, you can use '&nbsp;' to emulate <nobr>. See also

http://www.w3.org/TR/html401/struct/text.html#h-9.3.2.2

cheers....

Re: I love

On January 20th, 2003 Anonymous says:

The problem with using &nbsp; is that it has to be used between all words grouped, etc. Also, depending on the browser, pushing 2 image tags side-by-side won't necessarily prevent a line break. <nobr> is very effective here as well. For a tag that hasn't been official in ages, <nobr> is widely supported and very handy. I use it but never count on it.
