Fixing HTML with the WDG HTML Validator

All hail KHTML. Now validate your site.

Apple's decision to promote KDE's KHTML rendering engine and, by extension, the KHTML-based Konqueror to Major Browser status makes web standards important again. Webmasters who test only in Microsoft Internet Explorer are going to have Macintosh users, not just Linux users, complaining about broken HTML.

Fortunately, the new Apple browser generates a confusing User-Agent header, which helps discourage "browser sniffing". Better to make your site correct, anyway.

If your site obeys the standards and a browser messes it up, you can count on the now-competitive browser developers to fix it. If your site is incorrect, expect complaints.

So, how do you make sure your site is valid HTML and not simply cut-and-pasted, "looks fine to me" HTML?

Liam Quinn's WDG HTML Validator is written in the nearly-ubiquitous Perl and runs as a CGI script, so you can install it on one system and use it from anywhere. You don't need to install it on your production web server; any system on the Net will do.

You can try out the Validator on the WDG site, but if you have a lot of pages to fix, it's faster and more polite to install it at your own site. WDG also has a nifty set of HTML tag reference pages, linked to and from the Validator results, that help you understand and fix your mistakes. I installed it in minutes from the Debian packages; RPMs also are available.

How Well Does It Work?

I dropped in the URL of my fresh, clean personal home page, created with stylesheets in my best attempt at HTML 4.01 Strict, and foolishly expected it to validate cleanly. No way. The Validator started complaining at the <body> tag.

Error: there is no attribute BGCOLOR for this element.

What? I've been putting bgcolor in body tags almost as long as I've been writing HTML! Time to hit the book, Dynamic HTML: The Definitive Reference, 2nd Edition, and see what's up. Aha! This attribute is deprecated in HTML 4.01, and I'm using the "strict" DTD, so it's time to move bgcolor to the stylesheet where it belongs. It's not a big thing, but it makes the actual page a little smaller and lets me change all the colors in one place.

body {
    background-color: #aaaaaa;
}

Now, next time Talk Like a Pirate Day rolls around, I can change everything to white text on a black background with a single edit to the stylesheet, then concentrate on me prose, mateys.

But what's this? My page has a link that makes it easy to catch up on Linux-related news. But the Validator says:

Error: general entity scoring not defined and no default entity

Fortunately, that's in the Common Problems section. Time to replace that ampersand in the link with an &amp; entity. Here's another one:

Error: element NOBR undefined

I hit the book, and it turns out the <nobr> tag was never standardized at all; it's "folk HTML" that browsers happen to recognize. In this case, I'll delete the tags, chill and let the browser flow the text the way it wants.
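Going back to the ampersand error for a moment: if a script builds your links, the escaping can happen in one place instead of by hand. Here is a minimal sketch in Python, assuming the pages are generated by a Python script; the helper name make_link and the URL are made up for illustration.

```python
from html import escape


def make_link(url: str, text: str) -> str:
    """Return an <a> element with the URL and the link text escaped for HTML."""
    # escape() turns & into &amp; and, with quote=True, " into &quot;,
    # so a query string such as ?q=linux&scoring=d is safe inside href="...".
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


# Hypothetical URL, used only for illustration.
print(make_link("http://news.example.com/search?q=linux&scoring=d", "Linux news"))
# prints <a href="http://news.example.com/search?q=linux&amp;scoring=d">Linux news</a>
```

The browser turns &amp; back into a plain ampersand before it requests the URL, so the link still goes to the same place.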

The next item on the list of HTML mistakes occurred on line 129:

Error: end tag for TT omitted, but its declaration does not permit this

followed by this on line 131:

Error: end tag for element TT which is not open

Aha! This plainly is sloppy HTML. I had a <tt> started inside a <p>, but the </tt> came after the </p>. It looks fine in the browser I use, but this is exactly the kind of mistake that makes different browsers react differently. Remember, most of the significant differences among browsers are in how they react to mistakes and not how they deal with correct HTML. Before you start sniffing User-Agent and such ugliness, make sure your pages are standard.
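To see why one misplaced end tag produces two errors, it helps to picture the open elements as a stack. The sketch below uses Python's standard html.parser module to mimic that idea; it is not how the Validator works internally, and the names NestingChecker and check_nesting are my own. It is also stricter than HTML 4.01 in one respect: it expects an explicit end tag for every non-empty element, even where the DTD lets you omit one, as with <p> and <li>.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML 4.01.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class NestingChecker(HTMLParser):
    """Collect (line, message) pairs for end tags that do not match the open element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements as (tag, line opened)
        self.problems = []   # (line, message)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # XHTML-style <br/> opens nothing, so there is nothing to track

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if any(open_tag == tag for open_tag, _ in self.stack):
            # Everything opened after the matching start tag is still open.
            while self.stack[-1][0] != tag:
                inner, opened = self.stack.pop()
                self.problems.append(
                    (line, f"<{inner}> opened on line {opened} is still open at </{tag}>"))
            self.stack.pop()
        else:
            self.problems.append((line, f"</{tag}> has no matching open element"))


def check_nesting(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


for line, message in check_nesting("<p>Run <tt>make\n</p>\n</tt>"):
    print(line, message)
```

On that sample, the </p> is reported first because the <tt> is still open, and then the </tt> is reported because nothing is left for it to close, which mirrors the pair of Validator messages above.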

Next, in a line with a <blockquote> tag, there's:

Error: character data is not allowed here

Checking the reference page linked to from the Validator results, here's the problem: "The content of the BLOCKQUOTE element should be contained within other block-level elements, typically P." Time to make sure that instead of using <blockquote>, I'm using <blockquote><p>.
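If you have many pages to check for this one mistake, a few lines of script can find the offenders before the Validator does. This is a rough sketch using Python's standard html.parser; the names BlockquoteTextChecker and text_in_blockquote are hypothetical, and the check covers only text placed directly inside <blockquote>, nothing else the strict DTD forbids there.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML 4.01.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class BlockquoteTextChecker(HTMLParser):
    """Record the line of any text that sits directly inside <blockquote>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # names of the elements currently open
        self.lines = []      # lines where offending text starts

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; stray end tags are ignored here.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Whitespace between tags is harmless; real text is not.
        if data.strip() and self.open_tags and self.open_tags[-1] == "blockquote":
            self.lines.append(self.getpos()[0])


def text_in_blockquote(html):
    checker = BlockquoteTextChecker()
    checker.feed(html)
    checker.close()
    return checker.lines


print(text_in_blockquote("<blockquote>Quoted words.</blockquote>"))
print(text_in_blockquote("<blockquote><p>Quoted words.</p></blockquote>"))
```

The first call reports line 1; the second, with the text wrapped in <p>, reports nothing.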

After a few more errors, the process gets tedious. Why didn't I validate this thing to start with and fix errors one at a time? Why did I write a quick-and-dirty conversion script that wasn't careful about matching <p> and </p>? A quick detour to a friend's page shows his first error on line 1. Ha! It's not only me.

All along, though, the Validator output makes it easy to track down the problems. Using Mozilla tabs, I can pop between the page in the browser and the Validator results.

Finally, the "Congratulations, no errors!" message appeared. Fixing a personal home page is a tiny amount of work compared to repairing damage in a deep, automatically-generated site. When writing software to crank out HTML automatically, it's worth the extra time to feed the output through the Validator to make sure it's right from the ground up, instead of having to troubleshoot when a new browser, or new version, comes along.
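One way to build that habit into a site generator is to run a check over every page before anything is published. The sketch below is only an outline of the idea, written in Python on the assumption that the generator is a Python script. The check is passed in as a function, and bare_ampersands is a deliberately tiny stand-in for a real validator; both function names and the sample pages are invented for illustration.

```python
import re
from typing import Callable, Dict, List

# An ampersand that does not start a named or numeric character reference.
BARE_AMPERSAND = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")


def bare_ampersands(html: str) -> List[str]:
    """Stand-in check: one message per line that contains a bare ampersand."""
    return [
        f"line {number}: unescaped &"
        for number, line in enumerate(html.splitlines(), start=1)
        if BARE_AMPERSAND.search(line)
    ]


def failing_pages(pages: Dict[str, str],
                  check: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Run check on every generated page and return only the pages with problems."""
    report = {}
    for name, html in pages.items():
        problems = check(html)
        if problems:
            report[name] = problems
    return report


# Hypothetical generator output, keyed by file name.
pages = {
    "index.html": '<p><a href="/search?q=linux&amp;scoring=d">News</a></p>',
    "about.html": '<p><a href="/search?q=linux&scoring=d">News</a></p>',
}
print(failing_pages(pages, bare_ampersands))
# prints {'about.html': ['line 1: unescaped &']}
```

A build script could refuse to upload anything while the report is non-empty, so a broken template gets caught on the first page instead of the thousandth.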




Re: Fixing HTML with the WDG HTML Validator

Anonymous

Then you check your stylesheet and find you should not have:

body {
    background-color: #aaaaaa;
}

but something like:

body {
    background: #aaaaaa; color: white;
}

just never ends....

Re: Fixing HTML with the WDG HTML Validator

Anonymous

What about Mozilla? Isn't Mozilla Standardized enough for you? Let's all dump our GNOME installs, and only use KDE and Konqueror because DonMarti says we need to "All Hail KHTML".

What's that? Mozilla is too big? Use Phoenix then. Or Galeon.

DonMarti, stop polluting the world with KDE Propaganda, and let us have our freedoms of choice. If I want to use KHTML based browsers, then I have the choice to, except I don't choose to use it.

Re: Fixing HTML with the WDG HTML Validator

Anonymous

Dude... Chill.

Re: Fixing HTML with the WDG HTML Validator

Anonymous

> DonMarti, stop polluting the world with KDE Propaganda,
> and let us have our freedoms of choice. If I want to use KHTML
> based browsers, then I have the choice to, except
> I don't choose to use it.

Okay, then don't use it. You still have the choice, for goodness' sake.

completely missing the point.

xtifr

I don't use KDE (or GNOME), I have little or no direct interest in KHTML, but nevertheless, I think it was a good headline, and I think you're missing the point when you complain about it. Don isn't saying that everyone should switch to KDE. He's saying that because Apple is now using KHTML, instead of IE, the decision to support only IE on your website is even more of a bad idea than it once was. The fact that KHTML is about to become very popular is good for all of us -- I'm a Mozilla user myself, and as a Mozilla user, I'd like to join Don in saying "all hail KHTML!" :)

Re: Fixing HTML with the WDG HTML Validator

Anonymous

> I hit the book, and it turns out the <nobr> tag was never
> standardized at all; it's "folk HTML" that browsers happen to
> recognize. In this case, I'll delete the tags, chill and let the
> browser flow the text the way it wants.

in fact you can use '&nbsp;' to emulate <nobr>. see also


Re: I love

Anonymous

The problem with using &nbsp; is that it has to be used between all words grouped, etc. Also, depending on the browser, pushing 2 image tags side-by-side won't necessarily prevent a line break. <nobr> is very effective here as well. For a tag that hasn't been official in ages, <nobr> is widely supported and very handy. I use it but never count on it.
