At the Forge - Checking Your HTML
One of the best tools for checking the validity of a page's markup is the World Wide Web Consortium's validator, available at validator.w3.org. I use the validator almost exclusively from within Firefox, in which I have installed the Web Developer extension. This extension lets you validate the HTML of any page simply by selecting Validate HTML from the extension's menu. The browser then submits the page's URL to the W3C validator, which reports, line by line, any problems the page contains.
The W3C validator has at least two drawbacks, however. First, you must submit each page to the validator one at a time, which takes a great deal of time and effort just to check your pages. Second, and more practically, the validator works only with pages that are reachable over the Internet without password protection. If your site is being developed on your local computer, or if a firewall separates your business from the outside world, you probably won't be able to use the Web-based validator.
One solution to this problem is to install the W3C validator on your local computer. The validator is a Perl program, and you can get its source code from validator.w3.org/source. On modern Debian and Ubuntu machines, you can instead install the w3c-markup-validator package, which makes the validator available through your local Web server, ready to be invoked.
If you end up installing the validator manually, it requires a number of Perl modules, which you might need to download from CPAN (the Comprehensive Perl Archive Network), a worldwide network of mirrors hosting open-source Perl modules. It might take some trial and error to figure out which modules are necessary, although if you are an experienced user of the CPAN.pm installer, this shouldn't be too much trouble. Note that the SGML::Parser::OpenSP module requires the OpenSP parser, which you can get from SourceForge at openjade.sf.net.
A number of these modules are required to handle alternate character encodings, particularly those used for Asian languages. Even if you don't plan to handle such languages, the modules are mandatory and must be installed.
The validator program, called check, should go in a directory for CGI programs or in a directory handled by mod_perl, the Apache module that lets you (among other things) run Perl programs faster. You also will need to install a configuration file, typically in the /etc/w3c directory; you can relocate it by setting the W3C_VALIDATOR_CFG environment variable.
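The configuration lookup boils down to "use W3C_VALIDATOR_CFG if it is set, otherwise the default." Here is a minimal Ruby sketch of that rule. It assumes the default file is /etc/w3c/validator.conf (the text names only the directory) and that an empty variable counts as unset, as it would under Perl's || operator:

```ruby
# Assumed default; the text says only that the file typically lives in /etc/w3c.
DEFAULT_VALIDATOR_CFG = '/etc/w3c/validator.conf'

# Returns the configuration path the validator would use: the value of
# W3C_VALIDATOR_CFG when it is set and non-empty, otherwise the default.
# Accepts any Hash-like object so it can be exercised without touching ENV.
def validator_config_path(env = ENV)
  override = env['W3C_VALIDATOR_CFG']
  (override.nil? || override.empty?) ? DEFAULT_VALIDATOR_CFG : override
end
```

When check runs under Apache, the variable has to be visible to the Web server process, not merely set in your login shell.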
Now that you have the W3C checker installed on your own server, you can feed it URLs that aren't open to the public. But if you are developing an application in Ruby on Rails, you can go one step further and integrate the W3C validator into your automated tests.
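Even outside Rails, a validator on your own server can be scripted, which also eases the one-page-at-a-time problem. The Ruby sketch below builds a check URL for each page and reads the verdict from the response headers. It assumes the validator is reachable at http://localhost/w3c-validator/check (adjust this to wherever your Web server exposes check), that check accepts the page address in a uri query parameter, and that it reports its verdict in an X-W3C-Validator-Status header. The intranet URLs are placeholders.

```ruby
require 'net/http'
require 'uri'

# Assumed location of the locally installed validator.
VALIDATOR = 'http://localhost/w3c-validator/check'

# Builds the URL that asks check to fetch and validate page_url.
def check_url(page_url, validator: VALIDATOR)
  uri = URI(validator)
  uri.query = URI.encode_www_form(uri: page_url)
  uri
end

# Reads the (assumed) status header: true for "Valid", false for anything
# else, nil if the header is missing. Works with a Hash or a Net::HTTP response.
def page_valid?(headers)
  status = headers['X-W3C-Validator-Status']
  status.nil? ? nil : status == 'Valid'
end

# Submits each page in turn to the local validator and prints a one-line verdict.
def validate_pages(pages)
  pages.each do |page|
    response = Net::HTTP.get_response(check_url(page))
    label = case page_valid?(response)
            when true  then 'valid'
            when false then 'INVALID'
            else 'no verdict (is the validator URL right?)'
            end
    puts "#{page}: #{label}"
  end
end

# validate_pages(%w[http://intranet.example/faq http://intranet.example/about])
```

For the full line-by-line report on an invalid page, open the same check URL in a browser.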
In order to do this, you need to install the html_test plugin for Rails. Go into your Rails application's root directory, and type:
script/plugin install http://htmltest.googlecode.com/svn/trunk/html_test
With this plugin in place, you now can use three new assertions in your functional and integration tests: assert_w3c passes if the W3C validator approves of your HTML; assert_tidy passes if the HTML Tidy library approves of it; and assert_validates runs both checks.
So, if you have a FAQ page you want to check with an integration test, you can write something like this:
def test_faq
  get '/faq'                # request the page
  assert_response :success  # make sure it rendered
  assert_w3c                # send the HTML to the W3C validator
end
If the W3C validator approves of the HTML for this page, the test passes. If the page is not valid, you will get quite a bit of output, which you should redirect to a file. That file will contain not only the results of your tests, but also the same HTML report you would have gotten from the public, Web-based W3C validator, giving you a complete and easy-to-read description of what you did wrong.
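The FAQ test generalizes easily. To run the same check on several pages, one common Ruby idiom is to generate one test method per page with define_method. In this sketch, the page names and paths are hypothetical, and the class has no superclass so it loads without Rails; in a real application it would inherit from your integration-test base class, which (together with html_test) supplies get, assert_response and assert_w3c.

```ruby
# Hypothetical pages to check; replace with your own routes.
PAGES_TO_VALIDATE = {
  'faq'   => '/faq',
  'about' => '/about'
}.freeze

# In a Rails app, make this inherit from your integration-test base class.
class PageValidityTest
  PAGES_TO_VALIDATE.each do |name, path|
    # Defines test_faq_is_valid, test_about_is_valid, and so on. Each one
    # fetches its page and runs the same checks as test_faq.
    define_method("test_#{name}_is_valid") do
      get path
      assert_response :success
      assert_w3c
    end
  end
end
```

Because each page gets its own method, a failure names the page that broke (test_about_is_valid, for example).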
You'll often discover that a large number of validation errors can be fixed with a small number of corrections. For example, when I ran this test against a sloppy FAQ page, I got six validation errors. I was able to fix all of them by indicating the appropriate namespace in my <html> tag and removing an extraneous </p> from the end of the file.
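Both of those fixes are easy to spot with a rough pre-check. The Ruby sketch below, with hypothetical helper names, assumes an XHTML page (as the namespace fix implies). It tests whether the opening <html> tag declares the XHTML namespace, http://www.w3.org/1999/xhtml, and counts surplus </p> tags. These are crude regular-expression heuristics that miss many errors; they are no replacement for the validator.

```ruby
XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Crude check: does the opening <html> tag carry an xmlns attribute
# set to the XHTML namespace?
def declares_xhtml_namespace?(html)
  tag = html[/<html\b[^>]*>/i]
  !tag.nil? && tag.match?(/\bxmlns\s*=\s*["']#{Regexp.escape(XHTML_NS)}["']/)
end

# Crude check: how many more </p> tags than <p> tags does the page have?
# A positive result suggests a stray closing tag. <p[\s>] avoids
# matching tags such as <pre> or <param>.
def extra_closing_paragraphs(html)
  html.scan(%r{</p\s*>}i).size - html.scan(/<p[\s>]/i).size
end
```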
Checking HTML validity with assert_w3c is nice and easy. (Running the validator on every single page can make your tests noticeably slower, however; I think the trade-off is worthwhile, but you might disagree.) If you always want to check HTML validity, you can change your test environment's configuration so that validation happens automatically, without your having to invoke assert_w3c each time.
To do this, you need to modify test_helper.rb, which sits at the top of the test directory and is loaded by every test program. All you have to do is add:
ApplicationController.validate_all = true
ApplicationController.validators = [:w3c]
You also can check URLs and redirects. Although these checks aren't about HTML validity per se, they come with the html_test plugin and are quite useful:
ApplicationController.check_urls = true
ApplicationController.check_redirects = true
With these four lines in your test_helper.rb, you can run your integration tests once again. If any validation test fails, you can look at /tmp/w3c_last_response.html, which will contain the complete output of that failure. Because this file holds only the most recent response, however, it doesn't help very much if you have multiple failures.
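One workaround is to copy the report to its own file after each failure, before the next one overwrites it. A minimal sketch follows; the helper name and the archive directory tmp/validation_failures are hypothetical, and only the /tmp/w3c_last_response.html path comes from the plugin's behavior described above.

```ruby
require 'fileutils'

# Where html_test leaves the output of the most recent failed validation.
LAST_RESPONSE = '/tmp/w3c_last_response.html'

# Copies the latest validator report to <dir>/<test_name>.html so a later
# failure cannot overwrite it. Returns the new path, or nil when there is
# no report to copy.
def archive_validator_report(test_name, source: LAST_RESPONSE, dir: 'tmp/validation_failures')
  return nil unless File.exist?(source)

  FileUtils.mkdir_p(dir)
  dest = File.join(dir, "#{test_name}.html")
  FileUtils.cp(source, dest)
  dest
end
```

You could call it from a rescue clause around assert_w3c, passing the test's name, and then re-raise so the test still fails.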
If you have designed your templates using the DRY (don't repeat yourself) principle, fixing HTML markup problems shouldn't be too bad. In many cases, you will need to change only one tag in the layout to fix everything.