At the Forge - HTML5

HTML5 is coming, and it's going to change the way you develop Web apps. Read on to find out how.

One of the amazing things about the Web always has been the relative ease with which you can create a site. All you have to do is learn a few HTML tags, create a file that uses those tags, and voilà, you have a one-page site. Learn another tag or two, and you now can create multipage sites that contain images, links to other pages and all sorts of other goodies. Really, it doesn't take more than a few hours to learn the basics of HTML.

The problem is that this shallow learning curve has long masked the fact that HTML's vocabulary hasn't kept up with the times. Yes, it's very easy to create a site, but when you want to start styling the site, things become a bit more complex. And, I don't mean they become complex because CSS is hard for many people to understand (which it is). Rather, things become complex because styles often are attached to span and div tags, which means that pages end up containing many span and div tags, each with its own ID and/or class, so that you can style each thing just right.

Now, there's nothing technically wrong with having dozens of div tags on a page of HTML. But, at a time when Tim Berners-Lee and others are talking about the “semantic Web”, and a growing number of computers (rather than people) are trying to retrieve and parse documents on the Web, it seems odd to stick with such a limited vocabulary.

While I'm complaining, it's hard to believe that HTML forms barely have changed since I started using them in 1993. Given the number of sites that ask you to enter e-mail addresses or dates, you would think that HTML forms would include special provisions for these types of inputs, rather than force people to use text fields for each one.

In fact, there are a whole bunch of problems with HTML in its current form, and with many of the CSS selectors that allow you to style it nicely. HTML has been a part of my life for so long that some of those problems didn't even occur to me until I started to think about them in greater depth. Fortunately for the Web though, I'm not the one in charge of such thinking. After a number of fits and starts, the HTML5 specification (along with a few related specifications, such as CSS3), which includes no small number of improvements, is beginning to gain popularity.

The good news is that HTML5 (and friends) has a great deal to offer Web developers. Indeed, I've already switched all of my new development to use HTML5, and I'm hoping to back-port applications where possible. However, HTML5 is a catchall phrase for a huge number of different tags, functions and CSS selectors, and each browser manufacturer is implementing these piecemeal, with no guarantee of 100% compliance in the near future.

This puts Web developers in the unenviable position of being able to enjoy a great deal of new functionality, but also constantly having to check to see whether the user's browser can take advantage of that functionality. I find this to be a bit ironic. For years, many of us have touted Web and server-side applications as a way of getting around the headaches associated with the compatibility issues that plague desktop applications. Although browser compatibility always has been an issue of sorts, the problems we've seen to date pale in comparison with our current issues, in part because the HTML5 elements are more advanced than we've seen before. (The closest equivalent would be the browsers that lacked support for forms, back in the early days of the Web.) At the same time, JavaScript is now mature enough to provide us with a way to test for the presence of these features, either on our own or by using a third-party library.
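As a rough illustration of testing for features "on our own", here is a minimal sketch. The helper names supportsInputType and supportsCanvas are made up for this example. The sketch relies on two commonly used checks: a browser that does not recognize an input type falls back to "text", and a browser that supports canvas exposes a getContext method on the element.

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>Feature detection sketch</title>
  </head>
  <body>
    <p id="report"></p>
    <script>
      // Hypothetical helper: returns true if the browser recognizes the
      // given <input> type. Browsers that don't know a type fall back to
      // "text", so reading the property back reveals whether it is supported.
      function supportsInputType(type) {
        var input = document.createElement("input");
        input.setAttribute("type", type);
        return input.type === type;
      }

      // Hypothetical helper: canvas support shows up as a getContext method.
      function supportsCanvas() {
        var canvas = document.createElement("canvas");
        return typeof canvas.getContext === "function";
      }

      // Write the results into the page so they can be inspected.
      var message = "email input: " + supportsInputType("email") +
                    ", canvas: " + supportsCanvas();
      document.getElementById("report")
              .appendChild(document.createTextNode(message));
    </script>
  </body>
</html>
```

The results depend entirely on the browser that loads the page, which is exactly the point: the same markup may need a fallback in one browser and none in another.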

This month, I'm looking at some of the promise HTML5 brings to the table, with a particular emphasis on some of the syntax, elements and attributes that are coming into play. I also explain how to use Modernizr, an open-source JavaScript library that automates testing for various HTML5 features, so you can fall back to an alternative when the user's browser lacks a feature.
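To give a flavor of what Modernizr-based testing looks like, here is a small sketch. It assumes a copy of the library saved as modernizr.js next to the page (the filename is an assumption), built with the canvas and input-type tests included. Modernizr exposes each test result as a boolean property on a global Modernizr object.

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>Modernizr sketch</title>
    <!-- Assumed location of the library; adjust to wherever you keep it -->
    <script src="modernizr.js"></script>
  </head>
  <body>
    <p id="report"></p>
    <script>
      // Each detected feature is a boolean property on the Modernizr object.
      var message = Modernizr.canvas
        ? "Canvas is available."
        : "No canvas here; use an image or other fallback.";

      // Input types are grouped under Modernizr.inputtypes.
      message += Modernizr.inputtypes.email
        ? " Native e-mail fields are available."
        : " No native e-mail fields; validate with your own script.";

      document.getElementById("report")
              .appendChild(document.createTextNode(message));
    </script>
  </body>
</html>
```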

I should note that one subject at the center of much public discussion of HTML5 has to do with video and audio formats. These topics certainly are important and of interest, but they also are complicated, in terms of browser compatibility and licensing issues. It's true that HTML5 simplifies the use of video in many ways, but it's still a complex issue with no truly simple resolution. If you're interested in this subject, I suggest you look at one or more of the books mentioned in the Resources section of this article.

Doctypes and Tags

If you're like me, the first thing you do when you create the first standards-compliant HTML (or XHTML) page on a new site is copy the doctype from an existing site. The doctype not only provides hints to the user's browser, indicating what it can and should expect, but it also provides a standard against which you can check the validity of your HTML. If you fail to provide a doctype, not only are you failing to hitch your wagon to any standard, but you're also telling browsers (Microsoft's most notoriously) that they should operate in “quirks mode”, which emulates the nonstandard behavior of older browsers and can wreak havoc on your HTML and CSS.

The doctype declarations for HTML 4 and XHTML are long-winded and easy to get wrong, so I never type them myself, but rather copy them, either from an existing project or from someone else on the Web. Fortunately, HTML5 simplifies this enormously. Generally, you just have to put the following at the top of the page:

<!DOCTYPE html>
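For comparison, here is the kind of declaration that the one-line HTML5 doctype replaces. This is the standard XHTML 1.0 Strict doctype, shown only as an illustration of the older, longer form:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

A single misplaced character in the public identifier or the URL is easy to miss, which is why copying and pasting has been the safer habit.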

After the doctype, the HTML document looks and works much like you might expect. For example, a document has a <head> section, typically containing the document title, links to stylesheets, metatags and imported JavaScript libraries. Perhaps the most common metatag you will use is the one determining the character encoding; nowadays, just about everyone should use UTF-8, which you can specify as:

<meta charset="utf-8" />
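Putting these pieces together, a minimal HTML5 page skeleton might look like the following sketch. The title, stylesheet and script filenames are placeholders invented for this example.

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- Declare the character encoding early in the head -->
    <meta charset="utf-8" />
    <title>My HTML5 page</title>
    <!-- Placeholder filenames; substitute your own -->
    <link rel="stylesheet" href="style.css" />
    <script src="app.js"></script>
  </head>
  <body>
    <p>Hello, HTML5!</p>
  </body>
</html>
```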

Following the <head> section is the <body> section, which contains the page's actual content. Tags continue to work as they did before, but the rules are somewhat relaxed compared with XHTML. For example, you don't need to quote attribute values that lack whitespace (although I think it's a good idea to do so), and you can omit the self-closing trailing slash on such tags as <img>.
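As a small illustration of the relaxed rules, the two image tags below are equivalent in HTML5. The filename and alternate text are made up for the example.

```html
<!-- XHTML-style: quoted attribute values and a self-closing slash -->
<img src="logo.png" alt="Logo" />

<!-- Also accepted in HTML5: unquoted values (no whitespace) and no slash -->
<img src=logo.png alt=Logo>
```

The unquoted form breaks as soon as a value contains a space, such as alt=Company Logo, which is one reason to keep quoting anyway.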

If you are tired of using <div> to divide up your page, with an “id” attribute of “header”, “footer” or “sidebar”, cheer up, you're not alone. Google apparently did some statistical analysis of Web pages and determined that a huge number of sites use divs to set up their headers and footers, among other things. In order to make HTML5 more semantically expressive, the specification includes a number of new sectional tags, such as <section>, <article>, <header> and <footer>. You even can indicate that a particular set of links is for navigation, such as a menu bar, by putting them inside a <nav> tag. Note that these new tags change nothing but the semantics, although they also let you style elements by tag rather than by ID. This won't necessarily change the layout of pages, but it will make the markup easier to read and understand, and it also will make it easier for search engines to parse and deal with documents without having to look at IDs and classes, which are arbitrary in any event.
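Here is a sketch of how a page that once used divs with IDs such as “header” and “footer” might look with the new tags. The content, link targets and style rules are invented for the example. Note that the CSS selects elements by tag name rather than by ID.

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>Semantic layout sketch</title>
    <style>
      /* Style by tag name instead of #header, #nav and #footer */
      header, footer { background: #eee; padding: 1em; }
      nav a { margin-right: 1em; }
    </style>
  </head>
  <body>
    <header>
      <h1>My Site</h1>
      <nav>
        <a href="/">Home</a>
        <a href="/articles">Articles</a>
      </nav>
    </header>
    <section>
      <article>
        <h2>First article</h2>
        <p>Article text goes here.</p>
      </article>
    </section>
    <footer>
      <p>Contact information goes here.</p>
    </footer>
  </body>
</html>
```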