Reuven introduces XMLC, part of the Enhydra application server.

Over the last few months, we have looked at a variety of methods for creating web applications using server-side Java. We started with simple servlets and then moved on to JavaServer Pages (JSPs). In order to remove Java code from our JSPs, we began to use JavaBeans, objects whose methods are automatically available to our pages.

But you can only go so far with JavaBeans, which is where custom actions come in. These actions, which look like XML tags and attributes in our JSPs, are tied to the methods of a Java class. In other words, placing a tag in our JSP can effectively invoke one or more methods. Combining custom tags with beans allows us to remove quite a bit of the Java code from our JSPs.

But in the end, what have we accomplished? As we saw last month, intelligent use of custom actions means creating our own mini-language, with its own loops, conditionals and variables. Writing our own tags saves graphic designers from having to use Java and allows us a greater separation between form and content. But it does not go nearly far enough toward separating the designers' work from the developers'.

One clever solution is part of the Enhydra application server, about which I will be writing over the next few months. XMLC, or the XML compiler, turns XML files (including HTML and XHTML files) into Java objects. By invoking methods on these objects, we can modify the HTML that is eventually produced.


XML, as you have probably heard by now, is the Extensible Markup Language. What began as a simple and small standard several years ago has ballooned into a veritable alphabet soup of standards and proposed standards.

But the core of XML has remained the same, allowing people to create their own markup languages using a uniform syntax. XML is not meant to be used directly; rather, it is meant to let you create your own markup languages. Because those markup languages are based on XML, they have a well-understood syntax that can be checked by any XML parser. Moreover, if you define a document type definition (DTD) for your markup language, a validating parser can ensure that the elements and attributes in a document are the ones the DTD allows.

HTML and XML are both standards of the World Wide Web Consortium (W3C), have a similar syntax and are often discussed in the same breath. But in fact, HTML is just one markup language, while XML allows you to create your own languages. More significantly, HTML has a much looser syntax than XML, thanks in no small part to historical factors. The following is thus legal HTML: <img src="foo.png">.

But because every tag must be explicitly closed in XML-derived languages, this would be illegal in an XML document. Instead, we would have to say: <img src="foo.png"/>.
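To see the difference for ourselves, we can hand both versions of the tag to an XML parser. The following is only a minimal sketch using the standard JAXP parser that ships with Java; the class name WellFormedCheck and the method isWellFormed are my own inventions for illustration, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string parses as well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, e.g. because a tag was never closed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<img src=\"foo.png\">"));  // prints false
        System.out.println(isWellFormed("<img src=\"foo.png\"/>")); // prints true
    }
}
```

Note that this checks only whether the input is well-formed XML; it says nothing about whether the document is valid XHTML.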

In order to bridge the gap between HTML and XML, the W3C has issued a recommendation known as XHTML, a reformulation of HTML as an XML-based language. While there are indeed various benefits to the use of XHTML, the biggest one is that XML tools will now work on our HTML documents.

Of course, this means that our XHTML documents will look a bit more formal than the HTML documents we might be used to writing. While HTML allows us to be sloppy, using <P> to separate paragraphs, XHTML is much stricter, forcing us to begin paragraphs with <p> and end them with </p>. Element and attribute names must be written in lowercase, and attribute values must always be quoted, which many people fail to do when working with straight HTML.

While XHTML might be a pain for humans, it actually reduces the load on programs by making the syntax more regular, and thus easier to read and write. But the biggest benefit is the fact that XHTML documents can now be treated as XML documents.


XML documents are trees, which should ring a bell for those of you who studied computer science in college. Trees are remarkably easy to work with in theory, but the practice can be a bit tricky sometimes, depending on the way in which the interface is implemented.

There are two popular, cross-platform APIs for working with XML. SAX (the Simple API for XML) is designed to work with incoming streams of XML data, reporting each element to our program as it is read, which allows it to be small and efficient. The DOM (Document Object Model), by contrast, gives us access to the entire document tree at once. This allows us to traverse and modify nodes, including adding new nodes and removing old ones. However, it also means that the entire document must be loaded into memory before we can begin to work with it. This makes the DOM more powerful than SAX, but also slower and more resource-intensive.
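The contrast between the two APIs is easier to appreciate in code. The sketch below uses the SAX and DOM implementations bundled with Java; the class SaxVersusDom, its method names and the sample document are assumptions made up for this illustration. The SAX method can only watch elements go by and count them, while the DOM method holds the whole tree and can therefore remove a node from it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxVersusDom {
    // Made-up sample document for the demonstration.
    static final String XML = "<html><body><p>one</p><p>two</p></body></html>";

    // SAX: the parser calls us back once per start tag; no tree is kept.
    static int countWithSax(String xml, String name) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        if (qName.equals(name)) {
                            count[0]++;
                        }
                    }
                });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    // DOM: the whole document is in memory, so we can change it.
    // Removes the first matching element and returns how many remain.
    static int removeFirstWithDom(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node first = doc.getElementsByTagName(name).item(0);
            if (first != null) {
                first.getParentNode().removeChild(first);
            }
            return doc.getElementsByTagName(name).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithSax(XML, "p"));       // prints 2
        System.out.println(removeFirstWithDom(XML, "p")); // prints 1
    }
}
```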

XMLC works by converting an XML file, normally written in HTML or XHTML, into a Java class that creates and manipulates a DOM tree. You can use standard DOM methods to add, modify and remove nodes on the tree, thus changing the document that will eventually be output.
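To get a feel for what "a Java class that creates and manipulates a DOM tree" might mean, here is a hand-written stand-in. I want to stress that this is not the code XMLC actually generates; the class HandBuiltPage and its methods are hypothetical, and the sketch uses nothing but the standard DOM and JAXP classes. It builds a small page node by node, lets us add and remove list items, and then serializes the tree back into markup.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for a compiled page: the tree is built in Java code.
class HandBuiltPage {
    private final Document doc = newEmptyDocument();
    private final Element list;

    HandBuiltPage() {
        // Equivalent to the markup <html><body><ul/></body></html>
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        list = doc.createElement("ul");
        doc.appendChild(html);
        html.appendChild(body);
        body.appendChild(list);
    }

    private static Document newEmptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Add a node to the tree.
    void addItem(String text) {
        Element item = doc.createElement("li");
        item.setTextContent(text);
        list.appendChild(item);
    }

    // Remove a node from the tree, if there is one to remove.
    void removeFirstItem() {
        if (list.getFirstChild() != null) {
            list.removeChild(list.getFirstChild());
        }
    }

    // Turn the current state of the tree back into markup.
    String toMarkup() {
        try {
            StringWriter out = new StringWriter();
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HandBuiltPage page = new HandBuiltPage();
        page.addItem("first");
        page.addItem("second");
        page.removeFirstItem();
        System.out.println(page.toMarkup());
    }
}
```

The point of the exercise is that nobody would want to write such a class by hand for every page on a site, which is exactly the chore a compiler can take over.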

But the truly clever idea in XMLC is the use of HTML “id” attributes. When the XMLC compiler sees an id attribute, it creates methods that allow us to retrieve and modify the text contained within the element that carries that attribute. The site designers thus work with HTML, identifying areas of dynamic text by giving them unique identifiers. When the designers have finished with their mockup of the original HTML page, they compile it (using XMLC) into a Java class. Developers then create servlets that instantiate that class, use its methods to replace the mockup text with dynamically generated content and send the document to the user's browser.
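The workflow just described can be imitated with the standard DOM classes alone. In the sketch below, the class GreetingPage, the mockup markup, the id "userName" and the accessor names getElementUserName and setTextUserName are all hypothetical; they illustrate the idea of per-id accessors and should not be taken as XMLC's real output or API. Here the mockup is parsed at runtime, whereas XMLC does its work at compile time.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical page class with one accessor pair per id attribute.
class GreetingPage {
    // The designer's mockup; "Jane Mockup" is placeholder text.
    static final String MOCKUP =
        "<html><body><p>Hello, <span id=\"userName\">Jane Mockup</span>!</p></body></html>";

    private final Document doc = parse(MOCKUP);

    private static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Retrieve the element whose id is "userName".
    Element getElementUserName() {
        return findById(doc.getDocumentElement(), "userName");
    }

    // Replace the mockup text inside that element.
    void setTextUserName(String text) {
        getElementUserName().setTextContent(text);
    }

    // Depth-first search for an element with the given id attribute.
    private static Element findById(Element start, String id) {
        if (id.equals(start.getAttribute("id"))) {
            return start;
        }
        for (Node child = start.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            if (child instanceof Element) {
                Element found = findById((Element) child, id);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    String toMarkup() {
        try {
            StringWriter out = new StringWriter();
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A servlet would then do little more than create a GreetingPage, call setTextUserName with the real user's name and write the result of toMarkup to the response. The designer remains free to move the span anywhere on the page, so long as its id stays the same.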

The basic idea is that the designers do not work on hybrids of code and HTML, but rather on mockups of the final output. So long as the id attributes do not change, the HTML file and servlet can evolve in parallel, with neither designers nor developers waiting for their counterparts.