Structuring XML Documents

Author: David Megginson
Publisher: Prentice Hall
E-mail: info@prenhall.com
URL: http://www.prenhall.com/
Price: $44.95 US
ISBN: 0-13-642299-3
Reviewer: Terry Dawson
Take a close look at any of the various documentation projects operating within the Linux community, and you will find SGML. The Linux Documentation Project, the Debian Documentation Project and others are using SGML as the primary tool in producing consistently structured and styled documentation. The search for a more sophisticated replacement for HTML has led to the development of XML, which is based heavily on SGML. XML has nearly all of the power and features of SGML, but will probably be much better supported because of the web-driven market for browsers and editors. For this reason, XML will probably replace SGML in many applications.
SGML and XML both provide a means of describing the structure of a document. Both rely on Document Type Definitions (DTDs), which define the elements a document may contain and how those elements may be nested.
In this book, David Megginson competently explains the process of good quality DTD design. While the title suggests it is XML-specific, it is not. SGML and XML have so many similarities that it is possible to describe both simultaneously, highlighting differences between the two where they arise. This book is another in the Charles F. Goldfarb series, and in it, Mr. Megginson describes document structuring using both SGML and XML, managing to avoid confusing the reader during the process. The book has four main parts, and includes a CD-ROM with software that implements XML parsers, and a selection of modern and popular DTDs.
Part One provides some background on XML, describes how it differs from SGML and examines five popular and useful DTDs. This part is not intended for readers with no prior SGML or XML experience and does not attempt to teach either one, but if you are familiar with at least one of them, it will help you learn the other. The chapter on DTD syntax clearly illustrates the differences and similarities between the two. The DTDs examined in detail are:
ISO-12083
DocBook
Text-Encoding Initiative (TEI)
MIL-STD-38784 (CALS)
HyperText Markup Language (HTML 4.0)
The first four of these are in common use and have inspired many other DTD designs. The CALS table design, for example, has been borrowed many times and used in other DTDs.
Part Two covers the principles of DTD analysis, and the core material of the book begins here. These chapters describe how to critically analyse a DTD from three important perspectives: ease of learning, ease of use and ease of processing. How easily a DTD can be learned is critical to its acceptance by authors. If a DTD is difficult to learn, authors will tend not to use it, use only a small subset of it, or worse, misuse it by bending it to suit their needs. Mr. Megginson shows how to assess how easy a DTD is to learn, with the aim of teaching you to design DTDs that are easy to learn.
The chapter entitled “Ease of Use” describes how to analyse a DTD to determine if it will be easy for authors to use when they are writing their documentation. Some of the issues explored are the naming of tags and attributes, when to use a new tag and when to add an attribute to an existing tag, and structural issues that can simplify or complicate an author's job.
The chapter on ease of processing is of particular interest to those who publish documents and develop processing tools. A DTD may be easy for the author to learn and use, but that does not always mean documents written with it are easy to process into printed or published form. The lessons are mostly common sense applied to the specific task of DTD design.
The third part of the book covers a number of advanced DTD maintenance and design issues. It will be of interest mostly to people who intend to use SGML or XML for purposes other than publishing, such as database systems or other information management applications. The first topic covered is DTD compatibility. I mentioned earlier that the CALS table design has been borrowed for use in other DTDs. When two DTDs are similar, it is fairly simple to translate a document from one to the other. This is very useful if you wish to exchange documentation with a group that uses a different DTD. This chapter describes how to assess the compatibility of DTDs and explains the advantages of keeping compatibility in mind when designing a DTD.
The second topic extends this discussion to exchanging document fragments. A document fragment might be a single chapter or paragraph from a book. If you wish to share portions of a document with a group using a different DTD, you will find useful tips in this chapter on ways to simplify the task.
The final topic in this part is DTD customisation: taking an existing DTD and modifying it to suit your specific purposes. Designing a sophisticated DTD can be a complex task, and there is often little reason to design one from scratch; an existing DTD may provide 95% of what you need, requiring only a small amount of customisation to cover the rest. This can save a lot of time and makes document exchange and compatibility easier. This chapter describes how to customise DTDs, and how to design DTDs that are easy to customise. The DocBook DTD, for example, was designed with hooks in place that allow for easy customisation.
The fourth and final part of the book covers DTD design using a technique called Architectural Forms. Architectural Forms allow DTD designers to specify how their DTD should be translated into one or more other DTDs, so that you can write documents that are simultaneously valid for a number of different DTDs. This part of the book describes the concepts and the implementation of Architectural Forms and offers useful hints and advice to designers wishing to use this facility. I found this part of the book a little difficult to comprehend, but that was almost certainly due to my limited exposure to applications requiring this advanced technique. I'm confident that anyone with an application for Architectural Forms will find the information presented to be a good introduction to the topic.
I am pleased to report that the CD-ROM includes Linux versions of the XML parsing software. Two XML parsers are provided: the first is a precompiled version of the popular “SP” parser, and the second is a Java-based XML parser called “Aelfred”. The CD-ROM also contains each DTD described in the book in its SGML form, as well as a number of links to useful resources on the Internet. It provides some HTML-based documentation, but is otherwise not well documented. I am left with the impression that the CD-ROM was a last-minute addition to the book; nevertheless, it does provide tools that allow the reader to experiment with the techniques described, and to that end it is adequate.
I found Structuring XML Documents to be an interesting and informative book that I will certainly be using as a reference in the future. David Megginson has done a nice job of concisely capturing a lot of material while keeping the pace slow enough to allow one to absorb the information fairly comfortably. The book is ideal for both SGML and XML designers, and SGML designers should not be misled by the title. I recommend that anyone with an interest in DTD design, especially those involved with Linux-related documentation projects, take a look at this book. It is certain to be of assistance in your efforts.