More Flexible Formatting with SGMLtools
In the October 1995 issue of LJ, Christian Schwarz presented a short overview of Linuxdoc-SGML as it stood then: a complete, out-of-the-box package that gave, and still gives, authors a chance to write once and present anywhere. From flat ASCII to typeset PostScript and hypertext HTML, it all rolls out from a single SGML source file. Since then, many changes, small and large, led to the package being renamed SGML-Tools (and then SGMLtools, because the hyphen caused confusion) to indicate it wasn't just for Linux anymore. Still, we, the SGMLtools project authors, weren't satisfied, so we set out to build an even better package, SGMLtools 2, which is presented here. This article gives a brief overview of the changes that turned SGML-Tools 1 into SGMLtools 2; more extensive information can be found on the SGMLtools web site (see Resources).
A big issue that came up again and again was that the shortcomings of the Linuxdoc document type definition were beginning to show. Document type definition (DTD) is the SGML term for the set of rules that fixes how an SGML document compliant with that DTD must look. It defines the structure of the document, from titles and subtitles to tables.
Maintaining a document type definition, as we found out, is quite difficult. There was constant discussion over which features should be allowed, how to improve existing features, and whether to stick with purely structural markup or be a little pragmatic about things. The same debates kept coming back and began to interfere with progress. The Linuxdoc DTD was clearly too limited, but we didn't want to redesign it without finding out whether alternatives already existed.
We quickly came to the conclusion that the DocBook DTD, as developed by the Davenport Group, would be a good successor to the Linuxdoc DTD. DocBook, developed by professionals for professionals with an emphasis on technical documentation, fits the target audience for SGMLtools very well and solves a number of the problems of Linuxdoc. Furthermore, almost every SGML vendor supports DocBook, which makes users less dependent on us and gives them more ways to process SGML documentation. Recently, responsibility for maintaining DocBook has been transferred to the Organization for the Advancement of Structured Information Standards (http://www.oasis-open.org/), ensuring that DocBook will continue to be widely supported.
The acronym DSSSL may not say much to the average reader, but it stands for another significant change in SGMLtools. DSSSL (Document Style and Semantics Specification Language) is a language used to specify how SGML documents will look. It translates structural markup such as “section” into a formatting style such as “Helvetica Bold, 18 points”, and it can build tables of contents and more. It is much more powerful than the mapping files used previously, because it can act on context and allows you to define functions. As DSSSL is based on Scheme, you can do just about anything you wish.
We chose DSSSL not only because of its power, but also because it is an industry standard (unlike the old method and the alternatives we evaluated). It also helped us jump-start the project, because a complete set of DSSSL styles for the DocBook DTD is available.
SGMLtools 2 is a collection of tools based around three core elements:
the DocBook DTD
the standard DocBook DSSSL files
Jade, the SGML/DSSSL parser
When you hand your SGML source to SGMLtools (with the command sgmltools), it does little more than call Jade with the name of the SGML file, the name of the DSSSL file to apply to it and the requested output format. The following sections go into some detail to make the process clear. It is not difficult to understand, and some basic knowledge of what happens during a run of SGMLtools helps a great deal when you want to make modifications.
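The call described above can be pictured with a minimal Python sketch. It assumes Jade's -t option (output type) and -d option (DSSSL specification); the helper name jade_command and the file name guide.sgml are hypothetical, and the way sgmltools really assembles its call may differ.

```python
def jade_command(sgml_file, dsssl_file, backend):
    """Return the argument list for one Jade run.

    -t selects Jade's output type (the backend),
    -d names the DSSSL specification to apply.
    """
    return ["jade", "-t", backend, "-d", dsssl_file, sgml_file]


# Hypothetical file names, for illustration only.
command = jade_command("guide.sgml", "ldp.dsl", "tex")
print(" ".join(command))  # prints: jade -t tex -d ldp.dsl guide.sgml
```

The list form can be passed directly to subprocess.run if Jade is installed; the sketch itself only builds the command and does not execute it.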
Jade first reads the SGML file and tries to find the document type definition from the SGML file's declaration at the beginning of the file. For example:
<!DOCTYPE article PUBLIC "-//Davenport//DTD DocBook V3.0//EN">
appears at the beginning of a DocBook-compliant document. (Note that article can be replaced by any element of the DocBook DTD; para, for example, would designate a single-paragraph document.) From the PUBLIC identifier, Jade obtains the file name of the DTD (see the sidebar on Public and System Identifiers), and if all this succeeds, the SGML source is checked for compliance.
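The lookup from public identifier to file name can be illustrated with a small Python sketch. It assumes a catalog made of lines of the form PUBLIC "identifier" "file name" with both parts quoted; the function names and the file name docbook.dtd are hypothetical, and a real SGML parser handles many more cases.

```python
import re

# Matches a DOCTYPE declaration that carries a PUBLIC identifier.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)
# Matches one catalog entry: PUBLIC "public identifier" "file name"
CATALOG_RE = re.compile(r'^\s*PUBLIC\s+"([^"]+)"\s+"([^"]+)"', re.MULTILINE)


def parse_doctype(sgml_text):
    """Return (document element, public identifier) from the declaration."""
    match = DOCTYPE_RE.search(sgml_text)
    if match is None:
        raise ValueError("no DOCTYPE declaration with a PUBLIC identifier")
    return match.group(1), match.group(2)


def resolve_dtd(sgml_text, catalog_text):
    """Return (document element, DTD file name) using the catalog entries."""
    root, public_id = parse_doctype(sgml_text)
    catalog = dict(CATALOG_RE.findall(catalog_text))
    if public_id not in catalog:
        raise KeyError("public identifier not in catalog: " + public_id)
    return root, catalog[public_id]


# Hypothetical one-line catalog; docbook.dtd is an assumed file name.
SAMPLE_CATALOG = 'PUBLIC "-//Davenport//DTD DocBook V3.0//EN" "docbook.dtd"\n'
SAMPLE_DOC = '<!DOCTYPE article PUBLIC "-//Davenport//DTD DocBook V3.0//EN">'
print(resolve_dtd(SAMPLE_DOC, SAMPLE_CATALOG))  # prints: ('article', 'docbook.dtd')
```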
After the document has been found to be okay (“validated”), Jade reads the indicated DSSSL file and executes it against the parsed SGML file. The DSSSL “program” reads the SGML document from objects in memory and outputs another memory structure called a Flow Object Tree (FOT). The FOT will look structurally like the SGML document, but it contains information on fonts, sizes, and other options. Finally, Jade hands the FOT to one of its backends which converts the generic-style information into the backend's specific file format.
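The step from parsed document to Flow Object Tree can be pictured with a rough Python sketch. This is a toy model, not Jade's internals: the classes Element and FlowObject, the RULES table and the function build_fot are all hypothetical, and real DSSSL rules are Scheme expressions rather than a lookup table.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One node of the parsed SGML document (toy model)."""
    name: str
    children: list = field(default_factory=list)  # Element or str


@dataclass
class FlowObject:
    """One node of the flow object tree (toy model)."""
    kind: str
    properties: dict
    children: list = field(default_factory=list)  # FlowObject or str


# Style rules keyed by (parent name, element name); None matches any parent.
RULES = {
    ("SECT1", "TITLE"): {"font-weight": "bold", "font-size": "20pt"},
    (None, "PARA"): {"font-size": "10pt"},
}


def build_fot(element, parent_name=None):
    """Walk the document tree and attach style properties to each node."""
    props = RULES.get((parent_name, element.name))
    if props is None:
        props = RULES.get((None, element.name), {})
    node = FlowObject("paragraph" if props else "sequence", dict(props))
    for child in element.children:
        if isinstance(child, str):
            node.children.append(child)  # character data is copied as-is
        else:
            node.children.append(build_fot(child, element.name))
    return node


doc = Element("SECT1", [Element("TITLE", ["Introduction"]),
                        Element("PARA", ["Some text."])])
fot = build_fot(doc)
print(fot.children[0].properties)  # prints: {'font-weight': 'bold', 'font-size': '20pt'}
```

The resulting tree has the same shape as the document, but each node now carries formatting properties, which is what a backend needs.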
As a short example to illustrate this process, start with an SGML document with the line:
<Sect1><Title>Introduction</Title> ...
This is a top-level section with “Introduction” as the title. Jade determines that it is a valid DocBook document and then reads a DSSSL file, perhaps ldp.dsl, which gives instructions for Linux Documentation Project style formatting.
The following section could be in the DSSSL file:
(element (SECT1 TITLE)
  (make paragraph
    font-family-name: "Times New Roman"
    font-weight: 'bold
    font-size: 20pt))
This expression says “for TITLE elements within SECT1 elements, output a paragraph with a 20pt bold Times font”. Taking some shortcuts, we can say that this expression results in a flow object with the given properties and the text “Introduction” for content (the concept of making a paragraph out of everything, even headings, will be familiar to people who have worked with desktop publishing [DTP] software). When everything is done, Jade hands all the flow objects to the backend, for example, TeX. This backend, upon encountering the flow object for our introductory section title, will output something like:
{\setfontfam{Times-Roman-Bold}\setfontsize{20pt}Introduction}
which can then be processed by TeX and a special TeX package to generate DVI and PostScript.
Note that the beauty of DSSSL is that you talk only about style, not about specific instructions for specific formats. Whether TeX, RTF or groff, you'll always get at least a close equivalent of a “20pt Times New Roman Bold” section header. If you need to tune this, you can easily override pieces of DSSSL specifications for specific backends. Often, you'll at least have different DSSSL files for hardcopy and HTML output.
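The separation between style and output format can be illustrated with a short Python sketch in which one flow object is handed to two different backends. Everything here is hypothetical: the Paragraph class, both backend functions and the exact output strings are simplifications for illustration, and the TeX-like macros only imitate the illustrative output shown earlier, not what Jade really writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paragraph:
    """A much-simplified stand-in for a paragraph flow object."""
    text: str
    font_family: str
    font_weight: str
    font_size_pt: int


def to_tex_like(fo):
    """Render the flow object with illustrative TeX-style macros."""
    font = fo.font_family.replace(" ", "-")
    if fo.font_weight == "bold":
        font += "-Bold"
    return "{\\setfontfam{%s}\\setfontsize{%dpt}%s}" % (font, fo.font_size_pt, fo.text)


def to_html_like(fo):
    """Render the same flow object as an HTML paragraph with inline style."""
    style = "font-family: '%s'; font-weight: %s; font-size: %dpt" % (
        fo.font_family, fo.font_weight, fo.font_size_pt)
    return '<p style="%s">%s</p>' % (style, fo.text)


title = Paragraph("Introduction", "Times New Roman", "bold", 20)
print(to_tex_like(title))
print(to_html_like(title))
```

The style is stated once, in the flow object; only the backend functions know anything about the target format.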