Public and System Identifiers

SGML was designed to be free of system dependencies, so it even avoids relying on file names. SGML refers to “external entities”, which can be identified in two ways: by a public identifier or by a system identifier. Public identifiers are generally preferred because they are system-independent. Anyone who has edited HTML has seen one. The line:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN">

says: “this is an ‘HTML’ document, and you'll be able to find its specification via the public identifier ‘-//W3C//DTD HTML 3.2 Draft//EN’”. How that public identifier is resolved is up to the SGML system at hand: it may use databases, file systems, networks, or whatever other mechanism the system implements.
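To make the DOCTYPE line concrete, here is a minimal Python sketch that extracts the document type name, the public identifier and an optional system identifier from such a declaration. The regular expression and the function name parse_doctype are hypothetical illustrations, not part of SGMLtools or any SGML parser; a real parser handles much more of the declaration syntax (comments, single-quoted literals, internal subsets). The sketch also collapses runs of whitespace inside the identifier, on the assumption that a literal broken across lines should match its one-line form.

```python
import re
from typing import Optional, Tuple

# Simplified pattern for illustration: a DOCTYPE with a PUBLIC identifier
# and an optional system identifier, both as double-quoted literals.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<name>\S+)\s+PUBLIC\s+"(?P<public>[^"]*)"'
    r'(?:\s+"(?P<system>[^"]*)")?\s*>',
    re.IGNORECASE,
)


def parse_doctype(text: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (document type name, public id, system id or None)."""
    match = DOCTYPE_RE.search(text)
    if match is None:
        return None
    # Collapse whitespace (including line breaks) inside the public id.
    public_id = " ".join(match.group("public").split())
    return match.group("name"), public_id, match.group("system")


if __name__ == "__main__":
    doc = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN">'
    print(parse_doctype(doc))
```

Note that the sketch only finds the public identifier; turning it into an actual file is the job of whatever resolution mechanism the SGML system provides.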

A standard way to map public identifiers to system identifiers is by means of SGML Open catalogs. These are files that contain entries like:

PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN" "/usr/local/sgml/html3-2.dtd"

where the third field is the system identifier, in this case (as in most cases) a file name. SGML software knows where to find these catalogs and uses them to translate public identifiers, so the user never has to worry about file locations. Often the catalog's file name is hard-coded into the software, but it can be overridden by a list of catalog file names in the environment variable SGML_CATALOG_FILES.
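A catalog lookup can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it recognizes only PUBLIC entries, double- or single-quoted literals, unquoted names and -- comments --; it steps over every other keyword token by token; it does not resolve relative system identifiers against the catalog's own location; and it keeps the first mapping it finds, searching catalogs in the order given. The names tokenize, parse_public_entries and resolve_public are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# A token is a -- comment --, a double- or single-quoted literal, or a run
# of non-whitespace characters. The comment alternative has no groups, so
# match.lastindex is None for comments.
TOKEN_RE = re.compile(r'--.*?--|"([^"]*)"|\'([^\']*)\'|(\S+)', re.DOTALL)


def tokenize(text: str) -> List[Tuple[str, bool]]:
    """Split catalog text into (value, was_quoted) pairs, dropping comments."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastindex is None:      # a -- comment --
            continue
        if m.lastindex == 3:         # unquoted name or keyword
            tokens.append((m.group(3), False))
        else:                        # quoted literal (group 1 or 2)
            tokens.append((m.group(m.lastindex), True))
    return tokens


def parse_public_entries(text: str) -> Dict[str, str]:
    """Map whitespace-normalized public ids to system ids (first entry wins)."""
    tokens = tokenize(text)
    mapping: Dict[str, str] = {}
    i = 0
    while i < len(tokens):
        value, quoted = tokens[i]
        if not quoted and value.upper() == "PUBLIC" and i + 2 < len(tokens):
            public_id = " ".join(tokens[i + 1][0].split())
            mapping.setdefault(public_id, tokens[i + 2][0])
            i += 3
        else:
            i += 1  # skip tokens of entries this sketch does not handle
    return mapping


def resolve_public(public_id: str, catalog_texts: List[str]) -> Optional[str]:
    """Search catalogs in order; return the first matching system id."""
    key = " ".join(public_id.split())
    for text in catalog_texts:
        system_id = parse_public_entries(text).get(key)
        if system_id is not None:
            return system_id
    return None


if __name__ == "__main__":
    catalog = '''
    -- HTML DTDs --
    PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN"
           "/usr/local/sgml/html3-2.dtd"
    '''
    print(resolve_public("-//W3C//DTD HTML 3.2 Draft//EN", [catalog]))
```

Because the tokenizer treats line breaks like any other whitespace, an entry split over two lines resolves the same way as a one-line entry.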

SGMLtools builds and uses a shared catalog in a well-known location (/var/lib/sgml/catalog) that contains all these mappings. This keeps hard-coded system identifiers to a minimum and makes documents more portable.
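Choosing which catalogs to search can be sketched the same way. Here catalog_files, a hypothetical helper, returns the list from SGML_CATALOG_FILES when that variable is set and otherwise falls back to the shared /var/lib/sgml/catalog. Two details are assumptions, not documented SGMLtools behavior: that the variable's entries are separated by os.pathsep (":" on Unix), and that setting it replaces the default rather than adding to it.

```python
import os
from typing import List, Mapping, Optional

# Shared catalog location named in the text, used here as the fallback.
DEFAULT_CATALOG = "/var/lib/sgml/catalog"


def catalog_files(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the catalog files to search, in order.

    Assumption: SGML_CATALOG_FILES is an os.pathsep-separated list that,
    when non-empty, replaces the default shared catalog.
    """
    env = os.environ if environ is None else environ
    value = env.get("SGML_CATALOG_FILES", "")
    # Drop empty entries produced by leading, trailing or doubled separators.
    names = [name for name in value.split(os.pathsep) if name]
    return names if names else [DEFAULT_CATALOG]


if __name__ == "__main__":
    print(catalog_files())
```

Accepting an explicit mapping instead of always reading os.environ keeps the function easy to test without touching the real environment.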