XML, the eXtensible Markup Language
The Standard Generalized Markup Language (SGML) is about two decades old. SGML was originally designed for processing large documentation sets, but it is neither a programming language nor a text-formatting language. Instead, it's a meta-language for defining customized markup languages. The most famous SGML-based language today is unquestionably HTML.
Because SGML has been around for two decades, many companies offer SGML tools and products and it's firmly entrenched in many high-end document-processing applications. SGML is quite a large language; however, understanding the basics isn't very difficult. It does contain many rarely used features which are harder to understand. Implementing a full SGML parser is difficult, and this has given SGML a reputation for fearsome complexity. This reputation isn't truly deserved, but it's been enough to scare many people away from using it.
XML, then, is a stripped-down version of SGML that sacrifices some power in return for easier understanding and implementation. It's still a meta-language, but many of SGML's lesser-used features and options have been dropped. The XML 1.0 specification is about 40 pages long, and a parser can be implemented with a few weeks of effort.
A markup language specified using XML looks a lot like HTML:
<?xml version="1.0"?>
<!DOCTYPE myth SYSTEM "myth.dtd">
<myth>
  <name lang="latin">Hercules</name>
  <name lang="greek">Herakles</name>
  <description>Son of Zeus and Alcmena.</description>
  <mortal/>
</myth>
An XML document consists of a single element containing sub-elements, which can have further sub-elements inside them. Elements are indicated by tags in the text, consisting of text within angle brackets <...>. Two forms of elements are available. An element may contain content between opening and closing tags, as in <name>Hercules</name>, which is a name element containing the data Hercules. This content may be text data, other XML elements or a mixture of the two. Elements can also be empty, in which case they're represented as a single tag ending with a slash, as in <mortal/>, which is an empty mortal element. This is different from HTML, where empty elements such as <BR> or <IMG> aren't indicated differently from a non-empty element such as <H1>. Also unlike HTML, XML element names are case-sensitive; mortal and Mortal are two different element types.
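The structure described above can be explored with a short Python sketch. It uses the standard library's xml.etree.ElementTree module, which checks that the text is well formed but does not read myth.dtd. The outline helper and the variable names are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# The example document from the article. The XML declaration must be
# the very first thing in the string.
MYTH_XML = """<?xml version="1.0"?>
<!DOCTYPE myth SYSTEM "myth.dtd">
<myth>
  <name lang="latin">Hercules</name>
  <name lang="greek">Herakles</name>
  <description>Son of Zeus and Alcmena.</description>
  <mortal/>
</myth>"""


def outline(element, depth=0):
    """Return one indented line per element: its name and any text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(MYTH_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))

# An empty element such as <mortal/> has no text and no children.
mortal = root.find("mortal")
mortal_is_empty = mortal.text is None and len(mortal) == 0

# Element names are case-sensitive, so </Mortal> does not close <mortal>.
try:
    ET.fromstring("<myth><mortal></Mortal></myth>")
    case_mismatch_rejected = False
except ET.ParseError:
    case_mismatch_rejected = True

print(mortal_is_empty, case_mismatch_rejected)
```

The single root element, myth, is the first line of the outline, and each sub-element is indented beneath it.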
Opening and empty tags can also contain attributes, which specify values associated with an element. For example, in text such as <name lang="greek">Herakles</name>, the name element has a lang attribute with a value of “greek”. In <name lang="latin">Hercules</name>, the attribute's value is “latin”. Another difference from HTML is that the quotation marks around an attribute's value are required.
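As a rough illustration of attributes, the following Python sketch reads each lang value with the standard xml.etree.ElementTree module and then shows that an unquoted attribute value is rejected as not well formed. The is_well_formed helper is a hypothetical name used only here.

```python
import xml.etree.ElementTree as ET

names = ET.fromstring(
    "<myth>"
    '<name lang="latin">Hercules</name>'
    '<name lang="greek">Herakles</name>'
    "</myth>"
)

# Map each lang attribute value to the text of its name element.
by_lang = {name.get("lang"): name.text for name in names.findall("name")}
print(by_lang)


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Quotation marks around attribute values are mandatory in XML.
quoted_ok = is_well_formed('<name lang="greek">Herakles</name>')
unquoted_ok = is_well_formed("<name lang=greek>Herakles</name>")
print(quoted_ok, unquoted_ok)
```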
The rules for a given XML application are specified with a Document Type Definition (DTD). The DTD carefully lists the allowed element names and how elements can be nested inside each other. The DTD also specifies the attributes which can be defined for each element, their default values, and whether they can be omitted. For example, to make a comparison with HTML, the LI element, representing an entry in a list, can occur only inside certain elements which represent lists, such as OL or UL.
The document-type definition is specified in the DOCTYPE declaration; the above document uses a DTD called “mythology” that I invented for this article. The “mythology” DTD might contain the following declarations:
<!ELEMENT myth (name+, description, mortal?)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST name lang ( latin | greek ) "latin">
<!ELEMENT description (#PCDATA)>
<!ELEMENT mortal EMPTY>
I won't go into every detail of these lines; however, lines beginning with <!ELEMENT are element declarations. They declare the element's name and what it can contain. So, the myth element must contain one or more name elements, followed by a single description element, followed by an optional mortal element. (+, * and ? have the same meanings as in regular expressions: one or more, zero or more, and zero or one occurrence.) The mortal element, on the other hand, must always be empty.
The third line declares the name element to have an attribute named lang; this attribute can have either of the two values “latin” or “greek” and defaults to “latin” if it's not specified.
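To make these rules concrete, here is a minimal Python sketch that checks a document against a hand-written translation of the declarations above. It does not read or parse the DTD itself, since the standard xml.etree.ElementTree module does not validate. The content model is rewritten as a regular expression over the child element names, and the lang default is applied by hand. The check_myth function and the constant names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (name+, description, mortal?) rewritten as a regular expression over
# a space-terminated list of child element names.
MYTH_CONTENT = re.compile(r"(name )+description (mortal )?")

# Hand translation of: <!ATTLIST name lang ( latin | greek ) "latin">
ALLOWED_LANGS = ("latin", "greek")
DEFAULT_LANG = "latin"


def check_myth(xml_text):
    """Return a list of rule violations; an empty list means none found."""
    root = ET.fromstring(xml_text)
    problems = []
    children = "".join(child.tag + " " for child in root)
    if root.tag != "myth" or MYTH_CONTENT.fullmatch(children) is None:
        problems.append("children do not match (name+, description, mortal?)")
    for name in root.iter("name"):
        lang = name.get("lang", DEFAULT_LANG)  # default applied by hand
        if lang not in ALLOWED_LANGS:
            problems.append(f"lang={lang!r} is not latin or greek")
    mortal = root.find("mortal")
    if mortal is not None and (mortal.text or len(mortal)):
        problems.append("mortal must be empty")
    return problems


good = (
    "<myth><name>Hercules</name>"
    '<name lang="greek">Herakles</name>'
    "<description>Son of Zeus and Alcmena.</description><mortal/></myth>"
)
no_name = "<myth><description>Nameless.</description></myth>"
bad_lang = (
    '<myth><name lang="norse">Thor</name>'
    "<description>Not in this DTD.</description></myth>"
)

for doc in (good, no_name, bad_lang):
    print(check_myth(doc))
```

The first name element in the good document omits lang, so it is treated as “latin”, just as the attribute declaration specifies.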
A validating parser can be given a DTD and a document in order to verify that a given document is valid, i.e., it follows all the DTD's rules. This is quite different from HTML, since web browsers have historically had very forgiving parsers, and so relatively few people make any effort to write valid HTML. This looseness means that code to render HTML text is full of hacks and special cases; hopefully, XML won't fall into the same trap of leniency.
This article doesn't cover all of XML's features—I haven't discussed all the possible attribute types, what entities are or that XML uses Unicode, which enables XML processors to handle data written in practically any alphabet. For the full details of XML's syntax, the one definitive source is the XML 1.0 specification, available on the Web at the World Wide Web Consortium's XML page (see Resources). However, like all specifications, it's quite formal and not intended to be a friendly introduction or a tutorial. Gentler introductions are beginning to appear on the Web and on bookstore shelves.