Semantic Web Publishing with RDFa

How you can use RDFa to embed structured content into your Web page and be part of the Semantic Web.
Web 3.0: the Semantic Web

Now, it's time to back up a bit. The term Semantic Web is used in this article to refer to the goal of a machine-parsable Web of structured data, as envisioned by Tim Berners-Lee in his 2001 Scientific American article by that name. Although there still is plenty of spirited debate over exactly how Web 3.0 will take shape, the W3C folks and others have been working diligently on a core set of technologies that has started to gain serious traction in the wild. Check out the layer cake diagram from the W3C (Figure 1).

Figure 1. Layercake Diagram from W3C (source: www.w3.org/2001/sw/layerCake.png)

RDF, the data model on which the whole thing is based, represents the world as a set of triples: subject, predicate and object. The subject can be a URI or a blank node (a kind of temporary variable), the predicate is always a URI, and the object can be a URI, a blank node or a literal. In practice, the predicate is usually a URI in a namespace created for the purpose, like cal:dtstart or cal:summary.
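
To make the triple model concrete, here is a minimal Python sketch that stores two statements as plain tuples. It is not tied to any RDF library. The subject URI http://example.com/rodeo#bullriding and the helper objects_of are made up for illustration; the cal: predicates and their namespace are the ones used elsewhere in this article.

```python
from typing import List, NamedTuple

# Namespace for the iCalendar vocabulary (cal:) used in this article.
CAL = "http://www.w3.org/2002/12/cal/ical#"

# Hypothetical subject URI, used only for this illustration.
EVENT = "http://example.com/rodeo#bullriding"


class Triple(NamedTuple):
    subject: str    # a URI (or a blank node label)
    predicate: str  # always a URI
    obj: str        # a URI, a blank node label or a literal


triples = [
    Triple(EVENT, CAL + "dtstart", "20080222T1400-0700"),
    Triple(EVENT, CAL + "summary", "Bull Riding"),
]


def objects_of(subject: str, predicate: str) -> List[str]:
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of(EVENT, CAL + "summary"))
```

A real triple store would also distinguish URIs from literals; this sketch keeps everything as strings to stay short.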

Vocabularies and ontologies form the backbone of the Semantic Web. You can define your own, and some tools like Semantic MediaWiki create an ontology for you automatically. When defining the terms in a specialized domain, or when creating a private within-enterprise application, creating your own ontology makes sense. For sharing data with the world, I prefer to reuse existing vocabularies as much as possible. (By vocabulary, I mean an RDF file that defines terms and properties; by ontology, I mean a vocabulary that also contains logical rules.) Some widely used vocabs include the following:

Note that in our document, we can choose our own shorthand name for each vocabulary when we list it in the <html> tag. Then, we can use that shorthand to write what is called a CURIE, or Compact URI, like dc:title or foaf:name. In RDFa, each CURIE expands to a full URI, and the short form is much easier to read once you get used to it. One of the core ideas of RDF is to be able to use URIs to refer to concepts and things outside cyberspace, and then use them to make logical statements. So, it helps if the URIs are human-readable.
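
Expanding a CURIE amounts to a prefix lookup followed by string concatenation. The Python sketch below shows the idea; the function name expand_curie and the PREFIXES table are illustrative assumptions, and the table uses the commonly published namespace URIs for Dublin Core and FOAF. In a real page, the mapping is whatever the <html> tag declares.

```python
# Illustrative prefix table; a real one mirrors the xmlns: declarations
# on the page's <html> tag.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "cal": "http://www.w3.org/2002/12/cal/ical#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Turn a CURIE such as 'dc:title' into the full URI it stands for."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in %r" % curie)
    # The namespace URI and the reference are simply concatenated.
    return prefixes[prefix] + reference


print(expand_curie("dc:title"))
print(expand_curie("foaf:name"))
```

Because the expansion is plain concatenation, a namespace URI normally has to end in # or / for the resulting URIs to come out as intended.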

Going back to the rodeo schedule example, suppose we want to list the contestants in each event. Now, we get into the power of RDFa—the ability to connect different types of data together in a logical way right in an HTML file. The first step is to pick or create a vocabulary to describe the contestants. FOAF is the standard for referring to people, but I also want to specify that they are contestants in the rodeo. I did a search on Swoogle for the word contestant, and after a few minutes examining the available ontologies, I decided that http://smartweb.semanticweb.org/ontology/sportevent is the most apt. I also want to add a contact person for the rodeo as a whole, using the vCard vocabulary. So, I added foaf, contact and sportevent vocabularies to the list at the start of the document, which now looks like this:


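<html xmlns="http://www.w3.org/1999/xhtml"
 xmlns:cal=
 "http://www.w3.org/2002/12/cal/ical#"
 xmlns:sportevent=
 "http://smartweb.semanticweb.org/ontology/sportevent#"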
 xmlns:foaf=
 "http://xmlns.com/foaf/0.1/"
 xmlns:contact=
 "http://www.w3.org/2001/vcard-rdf/3.0#"
>

Zooming in on just the event itself, we can add some contestants:


<div rel="cal:Vevent">
  <span property="cal:dtstart" content="20080222T1400-0700">2:00PM</span>
  :
  <span property="cal:summary">Bull Riding</span>

  List of Contestants:
  <ul>
  <li rel="sportevent:Contestant" id="Marchi">
    <span property="foaf:name" about="#Marchi"
      >Guilherme Marchi</span><br/>
    <a rel="foaf:weblog" about="#Marchi"
      href="http://example.com/~Marchi"
    >Marchi's blog</a>
  </li>
  <li rel="sportevent:Contestant" id="Briscoe">
     <span property="foaf:name" about="#Briscoe">Travis Briscoe</span>
  </li>
  </ul>
</div>

And, at the bottom of the page, we add a footer with general contact information:


<p class="footer" about="/main/page/for/Rodeo">
  For general information or event questions, please call
  <span property="contact:phone">800-555-1212</span>
  or email
  <a rel="contact:email" href="mailto:rodeo-info@example.com"
  >rodeo-info@example.com</a>
</p>

RDFa uses several existing HTML attributes and creates a few new ones. Recall that an RDF statement has three parts: subject, predicate and object. The about= or instanceof= attribute of a tag can specify the subject (instanceof= was renamed typeof= in the final RDFa specification). The rel=, rev= or property= attribute specifies the predicate. Then, the object may be the href=, content= or actual content enclosed by the tag pair. Note that the subject may be in a parent tag and, if missing, defaults to the document itself. Refer to the RDFa Syntax Specification and Primer documents for a detailed explanation of all the ways that RDF can be embedded in HTML.
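
To see how those attributes combine into triples, here is a deliberately small Python sketch built on the standard html.parser module. It is a toy, not a conforming RDFa processor: the class name TinyRDFa is made up, and it handles only about=, property=, content= and rel= paired with href=. It ignores rev=, typed nodes, rel= without href=, CURIE expansion and URI resolution, and it assumes well-formed input in which every tag is closed.

```python
from html.parser import HTMLParser


class TinyRDFa(HTMLParser):
    """Toy extractor for about=, property=, content= and rel=/href= only."""

    def __init__(self, base=""):
        super().__init__()
        self.subjects = [base]  # subject stack; the document is the default
        self.pending = []       # per open tag: text capture in progress, or None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # about= sets the subject; otherwise inherit it from the parent tag.
        subject = a.get("about", self.subjects[-1])
        self.subjects.append(subject)
        capture = None
        if "rel" in a and "href" in a:
            self.triples.append((subject, a["rel"], a["href"]))
        if "property" in a:
            if "content" in a:
                self.triples.append((subject, a["property"], a["content"]))
            else:
                # The object is the enclosed text, known only at the end tag.
                capture = (subject, a["property"], [])
        self.pending.append(capture)

    def handle_data(self, data):
        for capture in self.pending:
            if capture is not None:
                capture[2].append(data)

    def handle_endtag(self, tag):
        self.subjects.pop()
        capture = self.pending.pop()
        if capture is not None:
            subject, predicate, parts = capture
            self.triples.append((subject, predicate, "".join(parts).strip()))


FOOTER = """
<p class="footer" about="/main/page/for/Rodeo">
  For general information or event questions, please call
  <span property="contact:phone">800-555-1212</span>
  or email
  <a rel="contact:email" href="mailto:rodeo-info@example.com"
  >rodeo-info@example.com</a>
</p>
"""

parser = TinyRDFa()
parser.feed(FOOTER)
parser.close()
for triple in parser.triples:
    print(triple)
```

Both triples pick up their subject from the about= on the enclosing <p>, which is the inheritance rule described above.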

Re-verifying through the RDFa Distiller returns the necessary @prefix lines to specify the vocabularies, followed by the N3:


@prefix cal: <http://www.w3.org/2002/12/cal/ical#> .
(...all the other prefixes...)

<http://abra.info/lj/rodeo.xhtml> cal:Vevent
 [ sportevent:Contestant
    <http://abra.info/lj/rodeo.xhtml#Briscoe>,
    <http://abra.info/lj/rodeo.xhtml#Marchi>;

    cal:dtstart "20080222T1400-0700";
    cal:summary "Bull Riding"
 ].

 <http://abra.info/main/page/for/Rodeo>
    contact:email <mailto:rodeo-info@example.com>;
    contact:phone "800-555-1212".

 <http://abra.info/lj/rodeo.xhtml#Briscoe>
  foaf:name "Travis Briscoe".

 <http://abra.info/lj/rodeo.xhtml#Marchi>
  foaf:name "Guilherme Marchi";
  foaf:weblog <http://example.com/~Marchi>.
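
The absolute URIs in this output come from resolving each about= value against the address of the page itself. A short Python sketch of that resolution step, using urllib.parse.urljoin from the standard library and assuming the page is served from http://abra.info/lj/rodeo.xhtml as the output above indicates (the helper name resolve is made up):

```python
from urllib.parse import urljoin

# Address of the page, taken from the N3 output above.
PAGE = "http://abra.info/lj/rodeo.xhtml"


def resolve(about):
    """Resolve an about= value against the page address."""
    return urljoin(PAGE, about)


for about in ("#Marchi", "#Briscoe", "/main/page/for/Rodeo"):
    print(about, "->", resolve(about))
```

A fragment such as #Marchi stays on the same page, while a value starting with / replaces the whole path, which is why the contact triples hang off a different subject from the contestants.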

It's that simple. Well, that's not exactly how it went. The RDFa Distiller fails tersely on less-than-valid XHTML, which means that one mismatched tag or missing quotation mark causes an unexplained failure. So, what I really did was use the user-friendly W3C Validator service first, at validator.w3.org, which reminded me about some missing tags and also to save my example as .xhtml so it would be returned with the correct MIME type. After passing the validator, I renamed the file and ran it back through the RDFa Distiller to generate the above N3 output. (The Distiller also has some caching issues. It was designed as a check of the syntax specification, not as a user tool. I use it anyway because I like the N3 output format.)
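
A quick local check can catch the mismatched-tag class of error before a file goes anywhere near an on-line service. The Python sketch below uses the standard xml.etree.ElementTree parser; the function name check_well_formed and the two sample strings are made up for illustration. It tests only XML well-formedness, not XHTML validity, so it complements the validator rather than replacing it.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Report whether a snippet parses as XML, with the error position if not."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return "not well-formed at line %d, column %d: %s" % (line, column, err)
    return "well-formed"


good = '<p><span property="contact:phone">800-555-1212</span></p>'
bad = '<p><span property="contact:phone">800-555-1212</phone></p>'

print(check_well_formed(good))
print(check_well_formed(bad))
```

The second snippet closes a <span> with </phone>, which is exactly the kind of slip that makes a strict parser give up.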

Another useful tool for checking your triple logic is the GetN3 bookmarklet available from www.w3.org/2006/07/SWD/RDFa/impl/js. Once you've saved it as a bookmark, you can use it to quickly extract the RDFa of any page you have open in the browser as N3. It's also more forgiving than the Distiller, so you can use it as a quick logic check without worrying about valid XHTML.
