Semantic Web Publishing with RDFa

How you can use RDFa to embed structured content into your Web page and be part of the Semantic Web.
Web 3.0: the Semantic Web

Now, it's time to back up a bit. The term Semantic Web is used in this article to refer to the goal of a machine-parsable Web of structured data, as envisioned by Tim Berners-Lee in his 2001 Scientific American article by that name. Although there still is plenty of spirited debate over exactly how Web 3.0 will take shape, the W3C folks and others have been working diligently on a core set of technologies that has started to gain serious traction in the wild. Check out the layer cake diagram from the W3C (Figure 1).

Figure 1. Layercake Diagram from W3C (source: www.w3.org/2001/sw/layerCake.png)

RDF, the data model on which the whole thing is based, represents the world as a set of triples: subject, predicate and object. Each item in the triple can be a URI, a literal or a blank node (a kind of temporary variable). In practice, the predicate is likely to be a URI in a namespace created for the purpose, like cal:dtstart or cal:summary.
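
As a rough illustration of the triple model (not part of any RDF toolkit), the sketch below stores statements as plain Python tuples. The event URI and the helper name objects_of are hypothetical, chosen only for this example; the predicates are the cal: terms mentioned above.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical subject URI, used only for illustration.
EVENT = "http://example.com/rodeo#bull-riding"

triples = [
    Triple(EVENT, "cal:dtstart", "20080222T1400-0700"),
    Triple(EVENT, "cal:summary", "Bull Riding"),
]


def objects_of(store: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, EVENT, "cal:summary"))
```

A real triple store also distinguishes URIs, literals and blank nodes; this sketch treats every item as a string to keep the shape of the data visible.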

Vocabularies and ontologies form the backbone of the Semantic Web. You can define your own, and some tools like Semantic MediaWiki create an ontology for you automatically. When defining the terms in a specialized domain, or when creating a private within-enterprise application, creating your own ontology makes sense. For sharing data with the world, I prefer to reuse existing vocabularies as much as possible. (By vocabulary, I mean an RDF file that defines terms and properties; by ontology, I mean a vocabulary that also contains logical rules.) Some widely used vocabs include the following:

Note that in our document, we can choose our own shorthand name for each vocabulary when we list it in the <html> tag. Then, we can use that shorthand to write what is called a CURIE, or Compact URI, like dc:title or foaf:name. In RDFa, those CURIEs are valid URIs and are much easier to read once you get used to them. One of the core ideas of RDF is to be able to use URIs to refer to concepts and things outside cyberspace, and then use them to make logical statements. So, it helps if the URIs are human-readable.
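
To make the CURIE idea concrete, here is a minimal Python sketch of the expansion step: look up the shorthand prefix and append the rest of the name. The prefix table is an assumption for illustration (the cal: URI is the one the RDFa Distiller reports later in this article; the dc: URI is the commonly published Dublin Core elements namespace), and expand_curie is a hypothetical helper, not a function from any RDFa library.

```python
# Assumed prefix table; in a real page these come from the xmlns:
# declarations on the <html> tag.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cal": "http://www.w3.org/2002/12/cal/ical#",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand prefix:reference into a full URI using the prefix table."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


print(expand_curie("dc:title"))
print(expand_curie("cal:summary"))
```

Because expansion is plain string concatenation, the namespace URI has to end in a separator such as / or #, or the expanded terms run together.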

Going back to the rodeo schedule example, suppose we want to list the contestants in each event. Now, we get into the power of RDFa—the ability to connect different types of data together in a logical way right in an HTML file. The first step is to pick or create a vocabulary to describe the contestants. FOAF is the standard for referring to people, but I also want to specify that they are contestants in the rodeo. I did a search on Swoogle for the word contestant, and after a few minutes examining the available ontologies, I decided that http://smartweb.semanticweb.org/ontology/sportevent is the most apt. I also want to add a contact person for the rodeo as a whole, using the vCard vocabulary. So, I added foaf, contact and sportevent vocabularies to the list at the start of the document, which now looks like this:


<html xmlns="http://www.w3.org/1999/xhtml"
 
 
 xmlns:foaf=
 "http://xmlns.com/foaf/0.1/"
 xmlns:contact=
 "http://www.w3.org/2001/vcard-rdf/3.0#"
>

Zooming in on just the event itself, we can add some contestants:


<div rel="cal:Vevent">
  <span property="cal:dtstart" content="20080222T1400-0700">2:00PM</span>
  :
  <span property="cal:summary">Bull Riding</span>

  <ul>List of Contestants:
  <li rel="sportevent:Contestant" id="Marchi">
    <span property="foaf:name" about="#Marchi"
      >Guilherme Marchi</span><br/>
    <a rel="foaf:weblog" about="#Marchi"
      href="http://example.com/~Marchi"
    >Marchi's blog</a>
  </li>
  <li rel="sportevent:Contestant" id="Briscoe">
     <span property="foaf:name" about="#Briscoe">Travis Briscoe</span>
  </li>
  </ul>
</div>

And, at the bottom of the page, we add a footer with general contact information:


<p class="footer" about="/main/page/for/Rodeo">
  For general information or event questions, please call
  <span property="contact:phone">800-555-1212</span>
  or email
  <a rel="contact:email" href="mailto:rodeo-info@example.com"
  >rodeo-info@example.com</a>
</p>

RDFa uses several existing HTML attributes and creates a few new ones. Recall that an RDF statement has three parts: subject, predicate and object. The about= or instanceOf= attribute of a tag can specify the subject. The rel=, rev= or property= attribute specifies the predicate. Then, the object may be the value of href= or content=, or the actual content enclosed by the tag pair. Note that the subject may be in a parent tag and, if missing, defaults to the document itself. Refer to the RDFa Syntax Specification and Primer documents for a detailed explanation of all the ways that RDF can be embedded in HTML.
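
The mapping from attributes to triples can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a conforming RDFa processor: it handles only about=, property=, content=, rel= and href=, ignores rev=, instanceOf=, blank nodes and CURIE expansion, and uses an empty string to stand for "the document itself". The function name extract_triples is hypothetical, and the input is a well-formed copy of the footer.

```python
import xml.etree.ElementTree as ET

FOOTER = """
<p class="footer" about="/main/page/for/Rodeo">
  For general information or event questions, please call
  <span property="contact:phone">800-555-1212</span>
  or email
  <a rel="contact:email" href="mailto:rodeo-info@example.com"
  >rodeo-info@example.com</a>
</p>
"""


def extract_triples(element, subject=""):
    """Yield (subject, predicate, object) tuples from a small RDFa subset."""
    # about= sets the subject; otherwise inherit it from the parent tag.
    subject = element.get("about", subject)
    if "property" in element.attrib:
        # Literal object: content= if present, else the enclosed text.
        obj = element.get("content")
        if obj is None:
            obj = "".join(element.itertext()).strip()
        yield (subject, element.get("property"), obj)
    if "rel" in element.attrib and "href" in element.attrib:
        # URI object taken from href=.
        yield (subject, element.get("rel"), element.get("href"))
    for child in element:
        yield from extract_triples(child, subject)


triples = list(extract_triples(ET.fromstring(FOOTER.strip())))
for triple in triples:
    print(triple)
```

Both statements pick up their subject from the about= on the enclosing p tag, which is the "subject may be in a parent tag" rule at work.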

Re-verifying through the RDFa Distiller returns the necessary @prefix lines to specify the vocabularies, followed by the N3:


@prefix cal: <http://www.w3.org/2002/12/cal/ical#> .
(...all the other prefixes...)

<http://abra.info/lj/rodeo.xhtml> cal:Vevent
 [ sportevent:Contestant
    <http://abra.info/lj/rodeo.xhtml#Briscoe>,
    <http://abra.info/lj/rodeo.xhtml#Marchi>;

    cal:dtstart "20080222T1400-0700";
    cal:summary "Bull Riding"
 ].

 <http://abra.info/main/page/for/Rodeo>
    contact:email <mailto:rodeo-info@example.com>;
    contact:phone "800-555-1212".

 <http://abra.info/lj/rodeo.xhtml#Briscoe>
  foaf:name "Travis Briscoe".

 <http://abra.info/lj/rodeo.xhtml#Marchi>
  foaf:name "Guilherme Marchi";
  foaf:weblog <http://example.com/~Marchi>.

It's just like that. Well, that's not exactly how it went. The RDFa Distiller fails tersely on less-than-valid XHTML, which means that one mismatched tag or missing quotation mark causes an unexplained failure. So, what I really did was use the user-friendly W3C Validator service first, at validator.w3.org, which reminded me about some missing tags and also to save my example as .xhtml so it would be returned with the correct MIME type. After passing the validator, I renamed the file and ran it back through the RDFa Distiller to generate the above N3 output. (The Distiller also has some caching issues. It was designed as a check of the syntax specification, not as a user tool. I use it anyway because I like the N3 output format.)
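
For a quick local check before going to the online validator, a generic XML parser will at least catch mismatched tags and missing quotation marks. The Python sketch below uses the standard library's xml.etree.ElementTree; the helper name check_well_formed is hypothetical. It tests well-formedness only, which is weaker than the validation done at validator.w3.org, and it will also reject named entities such as &nbsp; because it does not load the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup: str):
    """Return None if the markup parses as XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # The message includes the line and column of the problem.
        return str(err)
    return None


good = '<p><span property="contact:phone">800-555-1212</span></p>'
bad = '<p><span property="contact:phone">800-555-1212</phone></p>'

print(check_well_formed(good))
print(check_well_formed(bad))
```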

Another useful tool for checking your triple logic is the GetN3 bookmarklet available from www.w3.org/2006/07/SWD/RDFa/impl/js. Once you've saved it as a bookmark, you can use it to extract the RDFa of any page you have open in the browser quickly as N3. It's also more forgiving than the Distiller, so you can use it as a quick logic check without worrying about valid XHTML.
