Semantic Web Publishing with RDFa

How you can use RDFa to embed structured content into your Web page and be part of the Semantic Web.
Web 3.0: the Semantic Web

Now, it's time to back up a bit. The term Semantic Web is used in this article to refer to the goal of a machine-parsable Web of structured data, as envisioned by Tim Berners-Lee in his 2001 Scientific American article by that name. Although there is still plenty of spirited debate over exactly how Web 3.0 will take shape, the W3C folks and others have been working diligently on a core set of technologies that has started to gain serious traction in the wild. Check out the layer-cake diagram from the W3C (Figure 1).

Figure 1. Layercake Diagram from W3C (source: www.w3.org/2001/sw/layerCake.png)

RDF, the data model on which the whole thing is based, represents the world as a set of triples: subject, predicate and object. Each item in the triple can be a URI, a literal or a blank node (a kind of temporary variable). In practice, the predicate is likely to be a URI in a namespace created for the purpose, like cal:dtstart or cal:summary.
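To make the triple model concrete, here is a minimal Python sketch that stores two statements about an event as plain tuples and looks one of them up. The `Triple` type, the `objects` helper and the blank node label `_:event1` are hypothetical names chosen for illustration; only the cal namespace URI comes from the examples in this article. A real application would more likely use an RDF library than hand-rolled tuples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Namespace URI for the calendar vocabulary used in this article.
CAL = "http://www.w3.org/2002/12/cal/ical#"

# Hypothetical blank node label standing in for "some event".
EVENT = "_:event1"

triples = [
    Triple(EVENT, CAL + "dtstart", "20080222T1400-0700"),
    Triple(EVENT, CAL + "summary", "Bull Riding"),
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


summary = objects(triples, EVENT, CAL + "summary")
print(summary)
```

Here the predicates are full URIs and the objects are literals; the subject is a blank node because the event has no URI of its own.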

Vocabularies and ontologies form the backbone of the Semantic Web. You can define your own, and some tools like Semantic MediaWiki create an ontology for you automatically. When defining the terms in a specialized domain, or when creating a private within-enterprise application, creating your own ontology makes sense. For sharing data with the world, I prefer to reuse existing vocabularies as much as possible. (By vocabulary, I mean an RDF file that defines terms and properties; by ontology, I mean a vocabulary that also contains logical rules.) Some widely used vocabs include the following:

Note that in our document, we can choose our own shorthand name for each vocabulary when we list it in the <html> tag. Then, we can use that shorthand to write what is called a CURIE, or Compact URI, like dc:title or foaf:name. In RDFa, those CURIEs are valid URIs and are much easier to read once you get used to them. One of the core ideas of RDF is to be able to use URIs to refer to concepts and things outside cyberspace, and then use them to make logical statements. So, it helps if the URIs are human-readable.
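As a rough illustration of how a CURIE maps back to a full URI, the following Python sketch joins a prefix's namespace to the part after the colon. The `PREFIXES` table and the `expand_curie` function are hypothetical; the cal namespace is the one used in this article, and the dc entry is the commonly published Dublin Core elements namespace, included here as an assumption rather than something taken from the article's markup.

```python
# Shorthand name -> namespace URI, as it would be declared on the <html> tag.
PREFIXES = {
    "cal": "http://www.w3.org/2002/12/cal/ical#",
    # Assumption: the usual Dublin Core elements namespace.
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' into a full URI using the prefix table."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


print(expand_curie("cal:dtstart", PREFIXES))
print(expand_curie("dc:title", PREFIXES))
```

Because the shorthand is local to the document, two pages can pick different prefixes for the same vocabulary and still produce identical URIs after expansion.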

Going back to the rodeo schedule example, suppose we want to list the contestants in each event. Now, we get into the power of RDFa—the ability to connect different types of data together in a logical way right in an HTML file. The first step is to pick or create a vocabulary to describe the contestants. FOAF is the standard for referring to people, but I also want to specify that they are contestants in the rodeo. I did a search on Swoogle for the word contestant, and after a few minutes examining the available ontologies, I decided that http://smartweb.semanticweb.org/ontology/sportevent is the most apt. I also want to add a contact person for the rodeo as a whole, using the vCard vocabulary. So, I added foaf, contact and sportevent vocabularies to the list at the start of the document, which now looks like this:


<html xmlns="http://www.w3.org/1999/xhtml"
 
 
 xmlns:foaf=
 "http://xmlns.com/foaf/spec/20071002.rdf"
 xmlns:contact=
 "http://www.w3.org/2001/vcard-rdf/3.0#"
>

Zooming in on just the event itself, we can add some contestants:


<div rel="cal:Vevent">
  <span property="cal:dtstart" content="20080222T1400-0700">2:00PM</span>
  :
  <span property="cal:summary">Bull Riding</span>

  List of Contestants:
  <ul>
  <li rel="sportevent:Contestant" id="Marchi">
    <span property="foaf:name" about="#Marchi"
      >Guilherme Marchi</span><br/>
    <a rel="foaf:weblog" about="#Marchi"
      href="http://example.com/~Marchi"
    >Marchi's blog</a>
  </li>
  <li rel="sportevent:Contestant" id="Briscoe">
     <span property="foaf:name" about="#Briscoe">Travis Briscoe</span>
  </li>
  </ul>
</div>

And, at the bottom of the page, we add a footer with general contact information:


<p class="footer" about="/main/page/for/Rodeo">
  For general information or event questions, please call
  <span property="contact:phone">800-555-1212</span>
  or email
  <a rel="contact:email" href="mailto:rodeo-info@example.com"
  >rodeo-info@example.com</a>
</p>

RDFa reuses several existing HTML attributes and creates a few new ones. Recall that an RDF statement has three parts: subject, predicate and object. The about= or instanceOf= attribute of a tag can specify the subject (instanceOf= was renamed typeof= in the final RDFa 1.0 Recommendation). The rel=, rev= or property= attribute specifies the predicate. The object may then be the value of href= or content=, or the text enclosed by the tag pair. Note that the subject may be set in a parent tag and, if it is missing, defaults to the document itself. Refer to the RDFa Syntax Specification and Primer documents for a detailed explanation of all the ways that RDF can be embedded in HTML.
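The attribute-to-triple mapping just described can be sketched in a few lines of Python. This is a deliberately simplified illustration, not an RDFa processor: the `extract` function is hypothetical, it handles only about=, property= (with content= or enclosed text) and rel= with href=, and it ignores rev=, instanceOf=, CURIE expansion and base URI resolution. The input is a copy of the footer markup with balanced tags, and an empty subject string stands for the document itself.

```python
import xml.etree.ElementTree as ET

FOOTER = """<p class="footer" about="/main/page/for/Rodeo">
  For general information or event questions, please call
  <span property="contact:phone">800-555-1212</span>
  or email
  <a rel="contact:email" href="mailto:rodeo-info@example.com"
  >rodeo-info@example.com</a>
</p>"""


def extract(element, subject=""):
    """Yield (subject, predicate, object) for a small subset of RDFa."""
    # about= sets a new subject; otherwise inherit it from the parent tag.
    subject = element.get("about", subject)

    if "property" in element.attrib:
        # The object is a literal: content= if present, else the enclosed text.
        literal = element.get("content")
        if literal is None:
            literal = "".join(element.itertext()).strip()
        yield (subject, element.get("property"), literal)

    if "rel" in element.attrib and "href" in element.attrib:
        # The object is the URI given in href=.
        yield (subject, element.get("rel"), element.get("href"))

    for child in element:
        yield from extract(child, subject)


triples = list(extract(ET.fromstring(FOOTER)))
for triple in triples:
    print(triple)
```

Both statements pick up their subject from the about= attribute on the enclosing p tag, which is the inheritance rule described above.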

Re-verifying through the RDFa Distiller returns the necessary @prefix lines to specify the vocabularies, followed by the N3:


@prefix cal: <http://www.w3.org/2002/12/cal/ical#> .
(...all the other prefixes...)

<http://abra.info/lj/rodeo.xhtml> cal:Vevent
 [ sportevent:Contestant
    <http://abra.info/lj/rodeo.xhtml#Briscoe>,
    <http://abra.info/lj/rodeo.xhtml#Marchi>;

    cal:dtstart "20080222T1400-0700";
    cal:summary "Bull Riding"
 ].

 <http://abra.info/main/page/for/Rodeo>
    contact:email <mailto:rodeo-info@example.com>;
    contact:phone "800-555-1212".

 <http://abra.info/lj/rodeo.xhtml#Briscoe>
  foaf:name "Travis Briscoe".

 <http://abra.info/lj/rodeo.xhtml#Marchi>
  foaf:name "Guilherme Marchi";
  foaf:weblog <http://example.com/~Marchi>.

It's as simple as that. Well, that's not exactly how it went. The RDFa Distiller fails tersely on less-than-valid XHTML, which means that one mismatched tag or missing quotation mark causes an unexplained failure. So, what I really did was first use the user-friendly W3C Validator service at validator.w3.org, which reminded me about some missing tags and also to save my example as .xhtml so it would be returned with the correct MIME type. After passing the validator, I renamed the file and ran it back through the RDFa Distiller to generate the above N3 output. (The Distiller also has some caching issues. It was designed as a check of the syntax specification, not as a user tool. I use it anyway because I like the N3 output format.)
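For a quick local check before going to the online services, a sketch like the following can catch mismatched tags and missing quotation marks using only Python's standard XML parser. The `well_formedness_error` function and the two sample fragments are hypothetical. Note that this tests well-formedness only; it does not validate against the XHTML DTD, so it complements the W3C Validator rather than replacing it.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if markup parses as XML, else a message with the location."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


# Hypothetical fragments: the first closes a span with the wrong tag name.
BROKEN = '<p><span property="contact:phone">800-555-1212</phone></p>'
FIXED = '<p><span property="contact:phone">800-555-1212</span></p>'

print(well_formedness_error(BROKEN))
print(well_formedness_error(FIXED))
```

The parser reports the line and column of the first problem it meets, which is usually enough to find the offending tag.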

Another useful tool for checking your triple logic is the GetN3 bookmarklet available from www.w3.org/2006/07/SWD/RDFa/impl/js. Once you've saved it as a bookmark, you can use it to extract the RDFa of any page you have open in the browser quickly as N3. It's also more forgiving than the Distiller, so you can use it as a quick logic check without worrying about valid XHTML.
