Semantic Web Publishing with RDFa
Now, it's time to back up a bit. The term Semantic Web is used in this article to refer to the goal of a machine-parsable Web of structured data, as envisioned by Tim Berners-Lee in his 2001 Scientific American article by that name. Although there still is plenty of spirited debate over exactly how Web 3.0 will take shape, the W3 folks and others have been working diligently on a core set of technologies that has started to gain serious traction in the wild. Check out the layer-cake diagram from the W3C (Figure 1).
RDF, the data model on which the whole thing is based, represents the world as a set of triples: subject, predicate and object. Each item in the triple can be a URI, a literal or a blank node (a kind of temporary variable). In practice, the predicate is likely to be a URI in a namespace created for the purpose, like cal:dtstart or cal:summary.
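To make the triple model concrete, here is a minimal Python sketch that stores triples as plain tuples and looks up the objects for a given subject and predicate. It is only an illustration, not how any particular RDF library works: the `Triple` class, the `objects` helper and the `EVENT` subject URI are names invented for this example, and a real toolkit would distinguish URIs, literals and blank nodes with separate types.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Namespace URI behind the cal: prefix.
CAL = "http://www.w3.org/2002/12/cal/ical#"
# Hypothetical subject URI, invented for this illustration.
EVENT = "http://example.org/rodeo#bull-riding"

graph = [
    Triple(EVENT, CAL + "dtstart", "20080222T1400-0700"),
    Triple(EVENT, CAL + "summary", "Bull Riding"),
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


print(objects(graph, EVENT, CAL + "summary"))  # prints ['Bull Riding']
```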
Vocabularies and ontologies form the backbone of the Semantic Web. You can define your own, and some tools like Semantic MediaWiki create an ontology for you automatically. When defining the terms in a specialized domain, or when creating a private within-enterprise application, creating your own ontology makes sense. For sharing data with the world, I prefer to reuse existing vocabularies as much as possible. (By vocabulary, I mean an RDF file that defines terms and properties; by ontology, I mean a vocabulary that also contains logical rules.) Some widely used vocabs include the following:
foaf: friend of a friend, for identifying people and other entities (xmlns.com/foaf/spec/20071002.rdf).
ical: based on the iCalendar W3 standard, for calendar and event data (www.w3.org/2002/12/cal/ical).
vcard: intended as an electronic business card, it has simple fields for contact information (www.w3.org/2001/vcard-rdf/3.0).
dc: Dublin Core, defining core properties like title and creator (purl.org/dc/elements/1.1).
cc: for Creative Commons licenses (creativecommons.org/ns).
rss: the RSS 1.0 namespace (purl.org/rss/1.0).
Note that in our document, we can choose our own shorthand name for each vocabulary when we declare it with an xmlns: attribute in the <html> tag. Then, we can use that shorthand to write what is called a CURIE, or Compact URI, like dc:title or foaf:name. In RDFa, each CURIE stands for a full URI (the namespace URI followed by the term name), and CURIEs are much easier to read once you get used to them. One of the core ideas of RDF is to be able to use URIs to refer to concepts and things outside cyberspace, and then use them to make logical statements. So, it helps if the URIs are human-readable.
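The relationship between a CURIE and the URI it abbreviates can be sketched in a few lines of Python. This is a simplified illustration, not a full CURIE processor: the `expand_curie` function and the `PREFIXES` table are hypothetical names, and the table holds only three of the namespaces used in this article.

```python
# Prefix table: shorthand name -> namespace URI.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cal": "http://www.w3.org/2002/12/cal/ical#",
    "contact": "http://www.w3.org/2001/vcard-rdf/3.0#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' by concatenating namespace and reference."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in CURIE: {curie!r}")
    return prefixes[prefix] + reference


print(expand_curie("dc:title"))     # http://purl.org/dc/elements/1.1/title
print(expand_curie("cal:dtstart"))  # http://www.w3.org/2002/12/cal/ical#dtstart
```

Because expansion is plain string concatenation, the namespace URI has to end in the right separator (a trailing / or #), or the resulting URI will not match the term defined by the vocabulary.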
Going back to the rodeo schedule example, suppose we want to list the contestants in each event. Now, we get into the power of RDFa—the ability to connect different types of data together in a logical way right in an HTML file. The first step is to pick or create a vocabulary to describe the contestants. FOAF is the standard for referring to people, but I also want to specify that they are contestants in the rodeo. I did a search on Swoogle for the word contestant, and after a few minutes examining the available ontologies, I decided that http://smartweb.semanticweb.org/ontology/sportevent is the most apt. I also want to add a contact person for the rodeo as a whole, using the vCard vocabulary. So, I added foaf, contact and sportevent vocabularies to the list at the start of the document, which now looks like this:
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:foaf="http://xmlns.com/foaf/spec/20071002.rdf"
      xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#" >
Zooming in on just the event itself, we can add some contestants:
<div rel="cal:Vevent">
  <span property="cal:dtstart" content="20080222T1400-0700">2:00PM</span> :
  <span property="cal:summary">Bull Riding</span>
  <ul>List of Contestants:
    <li rel="sportevent:Contestant" id="Marchi">
      <span property="foaf:name" about="#Marchi">Guilherme Marchi</span><br/>
      <a rel="foaf:weblog" about="#Marchi"
         href="http://example.com/~Marchi">Marchi's blog</a>
    </li>
    <li rel="sportevent:Contestant" id="Briscoe">
      <span property="foaf:name" about="#Briscoe">Travis Briscoe</span>
    </li>
  </ul>
</div>
And, at the bottom of the page, we add a footer with general contact information:
<p class="footer" about="/main/page/for/Rodeo">
  For general information or event questions, please call
  <span property="contact:phone">800-555-1212</span> or email
  <a rel="contact:email"
     href="mailto:firstname.lastname@example.org">email@example.com</a>
</p>
RDFa uses several existing HTML attributes and creates a few new ones. Recall that an RDF statement has three parts: subject, predicate and object. The about= or instanceOf= attribute of a tag can specify the subject. The rel=, rev= or property= attribute specifies the predicate. Then, the object may be the value of href= or content=, or the actual content enclosed by the tag pair. Note that the subject may be set in a parent tag and, if missing, defaults to the document itself. Refer to the RDFa Syntax Specification and Primer documents for a detailed explanation of all the ways that RDF can be embedded in HTML.
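As a rough illustration of these rules, the following Python sketch walks a well-formed fragment and collects (subject, predicate, object) tuples. It is a deliberately simplified reading of the rules above, not an implementation of the RDFa Syntax Specification: the `extract` function is a hypothetical helper, it handles only about=, property=, content= and rel= with href=, and it ignores rev=, instanceOf=, CURIE expansion and relative-URI resolution. An empty string stands in for "the document itself".

```python
import xml.etree.ElementTree as ET

# The footer from this article, as a well-formed fragment.
FOOTER = """<p class="footer" about="/main/page/for/Rodeo">
For general information or event questions, please call
<span property="contact:phone">800-555-1212</span> or email
<a rel="contact:email"
   href="mailto:firstname.lastname@example.org">email@example.com</a>
</p>"""


def extract(element, subject, out):
    """Collect triples from element and its children (simplified rules)."""
    # about= on this tag overrides the subject inherited from the parent.
    subject = element.get("about", subject)
    if "property" in element.attrib:
        # Literal object: content= if present, else the enclosed text.
        literal = element.get("content")
        if literal is None:
            literal = "".join(element.itertext()).strip()
        out.append((subject, element.get("property"), literal))
    if "rel" in element.attrib and "href" in element.attrib:
        # URI object taken from href=.
        out.append((subject, element.get("rel"), element.get("href")))
    for child in element:
        extract(child, subject, out)
    return out


for triple in extract(ET.fromstring(FOOTER), "", []):
    print(triple)
```

Both statements come out with /main/page/for/Rodeo as the subject, because the `<span>` and `<a>` tags inherit it from the about= attribute on the enclosing `<p>`.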
Re-verifying through the RDFa Distiller returns the necessary @prefix lines to specify the vocabularies, followed by the N3:
@prefix cal: <http://www.w3.org/2002/12/cal/ical#>
(...all the other prefixes...)

<http://abra.info/lj/rodeo.xhtml> cal:Vevent [
    sportevent:Contestant <http://abra.info/lj/rodeo.xhtml#Briscoe>,
        <http://abra.info/lj/rodeo.xhtml#Marchi>;
    cal:dtstart "20080222T1400-0700";
    cal:summary "Bull Riding" ].

<http://abra.info/main/page/for/Rodeo> contact:email <mailto:firstname.lastname@example.org>;
    contact:phone "800-555-1212".

<http://abra.info/lj/rodeo.xhtml#Briscoe> foaf:name "Travis Briscoe".

<http://abra.info/lj/rodeo.xhtml#Marchi> foaf:name "Guilherme Marchi";
    foaf:weblog <http://example.com/~Marchi>.
It's just like that. Well, that's not exactly how it went. The RDFa Distiller fails tersely on less-than-valid XHTML, which means that one mismatched tag or missing quotation mark causes unexplained failure. So, what I really did was use the user-friendly W3 Validator service first, at validator.w3.org, which reminded me about some missing tags and also to save my example as .xhtml so it would be returned with the correct MIME type. After passing the validator, I renamed the file and ran it back through the RDFa Distiller to generate the above N3 output. (The Distiller also has some caching issues. It was designed as a check of the syntax specification, not as a user tool. I use it anyway because I like the N3 output format.)
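A quick local check can catch mismatched tags before a page ever reaches the validator or the Distiller. The sketch below uses Python's standard XML parser and tests only well-formedness; it does not check the page against the XHTML DTD, so it is a first pass rather than a replacement for validator.w3.org. The `check_well_formed` function is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return None if markup parses as XML, else the parser's error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # The message includes the line and column of the problem.
        return str(err)
    return None


# A span closed with the wrong tag name, and the corrected version.
broken = '<p><span property="contact:phone">800-555-1212</phone></p>'
fixed = '<p><span property="contact:phone">800-555-1212</span></p>'

print(check_well_formed(broken))
print(check_well_formed(fixed))  # prints None
```

The first call reports where the parser gave up, which is more of a hint than a terse failure gives you.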
Another useful tool for checking your triple logic is the GetN3 bookmarklet available from www.w3.org/2006/07/SWD/RDFa/impl/js. Once you've saved it as a bookmark, you can use it to extract the RDFa from any page open in your browser and view it as N3. It's also more forgiving than the Distiller, so you can use it as a quick logic check without worrying about valid XHTML.