Semantic Web Publishing with RDFa
I learned UNIX from a real old-style guru named Jimmy who memorized microchip numbers and used sed as a word processor. Wanting to do well on my first job, I proudly showed him how I was putting detailed comments in my code. My mentor was not impressed. “Why are you doing that?” he shot at me, going on to explain that neither comments nor docs could be trusted. If you wanted to understand what the code was doing, you better read the code. Software project managers might not agree, but Jimmy did have a point. Docs and comments can become out of date or inaccurate, but the code can't. Broken, yes. Inaccurate, no.
A similar issue arises when writing a Web page that is intended to be read by humans and parsed by machines. New sophisticated search engines on the horizon will be hungry for semantic content, that is, for data that can be machine-parsed for meaning. Often the format will be some form of RDF, or Resource Description Framework. If you are publishing Web pages in order to share your data with the world, it follows that you want to make it available to both humans and search engines. Generating two sets of files, one human-readable HTML and the other machine-parsable RDF, means giving up the ability to hand-edit your HTML files to make corrections, and it sets your site up for likely inconsistency down the road. On top of that, full-on RDF/XML is verbose and ugly.
Enter RDFa, a lightweight, relatively new mechanism for embedding structured data into HTML in a simple but fully standards-compliant way. I run a Web site that is generated from templates. To understand how RDFa might fit into my site, I started with a simple manually created example: an event schedule for the local rodeo. Later in this article, I also briefly cover some of the emerging tools that automate RDF and RDFa and describe how one company has created a large-scale RDF implementation to solve enterprise problems. Now, here's the example.
My original sample code looked like this, in vanilla HTML:
<div> <h1>Saturday Rodeo Schedule 2/22/08</h1> <div> 2:00PM : Bull Riding </div> </div>
It's pretty straightforward and clear to the human reader of the Web page or even someone editing the source, but it's meaningless to a search engine. To make this event clear to an RDFa-parsing engine, my first step was to pick a vocabulary that has well-defined terms for events. Luckily, there is just such a vocabulary, based on the iCalendar standard for calendar data. The vocabulary or vocabularies used in a document are specified right in the <html> tag at the start of the document:
<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#" xmlns="http://www.w3.org/1999/xhtml">
The xmlns stands for XML namespace, and cal is the shorthand name we'll use to refer to the calendar namespace further down. http://www.w3.org/2002/12/cal/ical# is the URL of the RDF vocabulary file, and http://www.w3.org/1999/xhtml is the URL of the standard XHTML namespace that you might already be including in your documents. I explain how to discover vocabularies and decide which to use a bit later. Applying a bit of RDFa using basic iCal properties, we have this:
<div id="RodeoSchedule2008"> <h1>Saturday Rodeo Schedule 2/22/08</h1> <div typeof="cal:Vevent"> <span property="cal:dtstart" content="20080222T1400-0700">2:00PM</span> : <span property="cal:summary">Bull Riding</span> </div> </div>
From the browser's point of view, the HTML layout is unchanged. If desired, class attributes could be added for CSS formatting and would not impact the RDFa logical structure. This is different from the microformat hCalendar (another popular way of representing calendar data in HTML), which relies on fixed class names.
One last step alerts parsers to the presence of RDFa in our document and also specifies the encoding or character set used. The XML declaration carries the encoding, and the DOCTYPE announces XHTML+RDFa. We add the following lines at the very beginning of the file, before the <html> tag:
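To make the point about class attributes concrete, here is a minimal Python sketch using only the standard library. It is not a real RDFa processor: it ignores prefixes, subjects and nesting, and simply collects property values from a trimmed copy of the rodeo markup. The names PropertyCollector and collect_properties, and the CSS class names event, time and title, are hypothetical choices made for this illustration.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements with a property attribute.

    A deliberately naive illustration, not an RDFa parser.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []        # collected (property, value) tuples
        self._pending = None   # property whose value is the upcoming element text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # content= supplies the machine-readable value (as with cal:dtstart)
            self.pairs.append((prop, attrs["content"]))
        else:
            # otherwise the visible text of the element is the value
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None:
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


def collect_properties(markup):
    parser = PropertyCollector()
    parser.feed(markup)
    parser.close()
    return parser.pairs


# Trimmed copy of the event markup, without any styling hooks
PLAIN = (
    '<div><span property="cal:dtstart" content="20080222T1400-0700">2:00PM</span>'
    ' : <span property="cal:summary">Bull Riding</span></div>'
)
# The same markup with hypothetical CSS class names added
STYLED = (
    '<div class="event"><span class="time" property="cal:dtstart" '
    'content="20080222T1400-0700">2:00PM</span>'
    ' : <span class="title" property="cal:summary">Bull Riding</span></div>'
)

print(collect_properties(PLAIN))
print(collect_properties(PLAIN) == collect_properties(STYLED))
```

Both versions of the markup yield the same pairs, because the collector never looks at class. A real RDFa processor behaves the same way in this respect, which is what lets you style the page freely.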
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
Now, any application that understands RDFa can scan your Web page and learn that there is an event called Bull Riding occurring on February 22, 2008 at 2:00 PM, at a UTC offset of -0700 (Mountain Standard Time). In fact, you can verify that you've communicated correctly with the RDFa world by using any of a number of validating/parsing services. Using the Python-based service at www.w3.org/2007/08/pyRdfa, called RDFa Distiller, we can see that the above snippet produces the following semantic data, in what is called the N3 format:
@prefix cal:<http://www.w3.org/2002/12/cal/ical#>. @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>. [ a cal:Vevent; cal:dtstart "20080222T1400-0700"; cal:summary "Bull Riding"].
N3 is a shorthand that people who work heavily in the RDF world like to use for writing and representing the triples that compose the Semantic Web.
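For readers who have not met triples before, the N3 output above can be pictured as three subject-predicate-object statements about one unnamed event. The following Python sketch writes them out as plain tuples and parses the start time with the standard library. The label "_:event" and the helper objects() are assumptions made for this illustration; the bracketed node in the N3 has no name of its own.

```python
from datetime import datetime, timedelta

CAL = "http://www.w3.org/2002/12/cal/ical#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# "_:event" is a placeholder label for the anonymous "[ ... ]" node.
triples = [
    ("_:event", RDF_TYPE, CAL + "Vevent"),               # "a" is N3 shorthand for rdf:type
    ("_:event", CAL + "dtstart", "20080222T1400-0700"),
    ("_:event", CAL + "summary", "Bull Riding"),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# The dtstart value packs date, time and UTC offset into one string.
start = datetime.strptime(objects("_:event", CAL + "dtstart")[0], "%Y%m%dT%H%M%z")

print(objects("_:event", CAL + "summary")[0])
print(start.isoformat())
```

Expanding the cal: prefix into full URLs is what makes each statement unambiguous: any consumer that knows the iCalendar vocabulary can tell that dtstart here means the start of a calendar event.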