Making More Sense of the Web

by Doc Searls

Editors' Note: The following is the text of the June 9 and June 23 editions of Doc Searls' SuitWatch newsletter. Sign up to be a subscriber of this bi-weekly newsletter.

I think one reason the Net and the Web are so successful is they came without much of an organizing framework. The only directory was the domain name system (DNS), and it specified nothing that comes after the first single /. The rest was free to become as big a haystack as anybody wanted to make it.

Which we did, with the help of search engines. The haystack nature of the Web required search engines. In effect, Google and Yahoo say, "Right, the Web is a haystack, and we can help you find a needle in there." Now, using search engines is so much a part of life on the Web that most of us don't lament the lack of a directory structure to the place. Or the relative failures of Yahoo and DMOZ to create library-like directories of the Web's contents.

But what if some of the Web actually gets organized? What then?

There aren't but a few ways to organize things: categorically, alphabetically, numerically, chronologically, spatially, geographically...

We do have a crude sort of geographical organization with country codes in DNS: .uk, .cz, .jp and so on. Except in the US, where almost nobody other than bothers to use the .us code. But still, there hasn't been much to compromise the haystack nature of the Web.

Until two phenomena came together: blogs and syndication. Together they're creating a corner of the Web--call it the syndisphere--that is organized chronologically.

Blogs have a virtual directory path--http://[blogname]/year/month/day/date/post--where the last item has its own permalink. This is the directory nature if not structure comprehended by the new search engines and related services--Bloglines, Blogpulse, Feedster, IceRocket, Pubsub and Technorati--that look only at the part of the Web that's syndicated through RSS feeds. Services such as Technorati archive every post from every blog with an RSS feed. To them every permalink is actually permanent. (Disclosure: Technorati was born while its founder, David Sifry, and I worked together on a story about blogging for Linux Journal, and I'm on the company's advisory board.)
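To make the chronological idea concrete, here is a minimal sketch of how a service might read a date out of a permalink path and sort posts by it. The URLs and the function name permalink_date are hypothetical, and real blogs vary in how they lay out their paths, so treat this as an illustration rather than how any of the services named above actually work.

```python
from datetime import date
from urllib.parse import urlparse


def permalink_date(url):
    """Return the date encoded in a /year/month/day/ permalink path, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Look for three consecutive numeric segments that form a valid date.
    for i in range(len(parts) - 2):
        y, m, d = parts[i:i + 3]
        if len(y) == 4 and y.isdigit() and m.isdigit() and d.isdigit():
            try:
                return date(int(y), int(m), int(d))
            except ValueError:
                continue  # numeric, but not a real calendar date
    return None


# Hypothetical permalinks, not real posts.
posts = [
    "http://blog.example.com/2005/06/23/tags-and-trees",
    "http://blog.example.com/2005/06/09/haystacks",
    "http://blog.example.com/about",
]

# Keep only the URLs that carry a date, then order them chronologically.
dated = sorted((permalink_date(u), u) for u in posts if permalink_date(u))
for when, url in dated:
    print(when.isoformat(), url)
```

The page without a date in its path simply drops out, which is the point: only the dated part of the haystack gets put in order.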

The same isn't true for Google or Yahoo. The indexes of those search engines are inventories of what's on the Web right now. The perspective isn't chronological or any similarly structured context. It's all haystack. Which is fine. They do a miraculous job. And on open-source infrastructure, no less.

But the emergence of a chronological corner of the Web is an interesting phenomenon, one that grew naturally, from the bottom up. No big company said This Will Be So. Instead, this new subsphere emerged on its own, naturally. Now, another interesting natural phenomenon is showing up: tagging.

As with RSS, tagging brings a new organizing principle to bear, at least on its own corner of the Web. Tags are labels individuals can apply to anything, through HTML. The first places tags appeared were Delicious and Flickr. The former is a social bookmarks manager, and the latter is a photo archiving service, though neither label does either service justice. What matters about both is their value comes primarily from user contributions. Just about everything you see on both services is what users put there.

And tags are a big part of it. At Flickr, you are asked to tag, essentially to label with membership in a user-defined category, every photo you put up there.

The practice has spread to blogging. Many blogs now add the rel="tag" attribute to their links or append "tags" or "Technorati tags" to their posts.

The rel="tag" spec is described at the Microformats wiki. The editor and author of the wiki is Tantek Çelik. Derek Powazek and Kevin Marks are credited under the Concept heading. All three work at Technorati.
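For a rough idea of what a rel="tag" link looks like to software, here is a small sketch that pulls tags out of a scrap of HTML. It assumes the convention in the rel="tag" spec that the tag is the last segment of the link's path. The class name RelTagParser and the example.com URLs are made up for illustration, and this is not a description of how Technorati's crawler works.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel can hold several space-separated values, so split before testing.
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "tag" in rels and href:
            # Assumption: the tag is the last non-empty path segment of the href.
            segments = [s for s in urlparse(href).path.split("/") if s]
            if segments:
                self.tags.append(unquote(segments[-1]))


# Hypothetical post markup; the tag-space URLs are placeholders.
html = """
<p>Filed under
<a href="http://tags.example.com/tag/linux" rel="tag">Linux</a> and
<a href="http://tags.example.com/tag/syndication" rel="tag">RSS stuff</a>.
See also <a href="http://example.com/about">the about page</a>.</p>
"""

parser = RelTagParser()
parser.feed(html)
print(parser.tags)
```

Note that the second tag comes out as "syndication", not "RSS stuff": in this sketch the link text is for people, and the href carries the tag.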

I recently did an IM interview with Tantek, to deepen my own understanding of what tags are about:

Doc Searls: Technical question: Who started the whole tagging thing. Delicious? Flickr? both? I know T'rati was the first to search it. Then when/how did rel="tag" come along?

Tantek Çelik: Technorati invented rel="tag" and distributed/decentralized tagging.

DS: So tagging was internal to the Delicious and Flickr silos before that?

TÇ: Yes.

DS: What's the difference between a tag and a Technorati tag? The latter isn't proprietary except...that's how it sounds.

TÇ: We've been calling them "rel tags" for exactly that reason: to make it clear.

DS: rel means what, exactly?

TÇ: rel means the relationship between the current document (or large portion thereof) and the href that the hyperlink points to. The way it labels this relationship is in terms of a noun describing the resource at the href. The best illustrative example of this is rel="stylesheet" that's used to indicate that the href over there is a stylesheet for the current document. Another great example of this is rel="license", specified here: rel="license" means this href over here (e.g. a link to a CC or Apache or GPL license page) is a license for the current page.

DS: and rel is standard HTML?

TÇ: rel is a standard HTML4 attribute as defined by the W3C in the HTML4 specification, which *also* states that authors may use their own rel values and may define them using a profile.

DS: How does Technorati search Flickr and Delicious? Are "posts" there rss-fed?

TÇ: That's where XMDP (XHTML Meta Data Profiles) comes in. XMDP is a format for defining such profiles. See this explanation. Technorati shows tag results from Flickr and Delicious using their RSS feeds.

DS: So it's still in the framework, or practice, of RSS-activated search.

TÇ: Yes.
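Tantek's description of rel in the interview above, a noun that says what the resource at the href is to the current page, can be sketched in a few lines. The class name RelCollector and the sample page are assumptions for illustration; the sketch only groups hrefs by their rel values and makes no claim about any particular spec beyond what the interview says.

```python
from collections import defaultdict
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Group the hrefs on a page by the rel values attached to them."""

    def __init__(self):
        super().__init__()
        self.relations = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # One link may declare several relationships, separated by spaces.
        for rel in (attrs.get("rel") or "").lower().split():
            self.relations[rel].append(href)


# Hypothetical page using the rel values mentioned in the interview.
page = """
<link rel="stylesheet" href="/style.css">
<a rel="license" href="http://creativecommons.org/licenses/by/2.0/">CC license</a>
<a rel="tag" href="http://tags.example.com/tag/tagging">tagging</a>
<a href="http://example.com/">a plain link with no rel</a>
"""

collector = RelCollector()
collector.feed(page)
for rel, hrefs in sorted(collector.relations.items()):
    print(rel, "->", hrefs)
```

Read each line of output as a sentence: the thing at this href is a stylesheet, a license or a tag for the page you are looking at.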

Hugh MacLeod, the marketing iconoclast whose Gapingvoid cartoons I once described as "Dilbert for people whose jobs don't suck", has lately taken an interest in tags as well. He and his friend Sig have invented a "tree-structure-free gizmo" called Thingamy, which they say is "basically a different approach to organising data, finding data, and transferring knowledge". Valuing free association and imprecision, they call it an "anataxonomy".

Of course, it's positioned as an alternative to tree-like, or any kind of, structures. But, as Valdis Krebs, a guru of data visualization, said in comments to one of Hugh's posts, "It is not an OR problem... hierarchy OR something else -- like network. It is an AND situation... hierarchy AND network -- prescribed AND emergent...".

I think it's categorical. That makes it a tree of a very short sort, perhaps the height of moss.

What matters most is who is coming up with it. As usual with cool things that happen naturally on the Web, it's not the big vendors or other usual suspects. It's individuals, trying to make sense of the world.

Dollars, of course, will come later.

Doc Searls is Senior Editor of Linux Journal. He also presides over Doc Searls' IT Garage, which is published by SSC, the publisher of Linux Journal.
