Building with Blogs

The category is hot and huge but still new. Here's a look at your choices.

Name a topic with a community of interest around it. Now go to Google and look it up. There's a good chance one or more of the top results will include somebody's weblog (aka blog). Let's take three examples:

  • 802.11b: the top listing out of 687,000 results is WiFi news, a weblog by a pair of journalists, Glenn Fleishman and Adam Engst. The IEEE working group's site is in the #2 position. The WiFi Alliance's portal is #3.

  • Segway: the top listing out of 81,500 results is the Segway company's site, followed by the Berkeley Segway Portal and then by Segway News, a weblog by Paul Nakada.

  • Weblog: out of 2,620,000 results, the top listing is Aaron Swartz's Google Weblog, followed by Guardian Unlimited's weblog. Following them is the blog of a certain Linux Journal editor who also happens to be writing what you're reading right now.

Blogs succeed largely because they are extremely native to the Web as Tim Berners-Lee conceived it in the first place. Here's how weblog software pioneer Dave Winer explains it:

The first weblog was the first web site, the site built by Tim Berners-Lee at CERN. From this page TBL pointed to all the new sites as they came on-line. Luckily, the content of this site has been archived at the World Wide Web Consortium. (Thanks to Karl Dubost for the link.)

Linking to sources and crediting them (as Dave does in that last line) has always been a native ethical and journalistic practice on the Web. While big-time broadcasters, publishers and VC-funded wannabes continue to see the Net as nothing more than a plumbing system for distributing “content” to “consumers”, blogging software developers have quietly added enormous value to the linking and crediting functionality of the Web. Dave and other independent developers have created standards like XML-RPC, SOAP and RSS, plus open APIs that together turn the Web into a writing and publishing medium like nothing we've ever seen before.

Blogging is not about “architecting”, “building”, “designing” or “authoring” anything, because blogs aren't “sites” in the usual sense. Blogs are journals. With blogs you write directly on the Web. Most posts are short, though they don't have to be. All are topical and current, or they disappear from the aggregation sites and services and eventually from the “blogrolls” of listed favorite links on other blogs.

Each blog is like a fireplace, and each post is like a log heaved on top to keep the fire burning. Every post has its own “permalink”, so others can point directly to it. As long as a blog puts out heat and light, others who care about the author's subject are drawn to it. So are Google and other search engines, which sift constantly through the ashes.

At their best, blogs are link magnets as well as sources of links, which is why Google likes them so much. Google equates inbound links with authority and ranks the results accordingly. More links from more highly linked-to pages result in higher page ranks. That's why so many blogs rise to the top of so many subject searches. The whole system—which includes blogs, aggregators, web services and Google itself—feeds, builds and grows on itself. It also attracts and feeds on the RSS streams offered up by discussion and news sites ranging from Slashdot and Linux Journal to the New York Times. RSS is one among a growing number of free and open technologies created and/or improved by weblog developers.
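The idea that links from highly linked-to pages count for more can be sketched in a few lines of Python. This is a simplified, textbook-style link-ranking iteration, not Google's actual algorithm; the page names, the damping value of 0.85 and the iteration count are all assumptions made for illustration.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Rank pages by inbound links, weighting each link by its source's rank.

    `links` maps a page name to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three sites point at blog_a, nobody points at blog_c.
links = {
    "blog_a": ["blog_b", "portal"],
    "blog_b": ["blog_a"],
    "blog_c": ["blog_a"],
    "portal": ["blog_a"],
}

ranks = page_rank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph the blog that attracts the most inbound links comes out on top, and the pages it links to inherit some of that standing, which is the feedback loop described above.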

As weblogs account for more and more of the traffic in knowledge about a given subject, they become powerful instruments for hacking common wisdom. In many categories, they are moving ahead of mainstream journals and portals and building useful community services where over-funded dot-com efforts failed spectacularly. One example from that last category is John Hiler's Cityblogs, which appeared in December 2002. Hiler explains:

Local sites are caught between a rock and a hard place: either they hire expensive full-time writers to create content they can't afford or they fire their writers and turn to automated content: weather reports, local news and movie listings.

It's a Gordian knot—hire expensive writers or you have no content—that blogs are uniquely positioned to cut in two. Now that I've set up the site, I can cover these three categories in an hour or two a day. As Glenn “Instapundit” Reynolds put it at a recent conference on weblogs, “blogging is cheap”.

So blogging is becoming an option in all kinds of places, which means there's a good chance that you, as a Linux Journal reader, fall into one or both of two groups:

  1. Users who want to set up a blog and start writing on the Web.

  2. System administrators and others with scripting and programming skills, who are looking either to set up a blogging system or to manage or change one that's already in place.

To get a handle on both, let's go back to Google.

For better or worse, Google is a commercial company whose services have become de facto web infrastructure. This is especially true for blogs, which make liberal use not only of Google's search engine but also of its APIs, which allow automated queries from programs.

Google's APIs are part of a growing raft of sites and services, mostly hacked together by enterprising independent developers. Technorati (see Sidebar), for example, pays attention to fresh links between blogs (leapfrogging referrer logs) and organizes the information into “watchlists” and other useful listings. If you want Technorati to tell you who's currently linking to your blog, you can ask for this information on the site or pay $5 per year for a watchlist sent out each day by e-mail.

The Technorati Story: How a New Web-Services Product Review Grew Out of a Research Assignment

Technorati is the creation of this article's coauthor, David Sifry, who is a cofounder of Linuxcare and Sputnik. It's a good example of a LAMP program—one based on the de facto platform of Linux, Apache, MySQL and Python, PHP or Perl. Another LAMP creation is Phillip Pearson's Blogging Ecosystem, which keeps two Top 300 lists: one for the most-linked-to blogs and one for the blogs that do the most linking.

Both Technorati and the Blogging Ecosystem are made possible in large part by RSS, the XML dialect whose name is commonly expanded as Really Simple Syndication. Thanks to RSS, every story you read on the Linux Journal web site is syndicated automatically to anyone who wants to read and point to it or aggregate it with other sources.
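To give a feel for what an aggregator does with a feed, here is a minimal sketch that pulls titles and links out of an RSS document using only Python's standard library. The feed itself is made up for illustration (the example.com URLs and post titles are hypothetical), and it assumes the plain, un-namespaced RSS 2.0 layout of channel and item elements; real feeds vary more than this.

```python
import xml.etree.ElementTree as ET

# A made-up feed, embedded as a string so the example runs without a network.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.com/2003/01/first-post</link>
      <description>Hello, Web.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2003/01/second-post</link>
      <description>Another log on the fire.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a list of {'title', 'link'} dicts, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


items = read_items(SAMPLE_FEED)
for entry in items:
    print(f"{entry['title']} -> {entry['link']}")
```

Each item's link is what lets another site point straight at a single post, so an aggregator needs little more than this loop to merge several sources into one listing.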

The Blogger API is another enabling infrastructure hack. It's used not only by Blogger (the most popular weblog system, in terms of sign-ups) but also by Radio Userland and Movable Type, the other two leading weblog systems. This API is what made it possible for one of your authors to hack methods for posting to various breeds of blogs through e-mail and Jabber.

There are many other hacks, just as there are many other blogging systems. SourceForge alone lists dozens of weblog systems in various states of completion. In the LAMP vein, Geeklog and CafeLog are PHP-based and use MySQL, as does Drupal. In fact, PHP-Nuke, PostNuke, Drupal and Slashcode all are flexible enough to serve as weblog systems. So is Rusty Foster's Scoop, which is written in Perl, as is Movable Type. Roller is written in Java for J2EE environments, and there is a whole community of Java bloggers. The list goes on, and it's a long one. So let's break the list down a bit into four family trees.
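Before moving on, a quick look at what a Blogger API call involves. The API is an XML-RPC interface, so posting comes down to sending a small XML document that names a method such as blogger.newPost and lists its arguments. The sketch below only builds that request body with Python's standard library and sends nothing over the network; the application key, blog ID, user name and password are placeholders, and the helper function name is our own invention.

```python
import xmlrpc.client


def build_new_post_request(appkey, blog_id, username, password, content, publish=True):
    """Serialize a blogger.newPost call into an XML-RPC request body.

    Argument order follows the Blogger API's newPost method:
    appkey, blogid, username, password, content, publish.
    """
    params = (appkey, blog_id, username, password, content, publish)
    return xmlrpc.client.dumps(params, methodname="blogger.newPost")


# Placeholder credentials: none of these values are real.
payload = build_new_post_request(
    "HYPOTHETICAL_APPKEY",
    "1234",
    "doc",
    "secret",
    "Another log on the fire.",
)
print(payload)
```

Because the request is just text, the same payload can be produced by a mail filter or a Jabber bot and handed to any weblog system that speaks the API. To actually post, a client such as xmlrpc.client.ServerProxy would be pointed at the weblog system's XML-RPC endpoint, which differs from system to system.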


Doc Searls is the Editor in Chief of Linux Journal