
Searching the World Live Web

What impact will Google's new blogsearch engine have for the Live Web?


Editor's Note: The following is the text of the September 15 edition of
Doc Searls' SuitWatch newsletter.
Sign up to be a subscriber of this bi-weekly newsletter.

September 15--Live Web search got a lot bigger yesterday, when Google launched its
new blogsearch engine. There's no direct link on the Google index
page yet. For now, you can find it in the roster of services behind
the "more" link. There are 29 of those, and Blog Search is the newest.

But the news is still big. It legitimizes the Live Web--and
blogging in particular--in a big way.

Far as I know, the blog search category was born when David Sifry put
a hack he called Technorati on a Penguin Computing Linux box that lived
in his basement while he and I were working on
"Building With Blogs", a feature for the February 2003 issue of Linux
Journal. Dave needed to research blogs, so he created a tool
for it. As of today, Technorati's traffic is #751 on Alexa, pushing 80
million page views per day. (Disclosure: I'm on Technorati's Advisory Board.)

Other Live Web search pioneers include Bloglines, Blogpulse,
Feedster, IceRocket and PubSub. The results they yield are radically
different from what you get with Wide Web searches, as well as from each
other. Mostly, the results are newer. They're also more likely to come
from individuals and live news services than from companies with
static sites.

Let's say you want to search for Katrina and Interdictor. The latter
is the Weblog of Michael Barnett, who helped keep DirectNIC's data
center up and running through Hurricane Katrina and the crisis that has
followed. Far more than a simple blog, Interdictor also has served as
a message board, a tech support line and a zero-bullshit news service.

You'll get results on Google's and Yahoo's main pages, but they
won't be especially current. I am writing this on September 14, and the top
result on Google is from September 3. Nor can you plumb them through
the dimension of time, starting at now.

Do the same search on Blogpulse, and you get results listed backward
in time, with the latest at the top. You also can watch trend results
for the same search. You can refine results by incrementally adding
search terms. And you can track conversations from one URL's "seed".
Search for the same thing on Feedster, and you get results listed
either by relevance or date.

Do the same search on IceRocket, and you get results grouped by
date, starting with today. You can refine your search to today, past
week, past month or by date range. You also can follow trends here,
and look back on your search history.

Do the same search on Technorati, and you get results from two
hours to two days old, with the most recent at the top. The company
tries to index everything within minutes. You also can find 5,680
posts tagged "katrina" and 5 tagged "interdictor".

Do the same search with Google's Blog Search, and you get 2,355
results. Although the overall look is similar to its Wide Web results,
here you get the option of sorting by relevance or date. You also can go back
100 pages through the first 1,000 results and subscribe to feeds of
the search as well. And, as you'd expect, it's much faster than all
the others.

You can't search through PubSub, but you can subscribe to keywords
and combinations of keywords. These searches are syndicated, so you
can receive them in your own aggregator. In fact, most of the Live
Web engines provide feeds for searches of keywords, URLs or
combinations of both.
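
To make the keyword-subscription idea concrete, here is a minimal sketch in Python. It is not how PubSub or any other engine actually works; the function name, the subscription names and the sample text are all hypothetical. It assumes only that a subscription is a set of keywords, and that a new post matches when it contains every one of them.

```python
import re

def matching_subscriptions(subscriptions, entry_text):
    """Return the names of subscriptions whose keywords all occur in entry_text.

    subscriptions maps a (hypothetical) subscription name to a list of keywords.
    Matching is on whole words and ignores case.
    """
    words = set(re.findall(r"[a-z0-9']+", entry_text.lower()))
    return sorted(
        name
        for name, keywords in subscriptions.items()
        if all(keyword.lower() in words for keyword in keywords)
    )

# Illustrative data only.
subscriptions = {
    "storm": ["katrina"],
    "storm-and-blog": ["katrina", "interdictor"],
    "penguins": ["linux"],
}
matches = matching_subscriptions(
    subscriptions, "Interdictor keeps reporting through Katrina"
)
```

A real service would then push each matching post into the feed for that subscription, which is what your aggregator picks up.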

Of the Wide Web engines, only A9 also competes in Live Web searches,
using IceRocket.

All of them run on Linux, by the way. No news there, but worth
reporting, of course.

So, what's the difference between the Wide Web and the Live Web? Glad
you asked.

The simple difference is the Live Web is syndicated. That means
every time something is posted or updated, a notification goes out,
informing the world about it. The most familiar syndication method is
RSS, which commonly stands for Really Simple Syndication. There are a
number of different syndication formats--Google's Blogger uses Atom--but
as a class we tend to call them all RSS. Those familiar little
orange XML buttons are the common symbol for Live Web search feeds.
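
For readers who have never looked behind one of those orange buttons, the sketch below parses a tiny RSS 2.0 document with Python's standard library and lists its items newest first. The feed content, the example.com URLs and the function name parse_items are made up for illustration; real feeds vary a good deal, and Atom uses different element names.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A made-up RSS 2.0 feed with two items.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <item>
      <title>Older post</title>
      <link>http://example.com/older</link>
      <pubDate>Sat, 03 Sep 2005 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Newer post</title>
      <link>http://example.com/newer</link>
      <pubDate>Wed, 14 Sep 2005 18:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def parse_items(rss_text):
    """Return the feed's items as dicts, sorted with the latest first."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS 2.0 dates use the RFC 822 style that email headers use.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    items.sort(key=lambda entry: entry["published"], reverse=True)
    return items

items = parse_items(SAMPLE_RSS)
```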

Wide Web search engines send out spiders to crawl through every site
on the Web. On Google, that's about 8.2 billion pages.
Live Web search engines crawl only syndicated pages and only when
they're notified by a fresh feed from those pages. So, while
Technorati searches through 17.1 million sources, it only indexes
pages that send out fresh syndicated feeds.

Here's another way of looking at it: Wide Web indexing is proactive
and archives everything, while Live Web indexing is reactive and
archives only what's fresh.

Of course, a Live Web engine can archive much more than that, over a
long period of time. But what matters most usually is what's
freshest--or both relevant and fresh.
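
The reactive model can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of how Technorati or any other engine is built: the class LiveIndex, its ping and search methods and the sample posts are all hypothetical. The index does no crawling of its own. It learns about a post only when a notification arrives, and it answers searches with the latest posts first.

```python
from datetime import datetime, timezone

class LiveIndex:
    """Toy reactive index: it only learns about posts when it is notified."""

    def __init__(self):
        self.posts = []    # list of (published, url, text) tuples
        self.seen = set()  # URLs already indexed

    def ping(self, entries):
        """Handle a notification carrying (published, url, text) tuples.

        Returns the number of posts that were new to the index.
        """
        added = 0
        for published, url, text in entries:
            if url not in self.seen:
                self.seen.add(url)
                self.posts.append((published, url, text))
                added += 1
        return added

    def search(self, *terms):
        """Return URLs of posts containing every term, newest first."""
        wanted = [term.lower() for term in terms]
        hits = [p for p in self.posts if all(t in p[2].lower() for t in wanted)]
        hits.sort(key=lambda post: post[0], reverse=True)
        return [url for _, url, _ in hits]

# Illustrative data only.
index = LiveIndex()
index.ping([
    (datetime(2005, 9, 3, tzinfo=timezone.utc),
     "http://example.com/older", "Katrina news from Interdictor"),
    (datetime(2005, 9, 14, tzinfo=timezone.utc),
     "http://example.com/newer", "Interdictor is still up after Katrina"),
])
results = index.search("katrina", "interdictor")
```

Nothing here throws old posts away, so the same structure also can serve as the longer archive mentioned above; only the ordering favors what is fresh.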

Another difference is in the rate of change in technologies,
standards and practices. This results in highly varied search
experiences that are bound to change over time. In the last few
months, "tagging" posts (or photos on Flickr, or bookmarks on
Del.icio.us) with categorical keywords has proven to be a handy way to
discover and peruse ad hoc groupings. Technorati has been providing
tag search along with tagging methods for several months now, and
others are bound to follow. Meanwhile, Wide Web searching has remained a
very consistent experience ever since Google taught users to trust
PageRank.

In the few hours that have passed since Google's Blog Search has
appeared, many posts in the Blogosphere have been predicting
the death of the incumbent Live Web engines. A few minutes ago I
spoke to Jason Goldman, who runs both Blogger and Blog Search at
Google. Rather than predicting the death of competitors in the Live
Web space, he said he expected it to become energized and to grow.
He also appreciated David Sifry's blog post welcoming Google to the space.

Somewhere in there, a friend sent me a message reminding me that
Apple also publicly welcomed IBM to personal computing in 1981, when
the IBM PC was introduced. The implication was that IBM flattened
Apple. In a way, that happened. But it's worth noting also that Apple
is still very much around, healthy and a leader in its industry--and
some others too. Jason also reminded me that there were many
predictions of death for competitors when Google bought Blogger.
Instead, the blog creation tool business only got bigger.

Every industry needs its mainstays and its pioneers. The Live Web has
both now. And it will be better for everybody if they all do what
they do best.

Doc Searls is Senior Editor of Linux Journal, for
which he writes the Linux for Suits column. He also presides over
Doc Searls' IT Garage, which is published by SSC, the publisher of
Linux Journal.


Comments


Newsletter

felix's picture

I just signed up for the newsletter, I am so excited. Thank You and I look forward to finally getting to read something exciting.

thx for the info

ajondo's picture

i did not know about Icerocket!
very helpful ...

" Web search engines send out

Qun Cao's picture

"Wide Web search engines send out spiders to crawl through every site on the Web. On Google, that's about 8.2 billion pages. Live Web search engines crawl only syndicated pages and only when they're notified by a fresh feed from those pages. So, while Technorati searches through 17.1 million sources, it only indexes pages that send out fresh syndicated feeds.

Here's another way of looking at it: Wide Web indexing is proactive and archives everything, while Live Web indexing is reactive and archives only what's fresh."

This is quite interesting, but can somebody enlighten me on how feeds inform the search engines about their fresh entries? It seems to me that Live Web search engines still need to go out to check the XML source of all the feeds and determine if they have been updated since the last visit. Some feeds might register with Technorati for "ping back", but that's probably a very small portion of the Live Web.

A few corrections: Daypop

Anonymous's picture

A few corrections:

Daypop was the first blog search engine, around since 2001, well before Technorati.

A9's opensearch interoperates with a whole bunch of Live Web search engines. A quick check shows Feedster, Blogwise, Bulkfeeds, Blogdigger, blogdb.jp, blogWatcher and Findory.

Live Web search engines don't only crawl RSS feeds. Some do, some don't. Technorati does not only crawl RSS feeds.

Many Live Web search engines archive everything. They rank by freshness.
