Still Searching

I want a search engine to grep the world for me.

Water, water, every where, Nor any drop to drink. --Samuel Taylor Coleridge

If information were water and search engines were taps, I'd chug about ten gallons a day. I don't just want it. I need it. Name a verb expression for appetite, put it between “I” and “information”, and you start to see where I'm coming from.

Most of the information I crave is specific and textual; and since most specific text information involves more than one word in a row, I'm usually looking for alphanumeric strings. Sometimes, but not always, those strings are words.

In other words, I want a search engine to grep the world for me.

Since I'm sure this is no-hope-in-hell territory, I just went looking for a quote to express my frustration. I found it through Google. It's from the prolific Dmitry Kirsanov, writing in the “Advanced” chapters of HTML Unleashed, Professional Reference Edition, the full text of which is exposed online at http://www.webreference.com/dlab/books/html-pre/. He writes, “Those accustomed to grep-style regular expressions can't even dream of using something similar with search engines.”
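For anyone who hasn't lived at a shell prompt, here is roughly what “grep-style” means. This is only a sketch: the two-file corpus and the grep() helper below are made up for illustration and stand in for the Web. The point is that a regular expression can chase strings that aren't words at all, such as a version number.

```python
import re

# Hypothetical mini-corpus standing in for "the world"; file names and text are invented.
DOCS = {
    "rime.txt": "Water, water, every where,\nNor any drop to drink.",
    "notes.txt": "grep prints lines matching a pattern.\nRelease 2.2.14 shipped.",
}


def grep(pattern, docs):
    """Return (name, line_number, line) for every line the regex matches."""
    rx = re.compile(pattern)
    hits = []
    for name in sorted(docs):
        for number, line in enumerate(docs[name].splitlines(), start=1):
            if rx.search(line):
                hits.append((name, number, line))
    return hits


if __name__ == "__main__":
    # Three dot-separated numbers: a string, but not a word.
    for name, number, line in grep(r"\d+\.\d+\.\d+", DOCS):
        print(f"{name}:{number}:{line}")
```

A keyword index can answer “which pages contain this word” quickly, but a pattern like the one above has to be tried against the text itself, which is presumably part of why engines don't offer it.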

We won't go into why. But we do have to ask if it's necessary for so many engines to be as bad as they are about this. It seems like every time I find a search engine that does The Job, somebody buys it and it goes to hell. Or it goes to hell and then somebody buys it. Whatever order, the bad news is always around the corner. And, without fail, it comes in the form of (here comes that dreaded word)...marketing.

Marketing can't seem to help trading utility for “reach”, “exposure”, “targeting” or some other “strategic” abstraction intended to influence the largest possible population, ignoring the fact that today's common denominators aren't as low and wide as they used to be. Especially on the web.

The uncommon denominators (that's us again) are rather abundant, too. And it's not like marketing lives to ignore the connoisseur. Witness wars among automobile makers over handling characteristics only mechanics and race car drivers can fully understand. (How many of us careen our Chevy Tahoes down black diamond slopes or floor our Acuras to tire-melting speeds on 2-lane Nevada backroads?)

But dot-com marketing is a wacky breed that often lusts after consumers to a degree not seen outside Procter & Gamble, circa 1958. It was consumer-hungry marketing that killed Lycos (the original one that came from Carnegie Mellon), then Infoseek, then Hotbot (as Inktomi created it), then Altavista (as DEC created it). Each of those was born to serve intellectual curiosity. Others (Looksmart, AskJeeves, Go, Yahoo! and DirectHit, to name a few) were born good for little other than flat-ass-dumb “consumer” searches for “favorites”, “portals” and whatnot.

Okay, I'll make an exception for Yahoo!, which has always used human beings to catalog the Web. And now it has hired Google to do the heavy lifting, which is a good thing. We'll say more about Google after the Altavista autopsy.

I knew Altavista was terminal last fall, when its “advanced” search page was suddenly replaced by bragging about “improvements”. “Tips” were gone. So was a nice and easy way to search for inbound links to a given URL. If the function persisted, no clues to procedures remained (at least that I could find) amidst the fresh marketing poop.

Of course, there was a survey. So I filled it out. This came back by e-mail:

Thank you very much for recently filling out the survey on the AltaVista Advanced Search page. Your suggestions and comments will help us continue to make AltaVista Advanced Search the best way to location [sic] information on the Internet.

We've found that once users try Advanced Search, they realize the power of the tools that AltaVista has to make your searching more precise. So, we're always trying to come up with new ways to encourage regular searchers to try out Advanced Search. And, that's where we'd like your opinions.

As you'll recall, along with your survey responses you also submitted this email address. So if we had any follow-up questions we could contact you again. Well now we'd like your opinions about some promotions we'd like to use to convince users to give Advanced Search a try.

The six-question survey should take only a couple of minutes. Simply click on the hyperlink below. If you are using AOL or another email service that does not support hyperlinks, please copy the hyperlink and paste it into your browser window in the Address box.

The URL delivered me to a silly world whose gods believe that prizes might do what features won't. I wrote this in the Input box:

I don't want a prize. If your search is so damned “advanced” (and how can it be, now that you fail to even mention the wonderful “link:www.mysite.com -url:mysite.com” feature that only works in BASIC, fercrissake!), I'd be glad to help out for FREE. I'll give you my time if you'll give me your attention. I want truly advanced search functions. THAT's what tempts me. Not prizes.

At the bottom of page two, the system broke down and wouldn't advance me to page three. I gave up and didn't go back except to see how it compared with competition.

But hey, maybe they listened. The “cheat sheet” at doc.altavista.com/adv_search/syntax.shtml restores a lot of the good advice lost from the original Advanced page. But the sad fact is that Altavista isn't as good as it used to be. FAST, Google and even MSN Search yield better results. If you're looking for strings. (For more, see the feature in UpFront.)

I know because I test search engines pretty often. I go deep into a document somewhere in my own domain, http://www.searls.com/, and grab a string of text that appears both in my own site and in a number of others, such as a quote from literature. Then, I run a bunch of engines through the mill. That's how I knew when Altavista started to beat Infoseek, when Hotbot began to beat Altavista and when FAST began to beat Hotbot.

The last time I saved results was December 3, 1999. At that time, FAST, http://www.alltheweb.com/, won. They still win (see UpFront) on some tests but lose on others—usually to Google.

There's an awful lot to like about Google. First, they run their engine on Linux (in case you're asking, Google is a huge Red Hat customer). Second, their user interface is blessedly simple and devoid of hype for anything other than itself, and there's precious little of that. Design-wise, they make great use of white space. Third, they have nicely incorporated the DMOZ Open Directory project's catalog, which is essentially The People's Yahoo!. But fourth (and most importantly), they have done more than any other search company to allow trusting searches of both strings and collections of words at the same time.

Where FAST and Hotbot give you a pop-out menu choice of “the phrase”, “all the words” and “any of the words” (or the equivalents), Google does all of those at once, defaulting first to phrase mode. You can use quotes to narrow the results, but the difference is usually small. That means Google has done a good job of providing the largest possible narrow search. In fact, the one site users want comes out on top so often that Google confidently provides an “I'm Feeling Lucky” button that yields just one result.
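The three modes are easy to sketch. What follows is not Google's method, which I don't claim to know; it is a toy under stated assumptions, with made-up function names (tokens, match_mode, search) and a made-up four-document collection. It classifies each document as a phrase match, an all-the-words match or an any-of-the-words match, then lists phrase matches first.

```python
import re


def tokens(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def match_mode(query, doc):
    """Return 'phrase', 'all', 'any' or None, using the strictest mode that matches."""
    q, d = tokens(query), tokens(doc)
    if not q:
        return None
    # Phrase: the query tokens appear as one consecutive run in the document.
    for i in range(len(d) - len(q) + 1):
        if d[i:i + len(q)] == q:
            return "phrase"
    if all(word in d for word in q):
        return "all"
    if any(word in d for word in q):
        return "any"
    return None


def search(query, docs):
    """Run every mode at once; sort phrase hits first, then all, then any."""
    rank = {"phrase": 0, "all": 1, "any": 2}
    hits = []
    for name, text in docs.items():
        mode = match_mode(query, text)
        if mode is not None:
            hits.append((rank[mode], name, mode))
    return [(name, mode) for _, name, mode in sorted(hits)]


# Hypothetical documents, invented for this example.
DOCS = {
    "a": "Nor any drop to drink.",
    "b": "Drop any drink",
    "c": "A drop of rain",
    "d": "Nothing here",
}

if __name__ == "__main__":
    print(search("any drop", DOCS))
```

With the query “any drop”, document a is a phrase match, b has both words in the wrong order, c has one of them and d drops out. Putting quotes around the query would amount to keeping only the first tier.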

Google does have special search functions, all well-explained. I wish they had more, but I can live without them. Google is, for me, by far the most useful and reliable search engine—at least for those hard-to-find word strings.

What's not to like about Google? Well, there's the Patent Issue. They're going after patents on their search methods (and who knows what else), which costs them good will in the Linux/Open Source community. At an event early this year, I talked briefly about patents with Google's co-founder, Larry Page. It was clear that Larry isn't crazy about them. Immediately afterwards, I talked with John Doerr. It was equally clear that John is crazy about patents—to the degree that he believes that patents are one of the things that “make America great.” John, of course, is a VC with Kleiner Perkins, which conspicuously funds Google.

There's also the risk that Google will pursue an advertising-driven business strategy. In current searches, ads show up as annotated, text-only links, posted above search results. These are certainly far less onerous than banners (also harder to block). But they quietly crept in a few months ago. What's the next step?

I don't know. In fact, I just tried to force Google's engine to give me an ad, and it wouldn't do it, not once in ten tries. So I suspect that the company is being cautious. As it should be. Far as I know, Google is the only search engine created by and for people who live to search and search to live.

We need more of those. And we need the ones we've got to remember why we like them so much.

Doc Searls (doc@ssc.com) is Senior Editor of Linux Journal and co-author of The Cluetrain Manifesto.
