“There is none. Get over it.” —Scott McNealy on privacy
“At this stage in my life, the thing that really turns me on is competence.”
“Science would be superfluous if the outward appearance and the essence of things directly coincided.”
“Certum est, quia impossibile. (It is certain, because it is impossible.)”
“What the hell is content? Nobody buys content. Real people pay money for music because it means something to them. Being a ‘content provider’ is prostitution work that devalues our art and doesn't satisfy our spirits. Artistic expression has to be provocative. The problem with artists and the Internet: Once their art is reduced to content, they may never have the opportunity to retrieve their souls.”
“Today I want to talk about piracy and music. What is piracy? Piracy is the act of stealing an artist's work without any intention of paying for it. I'm not talking about Napster-type software. I'm talking about major label recording contracts.”
“Linux means never having to delete your love mail.”
“Maturity is when you quit blaming other people for your problems.”
“Who is General Failure, and why is he reading my hard drive?”
—Bob (on Slashdot)
“I'm still freaked out by all this. I just wrote a (bleeping) anthropology paper.”
“My hovercraft is full of eels.”
These days, search engines are praised for their educated guesswork. Each pile of results is presented with an implication: “Here's what we think you want.”
But serious researchers (i.e., Yours Truly and all Linux Journal readers) often don't want an engine to guess. They want to search for specific strings or, as some search engines put it, phrases: names, text passages, lines of code, diseases and any other exact series of words.
On December 3, 1999, and June 27, 2000, I tested fifteen of the most familiar search engines by searching for a distinctive phrase: “He that by me spreads a wider breast than my own”, which is part of a familiar line from Walt Whitman's Song of Myself. The phrase occurs far from the beginning of the work and can be found in many documents on the Web, including one on my own site, http://www.searls.com/.
It's a tough test. Only those engines that deeply search an enormous range of sites will yield results. “Breast” is also a common and controversial word, which might invite spurious results.
Testing for strings also isn't easy on the tester, since search engines have different ways of recognizing phrases. Most require quotes. Others (Hotbot, FAST) have pop-out menu commands. One (Yahoo!) requires clicking a radio button in “advanced” search mode. Others (mostly at the bottom in the surveys) don't search phrases at all.
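The difference between guessing and phrase matching can be sketched in a few lines of Python. This is only an illustration, not how any of the tested engines works: the function names and the two sample page texts are made up here, and real engines index billions of documents rather than scanning strings. It shows why a loose word match takes the “breast” bait while a phrase match does not.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def any_word_match(query, document):
    """Loose match: true if the document shares at least one word with the query."""
    return bool(set(tokenize(query)) & set(tokenize(document)))

def phrase_match(query, document):
    """Strict match: true only if the query's words appear consecutively, in order."""
    q, d = tokenize(query), tokenize(document)
    if not q:
        return False
    # Slide a window the length of the query across the document tokens.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

PHRASE = "He that by me spreads a wider breast than my own"

# Hypothetical page texts, invented for this illustration.
pages = {
    "whitman": "He that by me spreads a wider breast than my own "
               "proves the width of my own.",
    "clinic": "Looking for breast augmentation from a doctor in your area?",
}

for name, text in pages.items():
    print(name, any_word_match(PHRASE, text), phrase_match(PHRASE, text))
```

Both pages pass the loose test, since each shares at least the word “breast” with the query, but only the Whitman page passes the phrase test.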
I run this test quite often for my own purposes and rarely record the results. But last December I did record them, and I repeated the test on June 27, nearly seven months later. This time I added two more tests: one for an obscure blood disorder and the other for a Linux system call. Here are the results:
As you see, FAST won the first search by a wide margin, just as it did in December. But Google (see Jason Schumaker's “Interview with Sergey Brin”, page ??) is the clear winner of the second and third searches, and it didn't do too badly on the first, either.
Near as I could tell, none of the engines fell for the “breast” bait—at least not in phrase search mode. On that one, they all get a passing grade.
Some of the results, however, were amazing. Go.com found “29,024,074 matches” in the first search, but nothing I wanted. The first ten results all related to breastfeeding or breast cancer. No porn, of course. But I did get a banner ad featuring a happy-looking woman in a cleavage bra. “BREAST AUGMENTATION?” it asked. “Looking for breast augmentation from a doctor in your area? CLICK HERE.” Nice guess, guys.
As we can see, search is consolidating as a business category. By the time you read this, Yahoo! will be using Google (in a deal struck one day before this survey). Other partnerships and cross-investments are sure to follow (Lycos recently bought a 15% stake in FAST, for example). My guess is that there will be fewer search engines by the time you read this.
I just hope there aren't fewer good ones.
Doc Searls (firstname.lastname@example.org) is Senior Editor of Linux Journal and co-author of The Cluetrain Manifesto.