Google vs. AllTheWeb
There used to be a debate about which search engine was best. And maybe there still is, but we haven't been hearing much about it because Google is pretty much it. Even Yahoo uses Google. The situation is typified by these remarks posted by Jason Kottke the other day at Kottke.org: "Google has been down for most of the day (for me, at least), so I had to use, ugh, Altavista to search for something earlier. It's the first time I'd used something other than Google in more than a year, and it took me about 3 times as long as normal to find what I was looking for. Google is useful enough that I would pay a $5-8 subscription fee per month for access to it. Google is the default command-line interface to the Web...and well worth paying for."
Now there's a pull-quote for you: "default command-line interface to the Web". And maybe that's what we should expect from a well-funded runaway hack by Linux weenies (who nonetheless have a policy of patenting their software).
When you're the de facto default portal for searching everything on the Web, you don't need to do a lot of PR. So Google doesn't. But they're certainly glad to share information when asked, which is what happened when I asked Cindy McCaffrey, Google's VP of Corporate Communications, to share a few up-to-date facts about the company. Here's some of what she gave me:
Data centers: 4
Linux computers: >10,000
Searches per day: >150 million
Index of Web pages: >1.6 billion
Image base: >330 million
Usenet messages: >650 million (going back >5yrs)
Language subsets in the index: 28
International domain sites: 23
PDFs: >22 million
File types included in searches: wk1, wk2, wk3, wk4, wk5, wki, wks, wku, mw, xls, ppt, doc, wps, wdb, wri, rtf, ans, txt
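The list above says which file types Google indexes but not how to aim a search at one of them. As a minimal sketch, assuming Google's `filetype:` query operator and its `https://www.google.com/search?q=` URL (neither is mentioned in this article), a search restricted to a single type could be built like this. The function name `filetype_query_url` is hypothetical.

```python
from urllib.parse import urlencode

def filetype_query_url(terms: str, ext: str) -> str:
    """Build a Google search URL limited to one file type.

    Assumption: Google honors the `filetype:` operator and accepts the
    query in the `q` parameter of /search. Both are illustrative, not
    taken from the article.
    """
    # Normalize input such as ".PDF" to "pdf".
    ext = ext.lower().lstrip(".")
    # urlencode escapes the query: spaces become "+", ":" becomes "%3A".
    return "https://www.google.com/search?" + urlencode({"q": f"{terms} filetype:{ext}"})

print(filetype_query_url("linux journal", "pdf"))
# prints https://www.google.com/search?q=linux+journal+filetype%3Apdf
```

Letting `urlencode` handle the escaping avoids hand-building query strings that break on spaces or punctuation in the search terms.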
They also have maps, phone directories, dictionary definitions, Web page translation... the list just keeps growing.
Fast Search and Transfer ASA is a Norwegian company with offices in the US and elsewhere. Their original and persistent goal has been to build the world's largest and deepest search engine. Early on they partnered with Dell and Lycos, which ultimately employed FAST engines for searching the Web, images, multimedia and everything else.
And now FAST has rebranded its site as "AllTheWeb", with the tagline "all the web. all the time". And they're doing some aggressive PR. Normally I resist that kind of thing, but I've been warming to these Norwegian guys ever since I started hearing from them, mostly because they feel they should be no less legit to the community than Google. Their engines run on FreeBSD and were developed on FreeBSD and Linux machines. In fact, FAST's first engine, FTPsearch, was developed under the GPL. You can still download the GPL version of that software at ftp://ftpsearch.ntnu.no/pub/ftpsearch/. Search results are also served using Apache and PHP.
I was also told that some of the same folks have long been involved in PHP's development, and that many of FAST's R&D people in Norway come from one UNIX-oriented computer club at the university in Trondheim. It's called "Programvareverkstedet," or PVV.
Whether it's merit, PR or both, AllTheWeb.com is clearly getting some mojo going. A few days ago Kevin Elliot at About.com wrote, "for searches related to news and current events, it blows the conventional wisdom about Google right out of the water". There's more positive spin at SearchDay, Pandia, Research Buzz and the company's own press release list.
I just ran a quick test of the two services. Here's how they did, at least in terms of returning raw numbers:
"Geeks on the Half Shell":
That last one was a real test, because it referred to a real piece that's been up on both the old and the new LJ site since November 7.
So here's a PR lesson for the AllTheWeb folks. If you're going to send out press releases to editors bragging about how fast you crawl news sites, at least crawl the ones you're pitching.
That said, I've been an AllTheWeb user since it started, and I still use their image searches as much as I use Google's. If you're in heavy search mode, it's better to choose between them with AND logic, not OR.
Doc Searls is Senior Editor of Linux Journal.