You're trying to use a Perl script from
an old issue of Linux Journal, and it isn't
working out. Well, this is as good an opportunity as any to see how
things you don't touch still can break.

Here's the original script,
which comes from an article in issue 40 titled "A Web Crawler in
Perl". Don't forget that code from old issues of Linux
Journal also is available from the
site. The listings from issue 40 are
in their own
directory, and you can save time by getting the code from
there, instead of removing the PHP-generated trimmings from the
HTML version.

The script has two other problems. The first problem is that the
script exits silently if the first argument, $ARGV[0], is
anything other than a correctly formatted URL. For people used to
telling their browsers to go to linuxjournal.com instead of
http://linuxjournal.com/, this could be puzzling, especially since
the script is helpful about a different mistake: give it the wrong
number of arguments, and it prompts you with a usage message. That
inconsistency is the second problem.

Offering help if the user makes one mistake and silently
exiting if the user makes a different mistake are not good ideas in
production code. But this is an example from the back of a
magazine, and nobody actually runs back-of-the-magazine code
without putting in a bunch of sanity checks that would be boring in
a magazine but booty-saving in real use. Right?

Now for the interesting problem. Look at this line:

    print S "GET /$document HTTP/1.0\n\n";

We're sending a one-line HTTP 1.0 request. How well does this
work? It will do fine if there's only one web site on the server.
But today, many sites serve a lot of virtual hosts from one IP
address. As an example, let's try a one-line GET request like this
on the server dmarti.livejournal.com,
which is one virtual host on a site that hosts many:

    $ telnet dmarti.livejournal.com 80
    Trying 220.127.116.11...
    Connected to livejournal.com.
    Escape character is '^]'.
    GET /Connection closed by foreign host.
    dmarti@zingiber:~$ GET / HTTP/1.0
    <HTML>
    <HEAD>
    <TITLE>Directory /</TITLE>
    <BASE HREF="file:/">
    </HEAD>
    <BODY>
    <H1>Directory listing of /</H1>

And so on. Oops. Looks like we need a Host: header, which
came in officially with HTTP 1.1 but will work in HTTP 1.0
requests. Change that GET line above to:

    print S "GET /$document HTTP/1.0\n";
    print S "Host: $server_host\n\n";

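If you want to experiment with this outside the Perl script, here is a minimal Python sketch of both sides of the exchange. It is not part of the original article. build_request() produces the same kind of request the corrected lines send. As an assumption of this sketch, it also accepts a bare name like linuxjournal.com and raises an error on anything else, so bad input is never ignored silently. pick_site() is a hypothetical server-side lookup. It shows how one IP address can serve several virtual hosts by keying on the Host: value, and it falls back to a default site when the header is missing. The sketch uses CRLF line endings, which the HTTP specifications call for. The Perl lines send a bare \n, which the specifications recommend servers tolerate. All host names and paths below are made up.

```python
from typing import Mapping
from urllib.parse import urlsplit


def build_request(url: str) -> bytes:
    """Build an HTTP/1.0 GET request that includes a Host: header.

    Assumption of this sketch: a bare name such as "linuxjournal.com"
    is treated as an http:// URL, and anything that still isn't an
    http URL raises ValueError instead of exiting silently.
    """
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"not an http URL: {url!r}")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host = parts.hostname  # plain DNS names only; IPv6 literals not handled
    # Reading .port raises ValueError if the port is malformed.
    if parts.port not in (None, 80):
        host = f"{host}:{parts.port}"
    # CRLF line endings; the blank line ends the header block.
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def pick_site(request: bytes, sites: Mapping[str, str], default: str) -> str:
    """Hypothetical server-side lookup: map the Host: value to a site.

    splitlines() accepts both CRLF and bare-LF requests, such as the
    ones the Perl script sends.
    """
    lines = request.decode("ascii", errors="replace").splitlines()
    for line in lines[1:]:  # lines[0] is the request line
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":", 1)[0]  # drop any :port
            return sites.get(host, default)
    # No Host: header, so the server can only guess: use the default site.
    return default


if __name__ == "__main__":
    # Made-up virtual hosts sharing one server.
    sites = {"dmarti.example.org": "/srv/dmarti", "www.example.org": "/srv/www"}
    req = build_request("dmarti.example.org/index.html")
    print(req)  # b'GET /index.html HTTP/1.0\r\nHost: dmarti.example.org\r\n\r\n'
    print(pick_site(req, sites, "/srv/default"))  # /srv/dmarti
    print(pick_site(b"GET / HTTP/1.0\n\n", sites, "/srv/default"))  # /srv/default
```

The last call shows the failure mode from the transcript above. Without a Host: line, the server has no way to know which of its sites you meant.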
And voilà! It works. The Host:
header tells the server which of its virtual hosts should serve the
page.

The Perl script broke for some sites because people's
assumptions about the web changed, and the HTTP protocol was
updated to reflect that. Through no fault of your own, you had to
go back and change it. That's life on the Internet.

Don Marti is editor in chief
of Linux Journal.