You're trying to use a Perl script from
an old issue of Linux Journal, and it isn't
working out. Well, this is as good an opportunity as any to see how
things you don't touch still can break.

Here's the original script,
which comes from an article in issue 40 titled "A Web Crawler in
Perl". Don't forget that code from old issues of Linux
Journal also is available from the
site. The listings from issue 40 are
in their own
directory, and you can save time by getting the code from
there, instead of removing the PHP-generated trimmings from the
HTML version.

The script has two other problems. The first problem is that the
script exits silently if the first argument, $ARGV[0], is
anything other than a correctly formatted URL. For people used to
telling their browsers to go to linuxjournal.com instead of
http://linuxjournal.com/, this could be puzzling, especially because
the script does speak up if you give the wrong number of arguments:
it helpfully prompts you with a usage message. That inconsistency is
the second problem.

Offering help if the user makes one mistake and silently
exiting if the user makes a different mistake are not good ideas in
production code. But this is an example from the back of a
magazine, and nobody actually runs back-of-the-magazine code
without putting in a bunch of sanity checks that would be boring in
a magazine but booty-saving in real use. Right?

Now for the interesting problem. Look at this line:

print S "GET /$document HTTP/1.0\n\n";

We're sending a one-line HTTP 1.0 request. How well does this
work? It will do fine if there's only one web site on the server.
But today, many sites merge a lot of virtual hosts into one IP
address. As an example, let's try a one-line GET request like this
on the server dmarti.livejournal.com,
which is one virtual host on a site that hosts many:

$ telnet dmarti.livejournal.com 80
Trying 188.8.131.52...
Connected to livejournal.com.
Escape character is '^]'.
GET /
Connection closed by foreign host.
dmarti@zingiber:~$ GET / HTTP/1.0
<HTML> <HEAD> <TITLE>Directory /</TITLE> <BASE HREF="file:/"> </HEAD> <BODY> <H1>Directory listing of /</H1>
And so on. Oops. Looks like we need a Host: header, which
came in officially with HTTP 1.1 but will work in HTTP 1.0
requests. Change that GET line above to:

print S "GET /$document HTTP/1.0\n";
print S "Host: $server_host\n\n";

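
For readers who want to try the change in isolation, here is a minimal sketch rather than the article's actual script. It checks the URL argument out loud instead of exiting silently, and builds the request with the Host: header. The variable names $server_host and $document follow the lines above; the helpers parse_http_url and build_request are hypothetical names invented for this illustration, and the URL pattern is an assumption that covers only plain http:// URLs.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): split an http:// URL
# into host, port and document. Returns an empty list if the URL is not
# in the expected form, so the caller can complain instead of exiting
# silently.
sub parse_http_url {
    my ($url) = @_;
    return unless defined($url)
        && $url =~ m{\Ahttp://([^/:\s]+)(?::(\d+))?(?:/(\S*))?\z}i;
    my ($host, $port, $document) = ($1, $2, $3);
    $port     = 80 unless defined $port;       # default HTTP port
    $document = '' unless defined $document;   # "http://host" means "/"
    return ($host, $port, $document);
}

# Hypothetical helper: build the request text, including the Host: header
# that tells the server which virtual host is wanted. The line endings
# match the print statements in the article.
sub build_request {
    my ($server_host, $document) = @_;
    return "GET /$document HTTP/1.0\n" . "Host: $server_host\n\n";
}

# Small demonstration: one well-formed URL and one that a browser user
# might type. No network connection is made here.
for my $url ('http://linuxjournal.com/', 'linuxjournal.com') {
    my ($server_host, $port, $document) = parse_http_url($url);
    if (!defined $server_host) {
        warn "Bad URL '$url': expected something like http://linuxjournal.com/\n";
        next;
    }
    print build_request($server_host, $document);
}
```

In the crawler itself, the string returned by build_request would be printed to the socket filehandle S in place of the original one-line request. The HTTP specifications call for \r\n line endings; most servers also accept the bare \n used here, but that tolerance is not guaranteed.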
And voilà! It works. The Host:
header tells the server from which virtual host to get the
page.

The Perl script broke for some sites because people's
assumptions about the web changed, and the HTTP protocol was
updated to reflect that. Through no fault of your own, you had to
go back and change it. That's life on the Internet.

Don Marti is editor in chief
of Linux Journal.