A Web-Based Clipping Service

Let LWP turn your web client into a midnight marauder.
Using download-matching.pl

Now that we have a program to retrieve documents that fit our criteria, how can we use it? We could run it from the command line, but the point of this program is to do your work for you, downloading documents while you sleep or watch television.

The easiest way is to use cron, the Linux facility that allows us to run programs at regular intervals. Each user has his or her own crontab, a table that indicates when a program should be run. Each command is preceded by five columns that indicate the time and date on which a program should be run: the minute, hour, day of month, month and day of the week. These columns are normally filled with numbers, but an asterisk can be used to indicate a wild card.

The following entry in a crontab indicates the program /bin/foo should be run every Sunday at 4:05 A.M.:

5 4 * * 0 /bin/foo

Be sure to use a complete path name when using cron—here we indicated /bin/foo, rather than just “foo”.

The crontab is edited with the crontab program, using its -e option. This will open the editor defined in the EDITOR environment variable, which is vi by default. (Emacs users should consider setting this to emacsclient, which loads the file in an already running Emacs process.)

To download all of the files matching our phrases from Wired News every day at midnight, we could use the following:

0 0 * * 0 /usr/bin/download-matching.pl\
www.wired.com/news/http://www.wired.com/news/

This will start the process of downloading files from http://www.wirec.com/news/ at midnight, placing the results in $output_directory. We can specify multiple URLs as well, allowing us to retrieve news from more than one of our favorite news sources. When we wake up in the morning, new documents that interest us will be waiting for us to read, sitting in $output_directory.

Conclusion

Many organizations hire clipping services to find news that is of interest to them. With a bit of cleverness and heavy reliance on LWP, we can create our own personalized clipping service, downloading documents that reflect our personal interests. No longer must you look through a list of headlines in order to find relevant documents—let Perl and the Web do your work for you.

Resources

Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel. His book Core Perl will soon be published by Prentice-Hall. Reuven can be reached at reuven@lerner.co.il. The ATF home page is at http://www.lerner.co.il/atf/.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState