Work the Shell - Simple Scripts to Sophisticated HTML Forms

Building on the Yahoo Movies search form script.

Last month, we looked at how to convert an HTML form on a page into a shell script with command flags and variables that let you have access to all the features of the search box. We tapped into Yahoo Movies and are building a script that offers up the key capabilities on the search form at movies.yahoo.com/mv/advsearch.

The script we built ended up with this usage statement:

USAGE: findmovie -g genre -k keywords -nrst title

So, that gives you an idea of what we're trying to do. Last month, we stopped with a script that offered the capabilities above and could open a Web browser with the result of the search using the open command.

Now, let's start with a caveat: open is a Mac OS X command that lets you launch a GUI app from the command line. Just about every Linux/UNIX flavor has a similar feature if you're running the X Window System. In fact, with most of them, it's even easier. A typical Linux version of “open a Web browser with this URL loaded” might be as simple as:


firefox http://www.linuxjournal.com/ &

That's easily done, even in a shell script.

Actually, if you're going to end a script by invoking a specific command, the best way to do it is to “exec” the command, which basically replaces the script with the app you've specified, so it's not still running and doesn't even need to exit. So in that case, it might look like exec firefox "$url" as the last line of the script.
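If you want the same script to run on both a Mac and a Linux box, one approach is to look for whichever launcher happens to be installed before calling exec. Here's a minimal sketch of that idea. The pick_opener function name and the candidate list (open, xdg-open, firefox) are my own assumptions, not part of the script from last month:

```shell
#!/bin/sh
# pick_opener: print the first candidate command that exists on the PATH.
# Returns non-zero (and prints nothing) if none of them is installed.
pick_opener() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# The last lines of a script might then look like this (left commented
# out so the sketch has no side effects when run on its own):
#
#   opener=$(pick_opener open xdg-open firefox) || {
#     echo "no browser launcher found" >&2
#     exit 1
#   }
#   exec "$opener" "$url"
```

Because exec replaces the shell with the launcher, nothing after that final line would ever run, so it has to be the last thing the script does.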

This month, I want to go back and make our script do more interesting things. For now, an invocation like:

./findmovie.sh -g act evil

produces a command from the last few lines in the script:


echo "$baseurl${params}&p=$pattern"
exec open -a safari "$baseurl${params}&p=$pattern"

that ends up pushing out this:

http://movies.yahoo.com/mv/
↪search?yr=all&syn_match=all&adv=y&type=feature&gen=act&p=evil

It's pretty sophisticated!
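One detail worth checking when you glue a URL together like this is how the ampersand is quoted. Inside double quotes, the shell leaves a backslash in front of & alone, so the backslash ends up in the URL. The sketch below shows both cases. The build_url function is hypothetical, and the baseurl value is my assumption, inferred from the URL printed above:

```shell
#!/bin/sh
# Assumed value, inferred from the URL the script prints.
baseurl="http://movies.yahoo.com/mv/search?"

# build_url: $1 = accumulated query parameters, $2 = title pattern.
build_url() {
  printf '%s\n' "${baseurl}${1}&p=${2}"
}

build_url "yr=all&syn_match=all&adv=y&type=feature&gen=act" "evil"

# Inside double quotes, a backslash before & is kept literally,
# so this prints gen=act\&p=evil (backslash included):
printf '%s\n' "gen=act\&p=evil"
```

The & needs protection only when it's unquoted, where the shell would read it as "run this in the background".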

Letting the User Dump the Resultant Data

What if the user wants the option of dumping the data to the command line instead of launching a browser? We can address that by adding a -d (dump) flag to the getopts block:


while getopts "dg:k:nrst" arg
do
  case "$arg" in
    d ) dump=1 ;;
    g ) params="${params:+$params&}gen=$OPTARG" ;;
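To see how the ${params:+$params&} idiom behaves, here's a cut-down sketch that handles only the -d and -g flags. Wrapping the loop in a parse_flags function, starting with an empty params, and taking the first leftover argument as the pattern are all assumptions made for the sake of a self-contained example:

```shell
#!/bin/sh
# parse_flags: a cut-down getopts loop covering only -d and -g.
# The other flags from the usage statement are deliberately omitted.
parse_flags() {
  dump=0
  params=""
  OPTIND=1          # reset so the function can be called more than once
  while getopts "dg:" arg; do
    case "$arg" in
      d ) dump=1 ;;
      # ${params:+$params&} expands to "$params&" only when params is
      # non-empty, so the first value gets no leading ampersand.
      g ) params="${params:+$params&}gen=$OPTARG" ;;
      * ) return 1 ;;
    esac
  done
  shift $((OPTIND - 1))   # drop the flags, leaving the title pattern
  pattern=$1
}

parse_flags -d -g act evil
echo "dump=$dump params=$params pattern=$pattern"
```

Run as is, that prints dump=1 params=gen=act pattern=evil. Note that dump is set to 0 up front, so a later numeric test on it won't trip over an empty value.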

To dump the data, we'll enlist the powerful curl command, as we've done in the past. The program has zillions of options, but as we're just interested in the raw output, we can ignore them all (fortunately) except for --silent, which hides status updates, leaving the conditional:


# ${dump:-0} falls back to 0 when -d wasn't given, so the test never sees an empty value
if [ "${dump:-0}" -eq 1 ] ; then
  exec /usr/bin/curl --silent "$baseurl${params}&p=$pattern"
else
  exec open -a safari "$baseurl${params}&p=$pattern"
fi

But, that generates a huge amount of data, including all the HTML needed to produce the page in question. Let's spend just a minute looking closely at that output and see if there's a way to trim things at least a bit.

It turns out that every movie title that's matched includes a link to the movie's information on the Yahoo Movies site. Those look like:


<a href="http://movies.yahoo.com/movie/1809697875/info">Resident Evil

So, that's easy to detect. Better, we can use a regular expression with grep and skip a lot of superfluous data too:

cmd | grep '/movie/.*info'

That comes close to having only the lines that match individual movies, but to take this one step further, let's remove the false matches for dvdinfo, because we're not interested in the links to DVD release info. That's a grep -v:

cmd | grep '/movie/.*info' | grep -v dvdinfo
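If you'd like to try the filter without hitting the network, you can feed it a few lines by hand. In this sketch, the filter_movies function name is mine, and the second and third sample lines are made up to stand in for the kind of links we want to drop:

```shell
#!/bin/sh
# filter_movies: keep lines linking to a movie's info page,
# then drop the ones that point to DVD release info.
filter_movies() {
  grep '/movie/.*info' | grep -v dvdinfo
}

# Sample input: only the first line is taken from the real output.
filter_movies <<'EOF'
<a href="http://movies.yahoo.com/movie/1809697875/info">Resident Evil
<a href="http://movies.yahoo.com/movie/1809697875/dvdinfo">DVD details
<a href="http://movies.yahoo.com/mv/advsearch">Advanced search
EOF
```

Only the Resident Evil line survives: the dvdinfo line matches the first pattern but is thrown out by grep -v, and the advsearch line never matches at all.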

Now, let's have a quick peek at comedies that have the word “funny” in their titles:

./findmovie.sh -d -g com funny | grep '/movie/.*info' 
 ↪| grep -v dvdinfo |  head -3

<td><a href="http://movies.yahoo.com/movie/1810041785/info">
<b>Funny</b> People (2009)</a><br>

<td><a href="http://movies.yahoo.com/movie/1809406735/info">What's So 
 <b>Funny</b> About Me? (1997)</a><br>

<td><a href="http://movies.yahoo.com/movie/1808565885/info">That 
 <b>Funny</b> Feeling (1965)</a><br>

Okay, so the first three films in that jumble of HTML are Funny People, What's So Funny About Me? and That Funny Feeling.
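Getting from that jumble to plain titles is mostly a matter of throwing away the tags. Here's one rough way to do it with sed, assuming each match arrives on a single line (the listing above is wrapped to fit the page). The strip_tags name is hypothetical:

```shell
#!/bin/sh
# strip_tags: delete anything that looks like <...>, then trim
# leading and trailing whitespace. Crude, but enough for this output.
strip_tags() {
  sed -e 's/<[^>]*>//g' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//'
}

strip_tags <<'EOF'
<td><a href="http://movies.yahoo.com/movie/1810041785/info"><b>Funny</b> People (2009)</a><br>
<td><a href="http://movies.yahoo.com/movie/1808565885/info">That <b>Funny</b> Feeling (1965)</a><br>
EOF
```

That prints Funny People (2009) and That Funny Feeling (1965), one per line. A regular expression is no substitute for a real HTML parser, but for a quick filter at the end of a pipeline, it's often good enough.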

From this point, you definitely can poke around and write some better filters to extract the specific information you want. The wrinkle? Like most other sites, Yahoo Movies chops the results into multiple pages, so what you'd really want to do is identify how many pages of results there are going to be and then grab the results from each, one by one. It's tedious, but doable.
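The page-by-page loop could be sketched along these lines. Be warned that the page query parameter here is purely hypothetical: I haven't shown how Yahoo Movies actually numbers its result pages, so you'd need to check the "next page" links in the real output and adjust. The page_urls function name is also my own:

```shell
#!/bin/sh
# page_urls: print one URL per results page.
# $1 = search URL, $2 = number of pages.
# NOTE: "&page=N" is a HYPOTHETICAL parameter used for illustration.
page_urls() {
  url=$1
  total=$2
  n=1
  while [ "$n" -le "$total" ]; do
    printf '%s&page=%s\n' "$url" "$n"
    n=$((n + 1))
  done
}

page_urls "http://movies.yahoo.com/mv/search?gen=com&p=funny" 3

# Each URL could then be fetched in turn, for example:
#
#   page_urls "$url" "$pages" | while read -r u; do
#     curl --silent "$u"
#   done | grep '/movie/.*info' | grep -v dvdinfo
```

Notice that the fetch loop can't use exec, because exec would replace the script after the very first page.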

______________________

Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
