Work the Shell - Simple Scripts to Sophisticated HTML Forms

Building on the Yahoo Movies search form script.

Last month, we looked at how to convert an HTML form on a page into a shell script with command flags and variables that give you access to all the features of the search box. We tapped into Yahoo Movies and started building a script that offers the key capabilities of the search form on the Yahoo Movies site.

The script we built ended up with this usage statement:

USAGE: findmovie -g genre -k keywords -nrst title

So, that gives you an idea of what we're trying to do. Last month, we stopped with a script that offered the capabilities above and could open a Web browser with the result of the search using the open command.

Now, let's start with a caveat: open is a Mac OS X command-line program that lets you launch a GUI app. Just about every other Linux/UNIX flavor has a similar capability, particularly if you're running the X Window System. In fact, with most of them, it's even easier. A typical Linux version of "open a Web browser with this URL loaded" might be as simple as:

firefox "$url" &

That's easily done, even in a shell script.

Actually, if you're going to end a script by invoking a specific command, the best way to do it is to exec the command, which replaces the script with the program you've specified, so the script isn't left running and doesn't even need to exit. In that case, the last line of the script might be exec firefox "$url".
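Here's a safe-to-run sketch of the exec idea, with echo standing in for the real browser so nothing actually launches (the URL is made up for illustration). On a Linux box, the last line of the real script would read something like exec firefox "$url":

```shell
url="http://example.com/search?p=evil"   # hypothetical final URL

# exec in a subshell replaces just that subshell with the command, so we
# can watch the hand-off happen without losing our own shell
result=$( exec echo "launching browser on: $url" )
echo "$result"
```

Because exec replaces the process outright, nothing after it in the script ever runs, and there's no lingering shell waiting for the browser to quit.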

This month, I want to go back and make our script do more interesting things. For now, an invocation like:

./findmovie -g act evil

produces a command from the last few lines in the script:

echo "$baseurl${params}&p=$pattern"
exec open -a safari "$baseurl${params}&p=$pattern"

that ends up pushing out the full search URL. It's pretty sophisticated!

Letting the User Dump the Resultant Data

What if the user wants the option of dumping the data to the command line instead of launching a browser? We can address that by adding a -d (dump) flag to the getopts block:

while getopts "dg:k:nrst" arg
do
  case "$arg" in
    d ) dump=1 ;;
    g ) params="${params:+$params&}gen=$OPTARG" ;;
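Filled out a bit, the loop looks like the sketch below. Only the -d and -g actions come from the snippet above; the -k mapping, the toggle handling and the parameter name "kw" are guesses based on the usage statement, so treat them as placeholders:

```shell
parse_args() {
  dump=0 ; params="" ; OPTIND=1
  while getopts "dg:k:nrst" arg
  do
    case "$arg" in
      d ) dump=1 ;;
      g ) params="${params:+$params&}gen=$OPTARG" ;;
      k ) params="${params:+$params&}kw=$OPTARG" ;;  # parameter name is a guess
      n|r|s|t ) ;;   # search-type toggles, handled elsewhere in the real script
      * ) echo "USAGE: findmovie -g genre -k keywords -nrst title" >&2
          return 1 ;;
    esac
  done
  shift $(( OPTIND - 1 ))
  pattern="$*"       # whatever remains is the title to search for
}

parse_args -d -g act evil
echo "dump=$dump params=$params pattern=$pattern"
```

The ${params:+$params&} idiom prepends an ampersand only when params already has something in it, so the query string never starts with a stray &.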

To dump the data, we'll enlist the powerful curl command, as we've done in the past. The program has zillions of options, but as we're just interested in the raw output, we can ignore them all (fortunately) except for --silent, which hides status updates, leaving the conditional:

if [ $dump -eq 1 ] ; then
  exec /usr/bin/curl --silent "$baseurl${params}&p=$pattern"
else
  exec open -a safari "$baseurl${params}&p=$pattern"
fi

But, that generates a huge amount of data, including all the HTML needed to produce the page in question. Let's spend just a minute looking closely at that output and see if there's a way to trim things at least a bit.

It turns out that every movie title that's matched includes a link to the movie's information on the Yahoo Movies site. Those look like:

<a href="">Resident Evil

So, that's easy to detect. Better, we can use a regular expression with grep and skip a lot of superfluous data too:

cmd | grep '/movie/.*info'

That comes close to having only the lines that match individual movies, but to take this one step further, let's remove the false matches for dvdinfo, because we're not interested in the links to DVD release info. That's a grep -v:

cmd | grep '/movie/.*info' | grep -v dvdinfo
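Here's the two-stage filter exercised on a few canned lines that mimic the page's structure (the hrefs are invented, following the /movie/<id>/info versus /movie/<id>/dvdinfo shape described above), so you can see what survives:

```shell
# Canned HTML standing in for the live curl output
sample='<td><a href="/movie/1808444810/info">Resident Evil</a></td>
<td><a href="/movie/1808444810/dvdinfo">Resident Evil on DVD</a></td>
<td class="spacer">no link here</td>'

# First grep keeps only movie-info links; second throws out the DVD ones
matches=$( printf '%s\n' "$sample" | grep '/movie/.*info' | grep -v dvdinfo )
echo "$matches"
```

Only the first line makes it through: the dvdinfo link matches the first pattern too (it contains /movie/ and info), which is exactly why the second grep -v pass is needed.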

Now, let's have a quick peek at comedies that have the word “funny” in their titles:

./findmovie -d -g com funny | grep '/movie/.*info' \
  | grep -v dvdinfo | head -3

<td><a href=""><b>Funny</b> People (2009)</a><br>

<td><a href="">What's So <b>Funny</b> About Me? (1997)</a><br>

<td><a href="">That <b>Funny</b> Feeling (1965)</a><br>

Okay, so the first three films in that jumble of HTML are Funny People, What's So Funny About Me? and That Funny Feeling.

From this point, you definitely can poke around and write some better filters to extract the specific information you want. The wrinkle? Like most other sites, Yahoo Movies chops the results into multiple pages, so what you'd really want to do is identify how many pages of results there are going to be and then grab the results from each, one by one. It's tedious, but doable.
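That page-by-page walk might be structured like the sketch below. The fetch is stubbed out so the loop can be exercised without network access; the real version would be a curl call, and both the page-count parsing and the page URL parameter are assumptions you'd have to confirm against the actual HTML:

```shell
# Stand-in for the real fetch, which would be something like:
#   /usr/bin/curl --silent "$baseurl${params}&p=$pattern&page=$1"
# (the "page" parameter name is an assumption)
fetch_page() {
  echo "results for page $1"
}

total_pages=3    # real script: parse this out of the first page's HTML

results=$(
  page=1
  while [ "$page" -le "$total_pages" ] ; do
    fetch_page "$page"
    page=$(( page + 1 ))
  done
)
echo "$results"
```

From there, the same grep pipeline from earlier would filter each page's output down to the movie links.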


Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at
