Work the Shell - Simple Scripts to Sophisticated HTML Forms
Last month, we looked at how to convert an HTML form on a page into a shell script with command flags and variables that give you access to all the features of the search box. We tapped into Yahoo Movies and started building a script that offers up the key capabilities of the search form at movies.yahoo.com/mv/advsearch.
The script we built ended up with this usage statement:
USAGE: findmovie -g genre -k keywords -nrst title
So, that gives you an idea of what we're trying to do. Last month, we stopped with a script that offered the capabilities above and could open a Web browser with the result of the search using the open command.
Now, let's start with a caveat: open is a Mac OS X command-line program that lets you launch a GUI app. Just about every Linux/UNIX flavor has a similar feature, including those running the X Window System. In fact, with most of them, it's even easier. A typical Linux version of “open a Web browser with this URL loaded” might be as simple as:
firefox http://www.linuxjournal.com/ &
That's easily done, even in a shell script.
Actually, if you're going to end a script by invoking a specific command, the best way to do it is to “exec” the command, which basically replaces the script with the app you've specified, so it's not still running and doesn't even need to exit. So in that case, it might look like exec firefox "$url" as the last line of the script.
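If you want the same script to run on both Mac OS X and Linux, here's a minimal sketch of one way to choose the launcher. The helper name pick_opener is hypothetical (it isn't part of last month's script), and it assumes a Linux desktop with xdg-open installed, falling back to firefox otherwise:

```shell
# pick_opener: print the name of a command that can open a URL in a browser.
# Assumption: "open" is trusted only on Mac OS X (Darwin), because some
# Linux systems ship an unrelated command with the same name.
pick_opener() {
  if [ "$(uname -s)" = "Darwin" ] && command -v open >/dev/null 2>&1; then
    echo open
  elif command -v xdg-open >/dev/null 2>&1; then
    echo xdg-open
  else
    echo firefox
  fi
}
```

The last line of the script could then be exec "$(pick_opener)" "$url", which still replaces the script with the launcher.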
This month, I want to go back and make our script do more interesting things. For now, an invocation like:
./findmovie.sh -g act evil
runs these last few lines of the script:
echo "$baseurl${params}&p=$pattern"
exec open -a safari "$baseurl${params}&p=$pattern"
that ends up pushing out this:
http://movies.yahoo.com/mv/search?yr=all&syn_match=all&adv=y&type=feature&gen=act&p=evil
It's pretty sophisticated!
What if the user wants the option of dumping the data to the command line instead of launching a browser? We can address that by adding a -d (dump) command flag to the getopts block:
while getopts "dg:k:nrst" arg
do
  case "$arg" in
    d ) dump=1 ;;
    g ) params="${params:+$params&}gen=$OPTARG" ;;
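To see how that excerpt behaves in isolation, here's a trimmed, self-contained sketch that handles only -d and -g. Wrapping the loop in a function called parse_flags is my assumption for illustration, as is starting params out empty. The full script has more flags and may initialize things differently:

```shell
# parse_flags: a cut-down version of the getopts loop (only -d and -g).
# Sets three variables: dump (0 or 1), params (query string fragment)
# and pattern (whatever is left after the flags).
parse_flags() {
  dump=0
  params=""
  OPTIND=1                      # reset so the function can be called again
  while getopts "dg:" arg
  do
    case "$arg" in
      d ) dump=1 ;;
      # ${params:+$params&} expands to "$params&" only when params is
      # non-empty, so the first value gets no leading ampersand.
      g ) params="${params:+$params&}gen=$OPTARG" ;;
      * ) return 1 ;;
    esac
  done
  shift $((OPTIND - 1))
  pattern="$*"
}
```

Calling parse_flags -d -g act evil leaves dump set to 1, params set to gen=act and pattern set to evil.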
To dump the data, we'll enlist the powerful curl command, as we've done in the past. The program has zillions of options, but as we're just interested in the raw output, we can ignore them all (fortunately) except for --silent, which hides status updates, leaving the conditional:
if [ "${dump:-0}" -eq 1 ] ; then
  exec /usr/bin/curl --silent "$baseurl${params}&p=$pattern"
else
  exec open -a safari "$baseurl${params}&p=$pattern"
fi
But, that generates a huge amount of data, including all the HTML needed to produce the page in question. Let's spend just a minute looking closely at that output and see if there's a way to trim things at least a bit.
It turns out that every movie title that's matched includes a link to the movie's information on the Yahoo Movies site. Those look like:
<a href="http://movies.yahoo.com/movie/1809697875/info">Resident Evil
So, that's easy to detect. Better, we can use a regular expression with grep and skip a lot of superfluous data too:
cmd | grep '/movie/.*info'
That comes close to having only the lines that match individual movies, but to take this one step further, let's remove the false matches for dvdinfo, because we're not interested in the links to DVD release info. That's a grep -v:
cmd | grep '/movie/.*info' | grep -v dvdinfo
Now, let's have a quick peek at comedies that have the word “funny” in their titles:
./findmovie.sh -d -g com funny | grep '/movie/.*info' | grep -v dvdinfo | head -3
<td><a href="http://movies.yahoo.com/movie/1810041785/info"> <b>Funny</b> People (2009)</a><br>
<td><a href="http://movies.yahoo.com/movie/1809406735/info">What's So <b>Funny</b> About Me? (1997)</a><br>
<td><a href="http://movies.yahoo.com/movie/1808565885/info">That <b>Funny</b> Feeling (1965)</a><br>
Okay, so the first three films in that jumble of HTML are Funny People, What's So Funny About Me? and That Funny Feeling.
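Pulling the titles out of that jumble can be scripted too. Here's a rough sketch of a filter, with the hypothetical name extract_titles, that combines the two grep steps with a sed that strips the HTML tags. It assumes every match sits on its own line, as in the output above, and it would need adjusting if the page layout differs:

```shell
# extract_titles: read result-page HTML on stdin, print one title per line.
extract_titles() {
  grep '/movie/.*info' |
    grep -v dvdinfo |
    sed -e 's/<[^>]*>//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//'
}
```

The first sed expression deletes anything that looks like a tag, and the other two trim leading and trailing whitespace. Because a script that ends with exec can't filter its own output, one way to use this is to pipe the dump through it from inside the script, something like /usr/bin/curl --silent "$url" | extract_titles.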
From this point, you definitely can poke around and write some better filters to extract the specific information you want. The wrinkle? Like most other sites, Yahoo Movies chops the results into multiple pages, so what you'd really want to do is identify how many pages of results there are going to be and then grab the results from each, one by one. It's tedious, but doable.
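As a starting point for the multi-page problem, here's a sketch that only generates the list of page URLs. Big caveat: I haven't shown which query parameter Yahoo Movies uses to select a results page, so the parameter name is passed in as an argument, and the page used in the usage line below is a placeholder you'd replace after inspecting the "next" link on a real results page. The function name page_urls is hypothetical as well:

```shell
# page_urls URL PARAM COUNT: print one URL per results page, 1 through COUNT.
# PARAM is the name of the site's page-selection parameter (an assumption;
# check the real links on the results page to find it).
page_urls() {
  url=$1
  param=$2
  count=$3
  page=1
  while [ "$page" -le "$count" ]
  do
    printf '%s&%s=%s\n' "$url" "$param" "$page"
    page=$((page + 1))
  done
}
```

With the real parameter name in hand, something like page_urls "$url" page 3 | while read -r u ; do curl --silent "$u" ; done would fetch the first three pages, ready to be piped into the grep filters.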
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.