Work the Shell - Converting HTML Forms into Complex Shell Variables
I know, there are a million shell scripts waiting to be written to help administer your computer, run your server and fine-tune your back end, but I'm obsessed with scripts that interact with on-line data, so that's what I'm focusing on. My last column marked the end of our Twitterbot, a simple script that listens and responds to Twitter queries. You can try it by sending an “@” message from your Twitter account to @davesbot.
This month, I thought that given the issue's Entertainment theme, it'd be fun to dig into another facet of shell scripts that interact with the Web by looking at how to emulate a complex form. The form we'll emulate? Yahoo Movies advanced search.
Start by checking out Figure 1 (it shows the form). You can see it live by going to movies.yahoo.com/mv/advsearch too.
We can crack open the HTML and read through the source, but I think it's more interesting to reverse-engineer it, because, like most search forms, this one uses the GET method and, therefore, exposes all of its parameters within the URL of the results page. For example, a search for the title “Strangelove”, without any other tweaks, produces the URL below. Normally, this URL would be all on one line, but I've separated the URL and the parameters onto multiple lines to make them a bit easier to see:
http://movies.yahoo.com/mv/search
?p=strangelove
&yr=all
&gen=all
&syn=
&syn_match=all
&type=feature
&adv=y
The search engine itself is at the URL shown in the first line of the listing above. The rest of the lines are parameters sent to the search engine. You can see that the search term is passed as “p” (“p=strangelove”). You can infer the other parameters by looking at the form: yr = release decade, gen = genre, syn = synopsis keywords and so on.
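Gluing those pieces back into the single-line URL is a one-liner in the shell. Here's a minimal sketch; the variable names (base, params, url) are mine for illustration and aren't taken from the finished script:

```shell
# Base URL and parameters exactly as they appear in the Strangelove search.
base="http://movies.yahoo.com/mv/search"
params="p=strangelove&yr=all&gen=all&syn=&syn_match=all&type=feature&adv=y"

# A "?" separates the search engine's address from its parameters.
url="$base?$params"
echo "$url"
```

Note the double quotes around the assignment: without them, the shell would treat each & as a request to run something in the background.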
Because there are so many possible values, however, we're going to have to look at the source after all. For example, those genres? Here's how Yahoo Movies breaks it down:
act = Action/Adventure
ada = Adaptation
ani = Animation
... (lots of entries skipped for space)
tee = Teen
thr = Thriller
war = War
wes = Western
It's quite a list, really!
The question is, can we turn a form of this nature into a simple interactive shell script that will let users specify constraints on a search and pop open a Web browser with the resultant search? Of course we can!
It would be cool to normalize the problem and come up with a general-purpose solution, some sort of parser that would take HTML form tags as input and produce shell script segments as output. Uh, no thanks.
Instead, with a few hacks in vi (yeah, I don't use Emacs), I have the following, as part of a usage() function:
usage()
{
cat << EOF
USAGE: findmovie -g genre -k keywords -nrst title
Where
-n only match those that have news or features
-r only match those with reviews
-s only match those that have showtimes
-t only match those that have trailers
and genre can be one of:
act (Action/Adventure), ada (Adaptation), ani (Animation),
...
tee (Teen), thr (Thriller), war (War) or wes (Western).
EOF
}
This makes life easy and pushes the trick of remembering the three-letter abbreviation for the genre onto the user. Sneaky, eh? Now, to be fair, good interface design would have me writing a more sophisticated script that lets users enter a variety of abbreviations (or the full word) and converts them into the proper Yahoo-approved abbreviation, but that's actually work, so we'll skip that too, okay?
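For the curious, that friendlier lookup might look something like the sketch below. It's only a sketch: the function name genre_code and the spellings it accepts are my own assumptions, and it covers just the handful of genres shown above, not the full Yahoo list.

```shell
# Hypothetical helper: map a user-typed genre to its three-letter code.
# Only the genres quoted in this column are handled.
genre_code()
{
  # Lowercase the input so "Western" and "western" match alike.
  g=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$g" in
    act|action|adventure|action/adventure) echo act ;;
    ada|adaptation) echo ada ;;
    ani|animation)  echo ani ;;
    tee|teen)       echo tee ;;
    thr|thriller)   echo thr ;;
    war)            echo war ;;
    wes|western)    echo wes ;;
    *) echo "unknown genre: $1" >&2 ; return 1 ;;
  esac
}

genre_code Western
```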
Now, note the actual usage I've created:
USAGE: findmovie -g genre -k keywords -nrst title
This means there are a couple of elements of the form that we are going to ignore in the script, including which decade the film was released and some of the more obscure conditional parameters. Still, it's enough to keep us busy.
I've talked about the splendid getopts within shell scripts before, without which parsing the six parameters—two of which have arguments, four of which don't—would be a huge hassle. Instead, this is straightforward. Here are the first few lines to give you the idea:
while getopts "g:k:nrst" arg
do
case "$arg" in
g) params="${params:+$params&}gen=$OPTARG" ;;
There's a lot to talk about here, but we have covered getopts before, and you can <cough> check the man page too, right? In a nutshell though, a letter with a trailing colon means that option takes a required argument, so g and k have arguments (g:k:), while n, r, s and t do not (nrst).
The params expansion is a nifty little shell trick that's worth a special mention too. The notation ${params:+$params } expands to the value of the $params variable, plus a trailing space, if the variable already has a non-empty value. Otherwise, it's the null string. The script uses an ampersand where this illustration uses a space, so each new parameter is joined to the existing ones with &. The point? To avoid leading ampersands in the URL that we're building.
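You can watch the expansion do its thing at the command line. This is a small illustration, using the same gen and syn parameter names that show up in the search URL:

```shell
params=""

# First use: params is empty, so the expansion yields nothing
# and no leading ampersand appears.
params="${params:+$params&}gen=war"
echo "$params"    # gen=war

# Second use: params now has a value, so it is kept and followed by "&".
params="${params:+$params&}syn=peace"
echo "$params"    # gen=war&syn=peace
```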
Let's have a quick peek:
$ findmovie.sh -g war -k peace -r
finished. params = gen=war&syn=peace&revs=1
As we'd hope, the params variable has been expanded to reflect the specific values that the user has specified on the command line—in this case, War films that have reviews and contain the word “peace” in the synopsis.
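If you want to experiment before seeing the rest of the script, here's a minimal sketch of a loop that would produce that params string. The function name build_params is hypothetical, and I've wired up only the options whose URL parameters appear in the output above (gen, syn and revs). The -n, -s and -t flags are accepted but do nothing here, because their parameter names aren't shown.

```shell
# Hypothetical wrapper around the getopts loop. The body runs in a
# subshell, so OPTIND and params don't leak into the calling shell.
build_params()
(
  OPTIND=1
  params=""
  while getopts "g:k:nrst" arg
  do
    case "$arg" in
      g) params="${params:+$params&}gen=$OPTARG" ;;
      k) params="${params:+$params&}syn=$OPTARG" ;;
      r) params="${params:+$params&}revs=1" ;;
      n|s|t) ;;      # parameter names not shown, so left as no-ops
      *) exit 2 ;;   # getopts has already printed an error message
    esac
  done
  echo "$params"
)

build_params -g war -k peace -r
```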
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
Comments
A couple of points
Nice work, but I have a couple of quibbles:
1. vi vs Emacs detracts from the article - it's largely irrelevant
2. You use params="${params:+$params&}..." to keep off the leading '&' if params hasn't been defined yet, but then you stick params to a string (baseurl) that has a trailing '&', so it's unnecessary. Why not just use params=$params&... and leave the '&' off the end of baseurl? Or stick $params to the end, like so:
params="gen=$OPTARG&$params"
The url will have an extra '&' at the end but that's valid.
I recognize that the point is to teach these kinds of things, but to add them when not necessary is generally bad style.
3. Aside from that, in the paragraph where you explain the ${:+} notation you use a space character to illustrate it but it's probably not the best choice. Better to use a more visible character.