Work the Shell - Converting HTML Forms into Complex Shell Variables
I know, there are a million shell scripts waiting to be written to help administer your computer, run your server and fine-tune your back end, but I'm obsessed with scripts that interact with on-line data, so that's what I'm focusing on. My last column marked the end of our Twitterbot, a simple script that listens and responds to Twitter queries. You can try it by sending an “@” message from your Twitter account to @davesbot.
This month, I thought that given the issue's Entertainment theme, it'd be fun to dig into another facet of shell scripts that interact with the Web by looking at how to emulate a complex form. The form we'll emulate? Yahoo Movies advanced search.
Start by checking out Figure 1 (it shows the form). You can see it live by going to movies.yahoo.com/mv/advsearch too.
We can crack open the HTML and read through the source, but I think it's more interesting to reverse-engineer it, because, like most search forms, this one uses the GET method and, therefore, exposes all of its parameters within the URL of the results page. For example, a search for the title “Strangelove”, without any other tweaks, produces the URL below. Normally, this URL would be all on one line, but I've separated the URL and the parameters onto multiple lines to make them a bit easier to see:
http://movies.yahoo.com/mv/search
?p=strangelove
&yr=all
&gen=all
&syn=
&syn_match=all
&type=feature
&adv=y
The search engine itself is at the URL shown in the first line of the listing above. The rest of the lines are parameters sent to the search engine. You can see that the search term is “p” (“p=strangelove”). You can infer the other parameters by looking at the form: yr = release decade, gen = genre, syn = synopsis keywords and so on.
Because there are so many possible values, however, we're going to have to look at the source after all. For example, those genres? Here's how Yahoo Movies breaks it down:
act = Action/Adventure
ada = Adaptation
ani = Animation
... (lots of entries skipped for space)
tee = Teen
thr = Thriller
war = War
wes = Western
It's quite a list, really!
The question is, can we turn a form of this nature into a simple interactive shell script that will let users specify constraints on a search and pop open a Web browser with the resultant search? Of course we can!
It would be cool to normalize the problem and come up with a general-purpose solution, some sort of parser that would take HTML form tags as input and produce shell script segments as output. Uh, no thanks.
Instead, with a few hacks in vi (yeah, I don't use Emacs), I have the following, as part of a usage() function:
usage()
{
cat << EOF
USAGE: findmovie -g genre -k keywords -nrst title
Where
-n only match those that have news or features
-r only match those with reviews
-s only match those that have showtimes
-t only match those that have trailers
and genre can be one of:
act (Action/Adventure), ada (Adaptation), ani (Animation),
...
tee (Teen), thr (Thriller), war (War) or wes (Western).
EOF
}
This makes life easy and pushes the trick of remembering the three-letter abbreviation for the genre onto the user. Sneaky, eh? Now, to be fair, good interface design would have me writing a more sophisticated script that lets users enter a variety of abbreviations (or the full word) and converts them into the proper Yahoo-approved abbreviation, but that's actually work, so we'll skip that too, okay?
Now, note the actual usage I've created:
USAGE: findmovie -g genre -k keywords -nrst title
This means there are a couple elements of the form that we are going to ignore in the script, including which decade the film was released and some of the more obscure conditional parameters. Still, it's enough to keep us busy.
I've talked about the splendid getopts within shell scripts before, without which parsing the six parameters—two of which have arguments, four of which don't—would be a huge hassle. Instead, this is straightforward. Here are the first few lines to give you the idea:
while getopts "g:k:nrst" arg
do
case "$arg" in
g) params="${params:+$params&}gen=$OPTARG" ;;
There's a lot to talk about here, but we have covered getopts before, and you can <cough> check the man page too, right? In a nutshell though, a letter with a trailing colon means it has a required parameter, so g and k have arguments (g:k:), while n, r, s and t do not (nrst).
The params expansion is a nifty little shell trick that's worth a special mention too. The notation ${params:+$params } expands to the value of the $params variable, plus a trailing space, if the variable already has a value. Otherwise, it's the null string. The point? To avoid leading ampersands in the URL that we're building.
Let's have a quick peek:
$ findmovie.sh -g war -k peace -r finished. params = gen=war&syn=peace&revs=1
As we'd hope, the params variable has been expanded to reflect the specific values that the user has specified on the command line—in this case, War films that have reviews and contain the word “peace” in the synopsis.
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- New Products
- Linux Systems Administrator
- Senior Perl Developer
- UX Designer
- Technical Support Rep
- Web & UI Developer (JavaScript & j Query)
- Designing Electronics with Linux
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- Reply to comment | Linux Journal
10 hours 4 min ago - Nice article, thanks for the
20 hours 45 min ago - I once had a better way I
1 day 2 hours ago - Not only you I too assumed
1 day 2 hours ago - another very interesting
1 day 4 hours ago - Reply to comment | Linux Journal
1 day 6 hours ago - Reply to comment | Linux Journal
1 day 13 hours ago - Reply to comment | Linux Journal
1 day 13 hours ago - Favorite (and easily brute-forced) pw's
1 day 15 hours ago - Have you tried Boxen? It's a
1 day 21 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?





Comments
A couple of points
Nice work, but I have a couple of quibbles:
1. vi vs Emacs detracts from the article - it's largely irrelevant
2. You use params="${params:+$params&}..." to keep off the leading '&' if params hasn't been defined yet, but then you stick params to a string (baseurl) that has a trailing '&', so it's unnecessary. Why not just use params=$params&... and leave the '&' off the end of baseurl? Or stick $params to the end, like so:
params="gen=$OPTARG&$params"The url will have an extra '&' at the end but that's valid.
I recognize that the point is to teach these kinds of things, but to add them when not necessary is generally bad style.
3. Aside from that, in the paragraph where you explain the ${:+} notation you use a space character to illustrate it but it's probably not the best choice. Better to use a more visible character.