Work the Shell - Converting HTML Forms into Complex Shell Variables

Web browser? We don't need no stinkin' Web browser for submitting HTML forms, that's what the shell is for.
Building the Full URL

There's a hiccup waiting to bite us with the code in its current state though. The problem is, what if the user specifies two words in the keywords value field or, worse, does so in the title field (remember, the last word or words are the title pattern, the core search for the Yahoo Movies system)?

The answer is that we need to convert spaces into a character that's acceptable in a URL. In a query string, a plus sign stands in for a space, so that's easily done:

params="$(echo "$params" | sed 's/ /+/g')"

It's not the most elegant solution, but it's certainly functional!
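For readers who want to reuse the substitution, here's a minimal sketch that wraps it in a function. The name plus_encode is made up for illustration and isn't part of the script in this column. Like the sed command above, it handles only spaces and doesn't percent-encode other reserved characters such as & or ?.

```shell
# plus_encode: hypothetical helper name, not from the original script.
# Joins its arguments with single spaces, then turns every space into '+'.
# Only spaces are converted; other reserved URL characters pass through.
plus_encode() {
  printf '%s\n' "$*" | sed 's/ /+/g'
}

plus_encode "star wars"
```

Called as shown, it prints star+wars. Using printf with a quoted "$*" keeps the shell from collapsing runs of spaces or expanding wildcards before sed sees the text.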

The bigger problem here is that Yahoo requires certain parameters actually be present to do a search. Choose a genre on the Web interface and click search, and you'll see that's not sufficient for it to proceed.

As a result, our base URL for searches is going to be a bit more complicated:


baseurl="http://movies.yahoo.com/mv/search"
baseurl="${baseurl}?yr=all&syn_match=all&"

Try that, and you'll find it doesn't work. Why? Because there are some hidden parameters that Yahoo has slipped into the form that are required to send to the search program. Without them, it just stops.

In fact, here's the baseurl value we need:


baseurl="http://movies.yahoo.com/mv/search"
baseurl="${baseurl}?yr=all&syn_match=all&adv=y&type=feature&"

Now, how do we put this all together? It's not so easy, because we still need to grab whatever's on the end of the invocation (the title pattern), then mask the spaces:

shift $(( $OPTIND - 1 ))

Hang on, let me explain this line before we go further. OPTIND holds the index of the next positional parameter getopts would examine, which is the first one it didn't absorb as an option or option argument. Because positional parameters are numbered from 1, OPTIND is always one more than the number of arguments getopts consumed. Shifting by OPTIND minus one therefore throws away exactly the parsed options and leaves whatever remains, the title pattern, available through the $* notation:


params="$(echo "$params" | sed 's/ /+/g')"

pattern="$(echo "$*" | sed 's/ /+/g')"
echo "URL: ${baseurl}${params}&p=${pattern}"
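To see the pieces working together, here's a rough sketch that folds the option parsing, the shift and the space masking into one function. The function name build_url is hypothetical, and the -r and -g flags with their revs and gen parameter names are assumptions pieced together from the sample output and the gen=$OPTARG fragment quoted in the reader comments; the real script may define them differently.

```shell
# build_url: hypothetical wrapper, not the author's actual script.
# Assumed flags: -r (only films with reviews), -g GENRE (genre filter).
build_url() {
  baseurl="http://movies.yahoo.com/mv/search"
  baseurl="${baseurl}?yr=all&syn_match=all&adv=y&type=feature&"
  params=""
  OPTIND=1    # reset so the function can be called more than once
  while getopts "rg:" opt; do
    case "$opt" in
      r) params="${params:+$params&}revs=1" ;;
      g) params="${params:+$params&}gen=$OPTARG" ;;
      *) return 1 ;;
    esac
  done
  # Drop the options getopts consumed; what remains is the title pattern.
  shift $(( OPTIND - 1 ))
  params="$(printf '%s\n' "$params" | sed 's/ /+/g')"
  pattern="$(printf '%s\n' "$*" | sed 's/ /+/g')"
  printf '%s\n' "${baseurl}${params}&p=${pattern}"
}

build_url -r love actually
```

With -r and a two-word title, the URL ends in revs=1&p=love+actually. If no options are given, params is empty and the URL contains a doubled &, which should be harmless in a query string.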

Now, finally, armed with that, we can search for films that contain the word “love” and have reviews:


$ findmovie.sh -r love

URL: ...BASEURL...revs=1&p=love

Type that in, and you'll find it works fine, showing 80 films where “love” appears in the title and Yahoo Movies is aware of on-line reviews of the films.

Most Linux distributions and other flavors of UNIX let you launch a Web browser from the command line and hand it a URL to load. That's what we'll do, here using the open command from Mac OS X (on most Linux desktops, xdg-open plays the same role):


echo "$baseurl${params}&p=$pattern"
open -a safari "$baseurl${params}&p=$pattern"
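If the script needs to run on more than one system, one option is to choose the launcher at runtime. The sketch below is only an illustration: pick_opener and launch_url are made-up names, and it assumes that open is available on Mac OS X and that xdg-open (from the xdg-utils package) is installed on Linux.

```shell
# pick_opener: hypothetical helper that maps an OS name, as printed by
# `uname -s`, to a command that opens a URL in the default browser.
pick_opener() {
  case "$1" in
    Darwin) echo "open" ;;       # Mac OS X
    Linux)  echo "xdg-open" ;;   # assumes xdg-utils is installed
    *)      return 1 ;;          # unknown system
  esac
}

# launch_url: open the URL, or just print it when no launcher is known.
launch_url() {
  opener="$(pick_opener "$(uname -s)")" || { printf '%s\n' "$1"; return 0; }
  "$opener" "$1"
}
```

In the script, the last line would then become launch_url "$baseurl${params}&p=$pattern". Unlike open -a safari, this hands the URL to whatever browser the user has set as the default.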

There are other things we can do now that we've converted the Yahoo advanced search form into a shell script, but we'll leave those for next month!

Dave Taylor has been hacking shell scripts for a really long time, 30 years. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.


Comments


A couple of points

ciotog

Nice work, but I have a couple of quibbles:

1. vi vs Emacs detracts from the article - it's largely irrelevant

2. You use params="${params:+$params&}..." to keep off the leading '&' if params hasn't been defined yet, but then you stick params to a string (baseurl) that has a trailing '&', so it's unnecessary. Why not just use params=$params&... and leave the '&' off the end of baseurl? Or stick $params to the end, like so:

params="gen=$OPTARG&$params"
The url will have an extra '&' at the end but that's valid.
I recognize that the point is to teach these kinds of things, but to add them when not necessary is generally bad style.

3. Aside from that, in the paragraph where you explain the ${:+} notation you use a space character to illustrate it but it's probably not the best choice. Better to use a more visible character.
