Work the Shell - Converting HTML Forms into Complex Shell Variables

Web browser? We don't need no stinkin' Web browser for submitting HTML forms, that's what the shell is for.
Building the Full URL

There's a hiccup waiting to bite us with the code in its current state, though. What if the user specifies two words in the keywords field or, worse, does so in the title field? (Remember, the last word or words on the command line are the title pattern, the core search for the Yahoo Movies system.)

The answer is that we need to convert spaces into a symbol that's legal in a URL, and in a query string a plus sign stands in for a space. That's easily done, fortunately:

params="$(echo "$params" | sed 's/ /+/g')"

It's not the most elegant solution, but it's certainly functional!
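To see the substitution on its own, here's a minimal sketch. The function name encode_spaces is made up for illustration and isn't part of the script, and it handles only spaces; other characters that are special in a URL (an ampersand or question mark, say) would need proper percent-encoding, which this doesn't attempt.

```shell
#!/bin/sh
# encode_spaces is a hypothetical helper, not part of findmovie.sh.
# It replaces every space in its first argument with '+',
# using the same sed expression as the one-liner above.
encode_spaces() {
  printf '%s\n' "$1" | sed 's/ /+/g'
}

encode_spaces "star wars"    # prints: star+wars
```

Quoting "$1" keeps the shell from expanding wildcard characters in the value before sed ever sees it.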

The bigger problem here is that Yahoo requires certain parameters actually be present to do a search. Choose a genre on the Web interface and click search, and you'll see that's not sufficient for it to proceed.

As a result, our base URL for searches is going to be a bit more complicated:


baseurl="http://movies.yahoo.com/mv/search"
baseurl="${baseurl}?yr=all&syn_match=all&"

Try that, and you'll find it doesn't work. Why? Because there are some hidden parameters that Yahoo has slipped into the form that are required to send to the search program. Without them, it just stops.

In fact, here's the baseurl value we need:


baseurl="http://movies.yahoo.com/mv/search"
baseurl="${baseurl}?yr=all&syn_match=all&adv=y&type=feature&"

Now, how do we put this all together? It's not so easy, because we still need to grab whatever's on the end of the invocation (the title pattern), then mask the spaces:

shift $(( $OPTIND - 1 ))

Hang on, let me explain this line before we go further. OPTIND holds the index of the next positional parameter to be processed, so once getopts has finished, it points at the first argument that wasn't absorbed as an option. Positional parameters are numbered from 1, and shift takes a count of parameters to throw away, not an index. The result? We have to subtract one from the value, so that exactly the options are discarded and the title pattern is left behind where we can get at it with the $* notation:


params="$(echo "$params" | sed 's/ /+/g')"

pattern="$(echo "$*" | sed 's/ /+/g')"
echo "URL: ${baseurl}${params}&p=${pattern}"
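If the OPTIND arithmetic still feels slippery, a stripped-down sketch may help. It fakes the command line with set -- and handles only the -r flag (which adds revs=1, as in the example output further on); the real script parses more options than this, so treat it as an illustration under those assumptions rather than the script itself.

```shell
#!/bin/sh
# Illustrative sketch only: simulate "findmovie.sh -r love story".
set -- -r love story
OPTIND=1          # make sure getopts starts at the first argument

baseurl="http://movies.yahoo.com/mv/search"
baseurl="${baseurl}?yr=all&syn_match=all&adv=y&type=feature&"

params=""
while getopts "r" opt; do
  case "$opt" in
    # ${params:+$params&} adds a separating '&' only if params is non-empty
    r) params="${params:+$params&}revs=1" ;;
  esac
done

echo "OPTIND is $OPTIND"       # 2: the first non-option argument is $2
shift $(( OPTIND - 1 ))        # discard the one option getopts consumed

pattern="$(echo "$*" | sed 's/ /+/g')"
echo "pattern is $pattern"     # love+story
echo "URL: ${baseurl}${params}&p=${pattern}"
```

With one option consumed, OPTIND is 2, the shift discards a single parameter, and $* is left holding just the title words.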

Now, finally, armed with that, we can search for films that contain the word “love” and have reviews:


$ findmovie.sh -r love

URL: ...BASEURL...revs=1&p=love

Type that in, and you'll find it works fine, showing 80 films where “love” appears in the title and Yahoo Movies is aware of on-line reviews of the films.

Most Linuxes and other flavors of UNIX have a way that you can launch a Web browser from the command line and have it open a specified URL. That's what we'll do. The open command used here is the Mac OS X one; on most Linux desktops, xdg-open plays the same role:


echo "$baseurl${params}&p=$pattern"
open -a safari "$baseurl${params}&p=$pattern"
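For a script that has to run on more than one system, one possible approach is to look for whichever launcher is installed. This is only a sketch: browser_cmd is a hypothetical helper, the URL is the one built up in this column, and the code just reports what it would run rather than actually starting a browser.

```shell
#!/bin/sh
# browser_cmd is a hypothetical helper: it prints the name of a
# URL-opening command found on this system, or nothing if none is found.
browser_cmd() {
  if command -v xdg-open >/dev/null 2>&1; then
    echo "xdg-open"        # common on Linux desktops
  elif command -v open >/dev/null 2>&1; then
    echo "open"            # Mac OS X
  fi
}

url="http://movies.yahoo.com/mv/search?yr=all&syn_match=all&adv=y&type=feature&revs=1&p=love"
opener="$(browser_cmd)"

if [ -n "$opener" ]; then
  echo "would run: $opener \"$url\""
else
  echo "no launcher found; the URL is: $url"
fi
```

Checking for xdg-open first is deliberate, since on some Linux systems a command named open exists but does something unrelated to browsers.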

There are other things we can do now that we've converted the Yahoo advanced search form into a shell script, but we'll leave those for next month!

Dave Taylor has been hacking shell scripts for a really long time, 30 years. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.


Comments


A couple of points

ciotog

Nice work, but I have a couple of quibbles:

1. vi vs Emacs detracts from the article - it's largely irrelevant

2. You use params="${params:+$params&}..." to keep off the leading '&' if params hasn't been defined yet, but then you stick params to a string (baseurl) that has a trailing '&', so it's unnecessary. Why not just use params=$params&... and leave the '&' off the end of baseurl? Or stick $params to the end, like so:

params="gen=$OPTARG&$params"
The url will have an extra '&' at the end but that's valid.
I recognize that the point is to teach these kinds of things, but to add them when not necessary is generally bad style.

3. Aside from that, in the paragraph where you explain the ${:+} notation you use a space character to illustrate it but it's probably not the best choice. Better to use a more visible character.
