Work the Shell - Spreading Out Numbers
For the past few months, we've been writing a movie trivia game that is intended to run as a Twitter client, sporadically posting questions to its Twitter feed of the form “The film Sunset Blvd. was released in 1943, 1946, or 1950?”
What initially seemed like the most difficult task, finding the list of films and then extracting release dates, turned out to be manageable: we used the terrific Internet Movie Database site (imdb.com) and pushed the data through some filters and transformations.
The end result is that with a simple invocation of a script, we can generate a data file called top-250-films-with-release-dates.db that looks like this: “Sunset Blvd. | 1950” (and now you know the answer to the question in paragraph one).
Last column left off with the puzzle of generating good “adjacent” release years. That is, if we're talking about a movie like Prince Caspian, released in 2008, we want the adjacent values to be quite close—maybe 2005 and 2007. If we're talking about Rear Window, released back in 1954, we want the adjacent values to be spread out more, because offering up 1951, 1954 and 1955 is going to be more annoying and nit-picking than 1940, 1950 and 1954 or similar. See what I mean?
What we could do is simply subtract the release year from the current year, then apply some sort of multiple to tweak the delta. Then, Prince Caspian would have an “adjacency” of zero, and Rear Window would have one of 54. Let's consider dividing the value by five and using the ceiling value to see what the calculation for a handful of movies produces (Table 1).
Table 1. Calculating Adjacency for the Movie Trivia Game
| Title | Release Date | Adjacency | Factor |
|---|---|---|---|
| Der Untergang | 2004 | 4 | 1 |
| Metropolis | 1927 | 81 | 17 |
| Sin City | 2005 | 3 | 1 |
| Chinatown | 1974 | 34 | 7 |
| Some Like It Hot | 1959 | 49 | 10 |
That's not bad. Sin City could have incorrect year values within one year of the actual release, while Metropolis could be off by as much as 17 without most people realizing. I mean, if I asked you right now, “Did Fritz Lang's masterwork Metropolis come out in 1927, 1931 or 1947?”, would you know the answer?
This leads to an important realization: we can't have the values perfectly spaced out, so the Factor above is the upper range of a 1..Factor choice. So, the amusing Some Like It Hot can have incorrect guesses that are anywhere from one year to ten years off.
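As a quick check on Table 1, the ceiling of adjacency divided by five can be computed with integer arithmetic alone. This is a minimal sketch, not the column's script: the function name `ceil_factor` is made up for illustration, and `thisyear` is pinned to 2008, the year the table's adjacency values imply.

```shell
#!/bin/bash
# Sketch only: reproduce the Factor column of Table 1.
# Assumption: thisyear is pinned to 2008 to match the table's adjacencies.
thisyear=2008

# ceil_factor YEAR -> ceiling of (thisyear - YEAR) / 5
# Adding 4 before the integer division rounds any remainder up.
ceil_factor() {
  local adjacency=$(( thisyear - $1 ))
  echo $(( (adjacency + 4) / 5 ))
}

for year in 2004 1927 2005 1974 1959 ; do
  echo "$year -> factor $(ceil_factor "$year")"
done
```

Note that a film released in the current year gets a factor of zero under a pure ceiling, so a real script would need a floor of at least one.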
Okay, enough discussion. How do we implement this in code?
Well, we have the release date of the movie in releasedate, and we have the current year in thisyear, so here's a simple test script:
    thisyear="$(date +%Y)"
    releasedate="$1"
    adjacency="$(( $thisyear - $releasedate ))"
    if [ $adjacency -lt 5 ] ; then
      factor="1"
    else
      factor="$(( $adjacency / 5 + 1 ))"
    fi
    echo "For release $releasedate we have factor = $factor"
This demonstrates an important facet of shell scripting: sometimes thinking through the solution is more time consuming than actually coding your resultant algorithm. I could share an anecdote about my boss telling me to “stop thinking and start coding” in one of my earlier jobs, but I'll skip it. Just keep in mind that thinking through solution paths is a critical step in any job.
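One detail in the test script is worth a second look: `adjacency / 5 + 1` matches the ceiling used for Table 1 except when the adjacency is an exact multiple of five, where it comes out one higher. The comparison below is a sketch with hypothetical function names (`script_factor`, `ceiling_factor`); both take the adjacency directly rather than a release year.

```shell
#!/bin/bash
# Sketch only: compare the test script's formula with a true ceiling.

# Formula from the test script: 1 below five years, else adjacency/5 + 1.
script_factor() {
  if [ "$1" -lt 5 ] ; then
    echo 1
  else
    echo $(( $1 / 5 + 1 ))
  fi
}

# True ceiling of adjacency/5 using integer arithmetic.
ceiling_factor() {
  echo $(( ($1 + 4) / 5 ))
}

for adjacency in 4 34 49 50 81 ; do
  echo "adjacency $adjacency: script=$(script_factor "$adjacency") ceiling=$(ceiling_factor "$adjacency")"
done
```

For a trivia game the difference is harmless; it only widens the range by one year for those films.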
Now that we have a way to calculate our adjacency factor for a given movie release year, let's take the next step and actually calculate possible values:
    delta="$(( $RANDOM % $factor + 1))"
    add="$(( $RANDOM % 2 ))"
    if [ $add -eq 1 ] ; then
      closeyear="$(( $releasedate + $delta ))"
    else
      closeyear="$(( $releasedate - $delta ))"
    fi
That isn't too bad as a first step.
There are two problems I see with this algorithm as is, however. First, we can end up with release years in the future (for example, Iron Man could end up with a release year of 2009, which is wrong). Second, for movies released in the last five years, we also could end up with the actual release year always sandwiched in the middle once we de-dupe the results. (With a factor of one, delta is always one, so the only possible wrong answers are the year before and the year after.)
To fix the first problem, we need to add a test to ensure that the closeyear is never greater than thisyear, which is straightforward. For the second problem, I think that having a minimum factor of two, rather than one, gives us a bit more wiggle room, though any movie released in the current year is basically a gimme anyway for people who are paying even minimal attention.
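To see the second problem concretely, it helps to list every value the algorithm can produce for a given factor. The sketch below does that deterministically instead of using `$RANDOM`; `candidates` is a hypothetical helper name, not part of the column's script.

```shell
#!/bin/bash
# Sketch only: list every closeyear the algorithm could generate, that is,
# releasedate - delta and releasedate + delta for each delta in 1..factor.
candidates() {
  local releasedate=$1 factor=$2
  local delta=$factor out=""
  # Years below the release date, farthest first.
  while [ "$delta" -ge 1 ] ; do
    out="$out $(( releasedate - delta ))"
    delta=$(( delta - 1 ))
  done
  # Years above the release date, nearest first.
  delta=1
  while [ "$delta" -le "$factor" ] ; do
    out="$out $(( releasedate + delta ))"
    delta=$(( delta + 1 ))
  done
  # Unquoted on purpose: drops the leading space.
  echo $out
}

candidates 2005 1   # factor 1: only the year before and the year after
candidates 1974 7   # factor 7: fourteen possible wrong years
```

With a factor of 1, the two wrong answers always bracket the real one, which is the sandwich effect described above.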
Here's how I implemented these tweaks:
    if [ $adjacency -lt 5 ] ; then
      factor="2"
    else
      factor="$(( $adjacency / 5 + 1 ))"
    fi
And, a bit later in the code:
    if [ $closeyear -gt $thisyear ] ; then
      closeyear="$(( $releasedate - $delta ))"
    fi
That seems to work pretty well. Now when we give the script a few different release years, here's what we see:
| Release Year | First Five Generated Results |
|---|---|
| 1962 | 1970, 1967, 1958, 1960, 1971 |
| 1994 | 1996, 1996, 1995, 1993, 1993 |
| 2002 | 2004, 2001, 2000, 2001, 2003 |
| 1927 | 1915, 1925, 1937, 1936, 1911 |
| 2008 | 2006, 2007, 2007, 2006, 2007 |
I think we can live with this—not bad at all, actually.
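For readers who want to generate results like these themselves, the fragments can be collected into a single function. This is a sketch under stated assumptions rather than the finished game: the name `closeyear_for` and its optional second argument for the current year are inventions for illustration, and it needs bash for `$RANDOM`.

```shell
#!/bin/bash
# Sketch only: produce one "close" wrong year for a given release year.
# Usage: closeyear_for RELEASEYEAR [THISYEAR]
# THISYEAR is a hypothetical extra argument (defaults to the real current
# year) so that results can be checked against a fixed date.
closeyear_for() {
  local releasedate=$1
  local thisyear=${2:-$(date +%Y)}
  local adjacency=$(( thisyear - releasedate ))
  local factor delta closeyear

  # Minimum factor of two, as in the tweaked script.
  if [ "$adjacency" -lt 5 ] ; then
    factor=2
  else
    factor=$(( adjacency / 5 + 1 ))
  fi

  delta=$(( RANDOM % factor + 1 ))        # 1..factor
  if [ $(( RANDOM % 2 )) -eq 1 ] ; then   # coin flip: add or subtract
    closeyear=$(( releasedate + delta ))
  else
    closeyear=$(( releasedate - delta ))
  fi

  # Never offer a year in the future.
  if [ "$closeyear" -gt "$thisyear" ] ; then
    closeyear=$(( releasedate - delta ))
  fi
  echo "$closeyear"
}

# Demo: five candidates for a 1927 release, with the current year fixed at 2008.
for i in 1 2 3 4 5 ; do
  closeyear_for 1927 2008
done
```

The values differ on every run, but for this demo they always fall between 1910 and 1944 and never equal 1927.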
Now we have all the building blocks, and next month, we'll put them all together and create the movie trivia game. With luck, we'll have space to start pushing it out on Twitter too. In the meantime, if you want to sign up on Twitter for the game and watch as I develop it, follow FilmBuzz.
Dave Taylor is a 26-year veteran of UNIX, creator of The Elm Mail System, and most recently author of both the best-selling Wicked Cool Shell Scripts and Teach Yourself Unix in 24 Hours, among his 16 technical books. His main Web site is at www.intuitive.com, and he also offers up tech support at AskDaveTaylor.com. Follow him on Twitter if you'd like: twitter.com/DaveTaylor.
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.