Work the Shell - Movie Trivia and Fun with Random Numbers
Last month, we had a lot of fun digging around within the Internet Movie Database, producing a set of scripts that together make it easy to generate a list of the top 250 movies on the site with release dates. The format of the output is:
All About Eve | 1950
Hotel Rwanda | 2004
Sin City | 2005
City Lights | 1931
This month, I take a look at how you can break those two fields up and randomly generate some likely release dates close to the actual date, then send it as a question on Twitter. For example, it might ask, “Hotel Rwanda was released in: 2000, 2001, 2004 or 2007?”
Okay, this should be super easy for anyone reading this column. There are a bunch of ways to take a two-field data record and split it up, but my favorite tool for this sort of task is cut. So, we can do this:
moviename="$(echo "$entry" | cut -d\| -f1)"
releasedate="$(echo "$entry" | cut -d\| -f2)"
That was easy, right? Now, of course, if you want to be fancy about it, you'll want to strip any leading or trailing spaces too, which can be done with this sed command:
sed 's/^ //g;s/ $//g'
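As a rough sketch, the split and the trim can be combined into one step. The function names trim and split_entry are made up for this illustration, and the sed expression uses " *" so that runs of spaces are removed rather than a single one, a small variation on the command above.

```shell
#!/bin/bash
# Hypothetical helpers; the "Title | Year" layout is the one shown in the column.

# trim: strip all leading and trailing spaces from each line on stdin
trim() {
  sed 's/^ *//;s/ *$//'
}

# split_entry: set moviename and releasedate from one "Title | Year" line
split_entry() {
  local entry="$1"
  moviename="$(printf '%s\n' "$entry" | cut -d\| -f1 | trim)"
  releasedate="$(printf '%s\n' "$entry" | cut -d\| -f2 | trim)"
}

split_entry "Hotel Rwanda | 2004"
echo "$moviename was released in $releasedate"
```

Run as is, this prints "Hotel Rwanda was released in 2004", with no stray spaces around either field.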
But, how do you get a random line out of a text file?
If you recall from previous columns, one of the lesser-known features of the Bash shell is the special variable $RANDOM, which returns a random integer between 0 and 32767 each time you reference it. It works inside the shell's $(( )) arithmetic notation without any further fuss, like this:
echo $(( $RANDOM ))
Try it in your own command shell a few times, and you'll get a series of random integer values, like 29408 and 17501. To constrain it to the size of the file, we could do something fancy with wc -l to identify the number of lines in the actual data file, but because we already know we're grabbing 250 film titles from IMDb, it's easy just to use that value. Here's the first stab:
pickline="$(( $RANDOM % 250 ))"
It's not quite right though, because we'll get values 0–249, and lines in a file are numbered starting at 1. You can verify the zero case by entering the command echo $(( 5 % 5 )), which prints 0. So, we need to shift things up one:
pickline="$(expr $(( $RANDOM % 250 )) + 1 )"
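If the data file might not hold exactly 250 lines, the wc -l approach mentioned earlier is not much more work. This is only a sketch: the function name random_line_number and the file name in the usage comment are assumptions, not part of the original scripts.

```shell
#!/bin/bash
# random_line_number: print a random integer from 1 to the number of
# lines in the file named by $1 (hypothetical helper)
random_line_number() {
  local count
  # Redirecting into wc keeps the filename out of its output; the outer
  # $(( )) strips the padding some versions of wc add
  count=$(( $(wc -l < "$1") ))
  if [ "$count" -eq 0 ] ; then
    return 1                      # nothing to pick from in an empty file
  fi
  echo $(( RANDOM % count + 1 ))  # 0..count-1 shifted up to 1..count
}

# Usage, assuming the movie list lives in a file called top250.txt:
#   pickline="$(random_line_number top250.txt)"
```

Because $RANDOM tops out at 32767, the modulus is very slightly uneven for counts that don't divide 32768, which shouldn't matter for a trivia quiz.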
That produces a random number between 1 and 250. To extract the line with that number from a file, there are a number of solutions, but I'll stick with sed. In that case, the solution for pulling out line 33, as an example, is:
sed -n 33p
If you change the value to a variable name, however, there's a problem:
sed -n $picklinep
You can't put a space between the variable name and the p, but if you don't, you have a bad variable name, because it's pickline, not picklinep. The solution is a secret notational convention you can use in scripts when there's any sort of ambiguity like this—curly brackets. So, the line ends up as follows:
sed -n ${pickline}p
That does the trick, and in an application like this, sed is lightning fast too.
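Putting the random number and the sed call together might look something like the following. The function name pick_random_line is an assumption for illustration, and the q command is an optional addition that tells sed to quit as soon as it has printed the line rather than reading the rest of the file.

```shell
#!/bin/bash
# pick_random_line: print one randomly chosen line from the file in $1
# $2 is the number of lines to choose among (defaults to 250, per the column)
pick_random_line() {
  local file="$1" total="${2:-250}"
  local pickline=$(( RANDOM % total + 1 ))
  # The curly brackets around the variable name keep the shell from
  # reading it as $picklinep; {p;q;} prints the line and then stops
  sed -n "${pickline}{p;q;}" "$file"
}

# Usage, assuming a data file named top250.txt:
#   entry="$(pick_random_line top250.txt)"
```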
At this point, we have a data file of interesting information, we can extract a random line from the file, and we can split the resultant data into the film title and release year. How about coming up with plausible alternative release years?
My first inclination with generating random years was to add and subtract 1–3 years and then use those as the alternate values. If we were looking at, say, Shaun of the Dead, released in 2004, we might end up with 2001 and 2007 as the options. Match a film that's more recent though, such as 2007's Grindhouse (though why that's on the IMDb top 250 films list is beyond me), and we have a problem. Suggesting 2009 as a possible release date would be daft.
More important, it wouldn't take long for people to realize that it's the middle value that's always correct on the quiz—not good. Just like with the SAT and GMAT, it's important to avoid any possible patterns in answers.
As a result, we can try something a bit more complicated. Each possible year is the actual year of release plus or minus a random value of 1–5—close enough that it'll be challenging to remember the right year.
Here's the beginning of the script:
add="$(( $RANDOM % 2 ))"
delta="$(expr $(( $RANDOM % 5 )) + 1)"
Here, add will be 0 (false) or 1 (true) for later conditional testing, and delta is a value between one and five, just as we need. They can be applied as follows:
if [ $add -eq 1 ] ; then
  newvalue=$(expr $1 + $delta )
else
  newvalue=$(expr $1 - $delta )
fi
This snippet can be tested easily by dropping it into a simple script, which I'll call random-years.sh. Applying it repeatedly to the starting year 2000 produces 2002, 1998, 2005, 2001, 2003, 2004. Seems sufficiently random, yes?
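For reference, here is one way such a test script could be laid out. It is a sketch rather than the actual random-years.sh: the function name random_year is an assumption, and it uses $(( )) arithmetic throughout instead of expr.

```shell
#!/bin/bash
# random_year: print the year in $1 plus or minus a random delta of 1-5
# (hypothetical name, same logic as the fragments above)
random_year() {
  local add=$(( RANDOM % 2 ))        # 0 means subtract, 1 means add
  local delta=$(( RANDOM % 5 + 1 ))  # 1 through 5
  if [ "$add" -eq 1 ] ; then
    echo $(( $1 + delta ))
  else
    echo $(( $1 - delta ))
  fi
}

# Print six candidate years around the starting year 2000
for i in 1 2 3 4 5 6 ; do
  random_year 2000
done
```

Since delta is never zero, the function can't return the real release year, but nothing here prevents it from producing the same wrong year twice.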
Now, let's consider some nuances. First, we need to ensure that it's never past the current year, which can be done by grabbing that value from the date command with a format string: date +%Y (learn more about the many, many format strings that the date command understands with man strftime).
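One possible way to apply that check, sketched under the assumption that a candidate landing in the future should simply be flipped to the subtracting case. The function name random_past_year and its optional second argument are made up for this example.

```shell
#!/bin/bash
# random_past_year: like the plus-or-minus logic above, but never later
# than the current year (hypothetical helper)
# $1 = actual release year, $2 = current year (defaults to date +%Y)
random_past_year() {
  local year="$1" thisyear="${2:-$(date +%Y)}"
  local add=$(( RANDOM % 2 ))
  local delta=$(( RANDOM % 5 + 1 ))
  local newvalue
  if [ "$add" -eq 1 ] ; then
    newvalue=$(( year + delta ))
  else
    newvalue=$(( year - delta ))
  fi
  # A year in the future is an obvious wrong answer, so go backward instead
  if [ "$newvalue" -gt "$thisyear" ] ; then
    newvalue=$(( year - delta ))
  fi
  echo "$newvalue"
}

random_past_year 2007
```

Flipping the sign, rather than clamping to the current year, avoids accidentally offering the correct year when the film was released this year.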
Second, here's a more interesting thought. If the movie came out a long time ago, we should have a bigger delta than if it's a recent release. In other words, if the movie is Casablanca, it came out in 1942, 66 years ago. Iron Man, which is also on the top 250 list, came out in 2008, 0 years ago. For Casablanca, we could have possible values of 1938 and even 1951, and it'd be a good quiz question for anyone who isn't a complete film nut. But, that far of a spread for Iron Man makes no sense. No one's going to think it might have come out in 1999.
What I'm thinking about in this situation then is that the delta might be a percentage of the age of the movie, normalized so that we always have some sort of spread. Maybe 20%? That'd give us a delta of 13.2 for Casablanca and 0 for Iron Man. That could work.
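To make that idea concrete, here is a tentative sketch of the calculation. The function name max_delta and the minimum spread of 2 years are assumptions chosen purely for illustration, and shell arithmetic is integer-only, so 13.2 is truncated to 13.

```shell
#!/bin/bash
# max_delta: print 20% of a film's age in whole years, but never less
# than a minimum spread (hypothetical helper)
# $1 = release year, $2 = current year (defaults to date +%Y)
max_delta() {
  local year="$1" thisyear="${2:-$(date +%Y)}"
  local age=$(( thisyear - year ))
  local delta=$(( age * 20 / 100 ))  # integer math drops the fraction
  local floor=2                      # assumed minimum spread, not from the column
  if [ "$delta" -lt "$floor" ] ; then
    delta=$floor
  fi
  echo "$delta"
}

max_delta 1942 2008   # Casablanca: prints 13
max_delta 2008 2008   # Iron Man: prints 2 because of the floor
```

The result could then replace the fixed 5 as the modulus when picking a random delta.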
Ah, but I've run out of space. Next month, we'll go back to the random adjacent year function to wrap it up, and then look at how to get these questions out on Twitter rather than just on the Linux command line. Until then, “here's lookin' at you, kid.”
Dave Taylor is a 26-year veteran of UNIX, creator of The Elm Mail System, and most recently author of both the best-selling Wicked Cool Shell Scripts and Teach Yourself Unix in 24 Hours, among his 16 technical books. His main Web site is at www.intuitive.com, and he also offers up tech support at AskDaveTaylor.com. Follow him on Twitter if you'd like: twitter.com/DaveTaylor.
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.