Work the Shell - Our Twitter Autoresponder Goes Live!
I can't believe it, this is my 52nd column. That means I've been writing for Linux Journal for almost four and a half years. Hopefully, you've been reading my column just as long and enjoying our monthly forays into the world of shell script programming. On the tech side, quite a bit has changed in the last four and a half years. But on the Linux/shell side, it's surprisingly similar to how it was when I wrote my first column.
Last month, we continued to build a Twitter autoresponder script that could read and parse Twitter messages (aka tweets). We got it working and wrapped up the column by realizing we actually needed to capture the unique tweet ID in addition to name and message, so we could ensure that the script kept track of what it had or hadn't answered.
The script keeps track of tweets by ID and knows both how to parse the incoming Twitter stream and how to remember if it has seen a one-word tweet request or not. Run it once, and I see:
Twitter user @jlight asked for the time
@jlight the time on our server is LOCALTIME
The next time I run it, just a few minutes later, I see:
Twitter user @truss asked for the time
@truss the time on our server is LOCALTIME
Twitter user @tlady asked what our address in tweet 7395272164
@tlady we're located at 123 University Avenue, Anywhere USA
It looks good, but there's a problem in the script, because one of the output diagnostic lines is:
Twitter user @ asked for the time
@ the time on our server is LOCALTIME
Somehow it's not identifying the user name for this particular tweet. After a quick analysis of the actual Twitter.com data, it appears that the first tweet comes out of the parser section without an associated screen name.
To debug this, first get a copy of the script to follow along (the script from last month is at ftp.linuxjournal.com/pub/lj/listings/issue191/10695.tgz). In the while loop, I'll add this line to aid in debugging:
echo "got name = $name, id = $id, and msg = $msg"
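A slightly more defensive variant of this debugging line can be sketched as a small helper function. It quotes each value so that shell wildcards inside a tweet (a lone *, for example) are not expanded, and it writes to standard error so the diagnostics stay out of anything captured from standard output. The function name debug and the sample values are assumptions for illustration, not part of the column's script:

```shell
#!/bin/sh
# debug (hypothetical helper): print one diagnostic line on stderr.
# Arguments: screen name, tweet ID, message text.
debug() {
  printf 'got name = %s, id = %s, and msg = %s\n' "$1" "$2" "$3" >&2
}

# Made-up sample values standing in for what the while loop reads.
name="spin"; id="7395333666"; msg="time"
debug "$name" "$id" "$msg"
```

Inside the while loop, the call would simply be debug "$name" "$id" "$msg".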
Now when I run the script, here's what I see:
got name = , id = 7395437583, and msg = VERY cool
got name = spin, id = 7395333666, and msg = time
got name = astrong, id = 7395281516, and msg = time
got name = truss, id = 7395281011, and msg = time
Clearly something's wrong, but what?
One reason I like to use temp files in scripts rather than having incredibly long and complicated pipes is for debugging this sort of problem.
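As a rough sketch of that habit, the fragment below saves each stage of a small pipeline in its own temporary file, so any stage can be inspected afterward. The sample input, the function name stage_demo and the use of mktemp are assumptions for illustration; the column's script sets up its own $temp:

```shell
#!/bin/sh
# stage_demo (hypothetical): run a two-stage pipeline, saving each
# stage in its own temp file so either one can be examined later.
stage_demo() {
  raw=$(mktemp) || return 1
  clean=$(mktemp) || return 1

  # Stage 1: stand-in for the curl | grep output (made-up sample data).
  printf '  <id>123</id>\n  <text>time</text>\n' > "$raw"

  # Stage 2: strip the XML tags, much as the sed calls in the column do.
  sed 's/ *<id>//;s/<\/id>//;s/ *<text>//;s/<\/text>//' "$raw" > "$clean"

  # While debugging, any stage can be viewed; here we print the last one.
  cat "$clean"

  rm -f "$raw" "$clean"
}

stage_demo
```

With one long pipe, the intermediate data never exists anywhere you can look at it. With files, a quick more "$raw" or more "$clean" shows exactly where the data stops looking right.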
Recall that the main parsing work is done by curl feeding its output to grep, then a sequence of sed invocations and finally a quick call to awk:
# Fetch the stream, keep only the three fields we need, strip the XML
# tags, then let awk label each line by its position in a group of four.
$curl -u "davetaylor:$pw" "$inurl" | \
  grep -E '(<screen_name>|<text>|<id>)' | \
  sed 's/@DaveTaylor //;s/ *<text>//;s/<\/text>//' | \
  sed 's/ *<screen_name>//;s/<\/screen_name>//' | \
  sed 's/ *<id>//;s/<\/id>//' | \
  awk '{ if (NR % 4 == 0) {
           printf ("name=%s; ", $0)
         }
         else if (NR % 4 == 1) {
           printf ("id=%s; ", $0)
         }
         else if (NR % 4 == 2) {
           print "msg=\"" $0 "\""
         }
       }' > "$temp"
Adding the command more $temp immediately after this means we can eyeball the data stream and see what's different about the first and second lines (as the second is parsed properly). Here's what I see:
id=7395681235; msg="African or European?"
name=jeffrey; id=7395672894; msg="North Hall IStage"
Note that there's no name= field on the first message. My theory? There's a logic error in the awk statement that's causing it to skip the first entry somehow.
To test that assumption, I'll temporarily replace the entire awk script with another that outputs the record number (mod 4) followed by the data line:
awk '{ print (NR % 4), $0 }' > $temp
The result is exactly what we were expecting, which is a bit confusing:
1 7395934047
2 we are at the MGM as well!
3 14171725
0 sideline
1 7395681235
2 African or European?
3 14712874
0 jeffrey
Here, Twitter user sideline has sent “we are at the MGM as well!”, and jeffrey sent the message “African or European?”.
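Reading that output, the screen name arrives on the fourth line of each group (NR % 4 == 0), after the tweet ID and message it belongs to. A printf for the name at that point therefore lands at the start of the next tweet's line, which would account for the first tweet having no name. The following is only one possible way to keep the fields together, not necessarily the fix the column goes on to use. The names sample_lines and regroup are hypothetical, and the input is the sample data from the output above:

```shell
#!/bin/sh
# sample_lines (hypothetical): emit data in the order the diagnostic
# output shows: tweet ID, message text, numeric user ID, screen name.
sample_lines() {
  printf '%s\n' \
    '7395934047' 'we are at the MGM as well!' '14171725' 'sideline' \
    '7395681235' 'African or European?' '14712874' 'jeffrey'
}

# regroup (hypothetical): hold the tweet ID and message until the
# screen name arrives on the fourth line, then print the whole record.
# The third line of each group (numeric user ID) is ignored.
regroup() {
  awk '{
    if (NR % 4 == 1) {
      id = $0
    }
    else if (NR % 4 == 2) {
      msg = $0
    }
    else if (NR % 4 == 0) {
      printf ("name=%s; id=%s; msg=\"%s\"\n", $0, id, msg)
    }
  }'
}

sample_lines | regroup
```

Run on the sample data, this pairs sideline with "we are at the MGM as well!" and jeffrey with "African or European?", one complete record per line.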
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.