Work the Shell - Our Twitter Autoresponder Goes Live!

Some final fixes to the Twitterbot script we developed last month.

I can't believe it: this is my 52nd column. That means I've been writing for Linux Journal for almost four and a half years. Hopefully, you've been reading my column just as long and enjoying our monthly forays into the world of shell script programming. On the tech side, quite a bit has changed in the last four and a half years. But on the Linux/shell side, things are surprisingly similar to how they were when I wrote my first column.

Last month, we continued to build a Twitter autoresponder script that could read and parse Twitter messages (aka tweets). We got it working and wrapped up the column by realizing we actually needed to capture the unique tweet ID in addition to name and message, so we could ensure that the script kept track of what it had or hadn't answered.

Where We Are Now

The script keeps track of tweets by ID and knows both how to parse the incoming Twitter stream and how to remember if it has seen a one-word tweet request or not. Run it once, and I see:

Twitter user @jlight asked for the time
@jlight the time on our server is LOCALTIME

The next time I run it, just a few minutes later, I see:

Twitter user @truss asked for the time
@truss the time on our server is LOCALTIME
Twitter user @tlady asked what our address in tweet 7395272164
@tlady we're located at 123 University Avenue, Anywhere USA

It looks good, but there's a problem in the script, because one of the output diagnostic lines is:

Twitter user @ asked for the time
@ the time on our server is LOCALTIME

Somehow it's not identifying the screen name for this particular user. After a quick analysis of the actual Twitter.com data, it appears that the first tweet comes out of the parser section without an associated screen name.

To debug this, first get a copy of the script to follow along (the script from last month is at ftp.linuxjournal.com/pub/lj/listings/issue191/10695.tgz). In the while loop, I'll add this line to aid in debugging:

echo "got name = $name, id = $id, and msg = $msg"

Now when I run the script, here's what I see:

got name = , id = 7395437583, and msg = VERY cool
got name = spin, id = 7395333666, and msg = time
got name = astrong, id = 7395281516, and msg = time
got name = truss, id = 7395281011, and msg = time

Clearly something's wrong, but what?

Debugging a Complicated Script

One reason I like to use temp files in scripts, rather than incredibly long and complicated pipes, is that they make this sort of problem much easier to debug.
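Here's a minimal sketch of that idea, not taken from the Twitterbot script itself. The stage variable, the use of mktemp and the two sample lines are all made up for illustration. The point is simply that once the intermediate data lands in a file, you can look at it before the next step consumes it:

```shell
#!/bin/sh
# Sketch only: stage a pipeline through a temp file so the
# intermediate data can be inspected. Names and sample data
# are hypothetical, not from the original script.
stage=$(mktemp)
trap 'rm -f "$stage"' EXIT

# First half of the pipeline writes to the temp file...
printf '%s\n' '  <id>101</id>' '  <text>time</text>' |
  sed 's/ *<id>//;s/<\/id>//;s/ *<text>//;s/<\/text>//' > "$stage"

# ...so we can eyeball it here before going any further.
cat "$stage"

# Second half reads from the same file.
result=$(awk '{ print NR ": " $0 }' "$stage")
echo "$result"
```

With one long pipe, the only way to see that middle step would be to cut the pipeline apart by hand each time.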

Recall that the main parsing work is done by curl feeding its output to grep, then a sequence of sed invocations and finally a quick call to awk:


$curl -u "davetaylor:$pw" $inurl | \
  grep -E '(<screen_name>|<text>|<id>)' | \
  sed 's/@DaveTaylor //;s/  <text>//;s/<\/text>//' | \
  sed 's/ *<screen_name>//;s/<\/screen_name>//' | \
  sed 's/ *<id>//;s/<\/id>//' | \
  awk '{ if (NR % 4 == 0) {
             printf ("name=%s; ", $0)
         }
         else if (NR % 4 == 1) {
             printf ("id=%s; ",$0)
         }
         else if (NR % 4 == 2) {
             print "msg=\"" $0 "\""
         }
       }' > $temp

Adding the command more $temp immediately after this means we can eyeball the data stream and see what's different about the first and second lines (as the second is parsed properly). Here's what I see:

id=7395681235; msg="African or European?"
name=jeffrey; id=7395672894; msg="North Hall IStage"

Note that there's no name= field on the first message. My theory? There's a logic error in the awk statement that's causing it to skip the first entry somehow.
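A missing name= field would also explain the empty name in the earlier debugging output. The sketch below assumes the main loop reads each line of the temp file and evaluates it as a set of variable assignments. That loop isn't shown in this column, so treat its shape as a guess. The two sample lines are the ones shown above:

```shell
#!/bin/sh
# Sketch only: the loop structure is an assumption about how the
# script consumes $temp, not a copy of the original code.
temp=$(mktemp)
cat > "$temp" <<'EOF'
id=7395681235; msg="African or European?"
name=jeffrey; id=7395672894; msg="North Hall IStage"
EOF

debug_out=$(
  while read -r line; do
    name="" id="" msg=""     # reset so old values can't leak through
    eval "$line"             # risky on untrusted input; fine for a demo
    echo "got name = $name, id = $id, and msg = $msg"
  done < "$temp"
)
echo "$debug_out"
rm -f "$temp"
```

If a line never sets name, the variable stays empty and the diagnostic prints "got name = ," just as we saw.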

To test that assumption, I'll temporarily replace the entire awk script with another that outputs the record number (mod 4) followed by the data line:

awk '{ print (NR % 4), $0 }' > $temp

The result is exactly what we were expecting, which is a bit confusing:

1 7395934047
2 we are at the MGM as well!
3 14171725
0 sideline
1 7395681235
2 African or European?
3 14712874
0 jeffrey

Here, Twitter user sideline has sent “we are at the MGM as well!”, and jeffrey sent the message “African or European?”.
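To see how those remainders interact with the original awk logic, it helps to feed the same eight lines back through it. This is a small sketch: the sample variable is just the data shown above, retyped by hand, and the awk code is the same as in the script:

```shell
#!/bin/sh
# The eight cleaned-up lines shown above, retyped as sample input.
sample='7395934047
we are at the MGM as well!
14171725
sideline
7395681235
African or European?
14712874
jeffrey'

# Same awk logic as the script: name on remainder 0, id on 1, msg on 2.
out=$(printf '%s\n' "$sample" |
  awk '{ if (NR % 4 == 0) {
             printf ("name=%s; ", $0)
         }
         else if (NR % 4 == 1) {
             printf ("id=%s; ", $0)
         }
         else if (NR % 4 == 2) {
             print "msg=\"" $0 "\""
         }
       }')
echo "$out"
```

Run against this sample, the first output line starts with id= and has no name at all, while name=sideline shows up at the start of the second line, in front of jeffrey's tweet. The only newline is emitted by the msg branch, so each name gets printed after the line it belongs to has already ended.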

______________________

Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
