Work the Shell - Our Twitter Autoresponder Goes Live!

Some final fixes to the Twitterbot script we developed last month.

I can't believe it: this is my 52nd column. That means I've been writing for Linux Journal for almost four and a half years. Hopefully, you've been reading my column just as long and enjoying our monthly forays into the world of shell script programming. On the tech side, quite a bit has changed in the last four and a half years. But on the Linux/shell side, things are surprisingly similar to how they were when I wrote my first column.

Last month, we continued to build a Twitter autoresponder script that could read and parse Twitter messages (aka tweets). We got it working and wrapped up the column by realizing we actually needed to capture the unique tweet ID in addition to name and message, so we could ensure that the script kept track of what it had or hadn't answered.

Where We Are Now

The script keeps track of tweets by ID and knows both how to parse the incoming Twitter stream and how to remember if it has seen a one-word tweet request or not. Run it once, and I see:

Twitter user @jlight asked for the time
@jlight the time on our server is LOCALTIME

The next time I run it, just a few minutes later, I see:

Twitter user @truss asked for the time
@truss the time on our server is LOCALTIME
Twitter user @tlady asked what our address in tweet 7395272164
@tlady we're located at 123 University Avenue, Anywhere USA

It looks good, but there's a problem in the script, because one of the output diagnostic lines is:

Twitter user @ asked for the time
@ the time on our server is LOCALTIME

Somehow it's not identifying the screen name for this particular user. After a quick analysis of the actual Twitter.com data, it appears that the first tweet comes out of the parser section without an associated screen name.

To debug this, first get a copy of the script to follow along (the script from last month is at ftp.linuxjournal.com/pub/lj/listings/issue191/10695.tgz). In the while loop, I'll add this line to aid in debugging:

echo "got name = $name, id = $id, and msg = $msg"

Now when I run the script, here's what I see:

got name = , id = 7395437583, and msg = VERY cool
got name = spin, id = 7395333666, and msg = time
got name = astrong, id = 7395281516, and msg = time
got name = truss, id = 7395281011, and msg = time

Clearly something's wrong, but what?

Debugging a Complicated Script

One reason I like to use temp files in scripts rather than having incredibly long and complicated pipes is for debugging this sort of problem.
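As a rough illustration of that habit, here is a minimal sketch that writes each stage of a small pipeline to its own temp file so any stage can be inspected afterward. The directory layout, file names and sample input are all made up for this example and are not part of the Twitterbot script.

```shell
#!/bin/sh
# Sketch only: stage each step of a pipeline in its own temp file.
# workdir, the file names and the sample input are hypothetical.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT      # clean up when the script ends

# Stage 1: keep only the lines we care about (stand-in for the real feed).
printf '%s\n' '  <id>1</id>' '  <junk>x</junk>' '  <text>time</text>' |
  grep -E '(<text>|<id>)' > "$workdir/1-grep.out"

# Stage 2: strip the surrounding tags and leading spaces.
sed 's/ *<[^>]*>//g' "$workdir/1-grep.out" > "$workdir/2-sed.out"

# While debugging, dump every stage; comment this loop out when done.
for f in "$workdir"/*.out
do
  echo "--- $f"
  cat "$f"
done
```

With one long pipe, the only thing you can look at is the final output. With staged files, the first file whose contents look wrong tells you which command to suspect.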

Recall that the main parsing work is done by curl feeding its output to grep, then a sequence of sed invocations and finally a quick call to awk:


$curl -u "davetaylor:$pw" $inurl | \
  grep -E '(<screen_name>|<text>|<id>)' | \
  sed 's/@DaveTaylor //;s/  <text>//;s/<\/text>//' | \
  sed 's/ *<screen_name>//;s/<\/screen_name>//' | \
  sed 's/ *<id>//;s/<\/id>//' | \
  awk '{ if (NR % 4 == 0) {
             printf ("name=%s; ", $0)
         }
         else if (NR % 4 == 1) {
             printf ("id=%s; ",$0)
         }
         else if (NR % 4 == 2) {
             print "msg=\"" $0 "\""
         }
       }' > $temp
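To see what the grep and sed stages hand to awk, it can help to run them against a small fixed sample instead of the live feed. The sketch below does that with a made-up XML fragment: the element names come from the grep pattern above, but the layout, indentation and values are assumptions for illustration, and parse_fragment is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: run the column's grep and sed stages on a made-up fragment.
# The fragment's structure and values are assumptions, not real Twitter data.

parse_fragment() {
  cat <<'EOF' |
<status>
  <created_at>Tue Jan 05 2010</created_at>
  <id>1001</id>
  <text>@DaveTaylor time</text>
  <user>
    <id>2002</id>
    <screen_name>exampleuser</screen_name>
  </user>
</status>
EOF
  grep -E '(<screen_name>|<text>|<id>)' | \
  sed 's/@DaveTaylor //;s/  <text>//;s/<\/text>//' | \
  sed 's/ *<screen_name>//;s/<\/screen_name>//' | \
  sed 's/ *<id>//;s/<\/id>//'
}

parse_fragment
```

For this fragment, each tweet comes out as four bare lines: the tweet ID, the message text, the user's numeric ID and the screen name. The awk script then has to rely purely on line position to tell those fields apart.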

Adding the command more $temp immediately after this means we can eyeball the data stream and see what's different about the first and second lines (as the second is parsed properly). Here's what I see:

id=7395681235; msg="African or European?"
name=jeffrey; id=7395672894; msg="North Hall IStage"

Note that there's no name= field on the first message. My theory? There's a logic error in the awk statement that's causing it to skip the first entry somehow.

To test that assumption, I'll temporarily replace the entire awk script with another that outputs the record number (mod 4) followed by the data line:

awk '{ print (NR % 4), $0 }' > $temp

The result is exactly what we were expecting, which is a bit confusing:

1 7395934047
2 we are at the MGM as well!
3 14171725
0 sideline
1 7395681235
2 African or European?
3 14712874
0 jeffrey

Here, Twitter user sideline has sent “we are at the MGM as well!”, and jeffrey sent the message “African or European?”.
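One way to narrow things down is to replay those same eight lines through the original awk logic, with no curl, grep or sed involved. This is only a diagnostic sketch, not a fix: the replay function name is hypothetical, and the input is simply the data shown above typed in by hand.

```shell
#!/bin/sh
# Sketch only: feed the eight lines shown above to the original awk script.
# "replay" is a hypothetical helper name used for this experiment.

replay() {
  printf '%s\n' \
    7395934047 'we are at the MGM as well!' 14171725 sideline \
    7395681235 'African or European?' 14712874 jeffrey |
  awk '{ if (NR % 4 == 0) {
             printf ("name=%s; ", $0)
         }
         else if (NR % 4 == 1) {
             printf ("id=%s; ", $0)
         }
         else if (NR % 4 == 2) {
             print "msg=\"" $0 "\""
         }
       }'
}

replay
```

Run this way, the first output line again starts with id= and carries no name= field, while name=sideline shows up at the start of the second line, next to the ID of the following tweet. That suggests the input data is fine and the place to look is where the awk script ends each output record.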

______________________

Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
