Work the Shell - Parsing Your Twitter Stream

More work on the Twitter response bot.

Last month, we circled back to Twitter and started developing a shell script that lets you actually parse and respond to queries sent via Twitter. The idea was that if you were a store, for example, a tweet of “hours?” could be answered automatically with a response tweet of the store's hours—simple, but interesting nonetheless.

We ended last month with a script that does quite a bit in just a few lines:


#!/bin/sh

curl="/usr/bin/curl -s"
inurl="http://www.twitter.com/statuses/mentions.xml"
pw='PasswordGoesHere'
temp="/tmp/$(basename "$0").$$"

trap "/bin/rm -f $temp" 0 1 15 # axe our temp file (signal 9 can't be trapped)

$curl -u "davetaylor:$pw" $inurl | \
    grep -E '(<screen_name>|<text>)' | \
    sed 's/@DaveTaylor //;s/  <text>//;s/<\/text>//' | \
    sed 's/    <screen_name>//;s/<\/screen_name>//' | \
    awk '{if (NR % 2 == 1) { printf ("msg=\"%s\"; ",$0) }
          else             { print "id="$0 }}' > $temp

while read buffer
do
    eval $buffer
    echo Twitter user @$id sent message $msg
done < $temp

exit 0

(Unfortunately, it has to have the Twitter account password hard-coded, which I've obviously redacted here. You can see where “davetaylor” appears and can tweak it to match your own Twitter account.)

This is a pretty tricky script, if I say so myself. Here you can see that we unwrap the XML sent by Twitter and use a complicated sequence of grep/sed/awk to turn it into two name=value pairs, instantiating msg and id.
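To see what that pipeline produces without touching the network, here is a minimal sketch that feeds it a hand-made fragment. The sample XML, its indentation and the function names sample_xml and parse_mentions are assumptions made for illustration; the real feed carries many more fields per tweet.

```shell
#!/bin/sh
# Hypothetical, heavily trimmed stand-in for the XML that Twitter returns.
# The leading spaces matter, because the sed expressions strip them literally.
sample_xml() {
cat <<'EOF'
  <text>@DaveTaylor hours</text>
    <screen_name>lizhamilton</screen_name>
  <text>@DaveTaylor address</text>
    <screen_name>MommyBrain</screen_name>
EOF
}

# The same grep/sed/awk stages as the script, reading from stdin.
parse_mentions() {
    grep -E '(<screen_name>|<text>)' |
    sed 's/@DaveTaylor //;s/  <text>//;s/<\/text>//' |
    sed 's/    <screen_name>//;s/<\/screen_name>//' |
    awk '{ if (NR % 2 == 1) { printf("msg=\"%s\"; ", $0) }
           else             { print "id=" $0 } }'
}

sample_xml | parse_mentions
```

The first line printed is msg="hours"; id=lizhamilton. Each output line is itself a pair of shell assignments, which is why the loop can hand it straight to eval. Against the live feed, the full script does the same thing to every pair of lines.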

When I run the script, I see:


Twitter user @TedWahler sent message That sounds like a
very interesting article. When and where can I read
&quot;When Not To Identify your Group Memberships&quot; Dave?

Twitter user @naomimimi sent message i will send you some
of my amazing restedness after sleeping for 20 hours
yesterday. *bzzzt* feel better? :)

Twitter user @GaryBloomer sent message RE: Song. Dave,
don't know if you have an answer yet, but: Supertramp:
If Everyone Was Listening

A tiny tweak can show who sends you tweets (these are actually @ replies, which is what makes this work): simply change the echo in the final loop to just echo $id.

Want to find those shortened URLs and compile a list? That's a bit trickier, but you can use tr and grep to do the heavy lifting:

$ sh tweet-listen.sh | tr ' ' '\
> ' | grep 'http://'

http://twurl.nl/bco8tq
http://twurl.nl/bco8tq
http://bit.ly/12PvjV

Hey, someone must have retweeted or something for the same URL to show up twice!
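If the repeats get in the way, one possible tweak is to pass the list through sort -u. This is only a sketch: the function name unique_urls and the sample text are made up, standing in for the script's real output.

```shell
#!/bin/sh
# Split on spaces, keep the URL-looking words, and drop exact repeats.
unique_urls() {
    tr ' ' '\n' | grep 'http://' | sort -u
}

# Hypothetical message text standing in for the script's output.
printf '%s\n' \
    'great read http://twurl.nl/bco8tq thanks' \
    'RT great read http://twurl.nl/bco8tq' \
    'also see http://bit.ly/12PvjV' | unique_urls
```

Swapping sort -u for sort | uniq -c would instead show how many times each URL was mentioned.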

What we want to do, though, is look for a specific pattern within the stream, so let's do that instead.

Looking for Patterns

The easy way is to change the while read buffer loop to do the parsing:


while read buffer
do
  eval $buffer
  if [ "$msg" = "hours" ] ; then
    echo "Twitter user @$id asked what our hours are"

  elif [ "$msg" = "address" ] ; then
    echo "Twitter user @$id asked for our address"

  # else
  #   echo Twitter user @$id sent message $msg
  fi
done < $temp

Armed with that (and with some cooperative Twitter pals), I can now run the script and find out that:

Twitter user @MommyBrain asked for our address
Twitter user @lizhamilton asked what our hours are
Twitter user @valdezign asked what our hours are
Twitter user @bgindra asked what our hours are
Twitter user @MommyBrain asked what our hours are

Coolness, eh? Now, let's answer.

Responding to Tweet Queries

From an earlier column “Pushing Your Message Out to Twitter” in the November 2008 issue of LJ (www.linuxjournal.com/article/10222), we have a script already lying around that lets you specify what message you'd like to send out on Twitter, so it's just a matter of assembling it properly:


while read buffer
do
  eval $buffer
  if [ "$msg" = "hours" ] ; then
    echo "Twitter user @$id asked what our hours are"
    $tweet "@$id our hours are Mon-Fri 9-5, Sat 10-4."

  elif [ "$msg" = "address" ] ; then
    echo "Twitter user @$id asked for our address"
    $tweet "@$id we're at 123 University Avenue, Anywhere USA"
  fi
done < $temp

In this instance, I'll repeat the earlier tweet script because it's both so succinct and so darn useful:

#!/bin/sh
# Twitter command line interface

user="DaveTaylor" ; pass='PasswordGoesHere'

curl="/usr/bin/curl"
$curl --silent --user "$user:$pass" --data-ascii \
    "status=$(echo $@ | tr ' ' '+')" \
    "http://twitter.com/statuses/update.json" > /dev/null

echo "(sent tweet $@)"
exit 0
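The response loop calls $tweet, so that variable has to point at the tweet script before the loop runs. While testing, it can be handy to aim it at a harmless stand-in instead. The sketch below assumes exactly that: the WOULD-TWEET: stub and the respond function are hypothetical names, and the function uses case rather than the if/elif chain, which is just one way to write it.

```shell
#!/bin/sh
# Dry-run stand-in so nothing is actually posted. In the real bot this would
# be set to however you invoke the tweet script (an assumption on my part).
tweet="echo WOULD-TWEET:"

# $1 = sender's screen name, $2 = the one-word query
respond() {
    case "$2" in
        hours)   $tweet "@$1 our hours are Mon-Fri 9-5, Sat 10-4." ;;
        address) $tweet "@$1 we're at 123 University Avenue, Anywhere USA" ;;
        *)       return 1 ;;   # not a query we know how to answer
    esac
}

respond lizhamilton hours
```

Run as is, this prints the reply it would have sent rather than sending it. Unknown queries return a nonzero status, so the caller can decide whether to log or ignore them.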

The problem is a bit more complex than we've addressed so far, because when I asked people to send one-word queries, I also got things like “directions” and “directions!” rather than just the word by itself, unadorned by punctuation, quotation marks and so on.

This is something we'll need to deal with in the script, so we'll want to scrub the msg value to be just alphanumeric (or just alphabetic, if our set of canned response queries never includes a digit). This can be done with tr again, immediately after the eval $buffer statement:

msg="$(echo "$msg" | tr -cd '[:alpha:]')"

That's not quite right. When we get “directions”, it's actually with the quotes escaped by HTML so they're &quot; rather than just the " symbol. The result? quotdirectionsquot. Not good.

Just like so much in the world of programming, things aren't as easy as you'd like them to be. Instead, we're going to have to strip out quotes manually as part of the scrubbing process. Now it looks like this:


msg="$(echo "$msg" | sed 's/&quot;//g' | tr -cd '[:alpha:]')"

It's a bit more complicated, but not terribly so.
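Wrapped up as a function, the scrubbing step is easy to try against the awkward inputs mentioned above. This is a sketch rather than the script's actual code: the name scrub is made up, and the final tr that folds everything to lowercase is an extra assumption, added so that “Hours?” would match too.

```shell
#!/bin/sh
# Reduce a raw tweet to bare lowercase letters for matching.
scrub() {
    printf '%s\n' "$1" |
        sed 's/&quot;//g' |             # drop HTML-escaped quotes first
        tr -cd '[:alpha:]' |            # then delete everything but letters
        tr '[:upper:]' '[:lower:]'      # assumption: match case-insensitively
}

scrub '&quot;directions&quot;'; echo
scrub 'directions!'; echo
scrub 'Hours?'; echo
```

The three calls print directions, directions and hours. Note that the order of the stages matters: if tr ran before sed, the letters of quot would already be fused onto the word.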

The bigger issue is recognizing when we've already responded to a Twitter query to the bot. I'm sure no one's going to appreciate it if a query for “hours?” results in an answer every ten minutes for the next two weeks!

There are two ways to address that particular problem. One is to add timestamps to each tweet and figure out when we last auto-responded, but that sounds suspiciously like work. Instead, we can simply remember the most recent tweet to which we responded, including user ID, and use that as the starting point for subsequent auto-response parsing efforts.
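As a rough sketch of that second idea, and not necessarily how it will end up being done, the filter below remembers the newest line it has seen in a state file and stops as soon as it meets that line again. It assumes the feed lists the newest mention first; the state file location and the function name pending are hypothetical.

```shell
#!/bin/sh
state="${TMPDIR:-/tmp}/tweetbot.last"   # hypothetical state file

# Reads "msg=...; id=..." lines on stdin, newest first (an assumption).
# Prints only the lines that arrived since the last run, then remembers
# the newest line for next time.
pending() {
    last=""
    [ -f "$state" ] && last=$(cat "$state")
    newest=""
    while IFS= read -r line
    do
        [ -z "$newest" ] && newest=$line
        [ "$line" = "$last" ] && break      # everything from here on is old
        printf '%s\n' "$line"
    done
    [ -n "$newest" ] && printf '%s\n' "$newest" > "$state"
    return 0
}
```

In the bot, this could sit between the awk stage and the temp file. One known weakness: two identical queries from the same user look the same to this filter, so a unique per-tweet identifier, if the feed offers one, would be a sturdier thing to remember.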

I can't squeeze it in this month, but rest assured that next month we'll add this third piece and then talk about how to slip it into a cron job so that every N minutes our Twitter response bot answers any pending queries from the twitterverse.

______________________

Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
