Work the Shell - Parsing Your Twitter Stream
Last month, we circled back to Twitter and started developing a shell script that lets you actually parse and respond to queries sent via Twitter. The idea was that if you were a store, for example, a tweet of “hours?” could be answered automatically with a response tweet of the store's hours—simple, but interesting nonetheless.
We ended last month with a script that does quite a bit in just a few lines:
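#!/bin/sh
# tweet-listen.sh: list recent @ replies as user/message pairs
curl="/usr/bin/curl -s"
inurl="http://www.twitter.com/statuses/mentions.xml"
pw='PasswordGoesHere'
temp="/tmp/$(basename $0).$$"
trap "/bin/rm -f $temp" 0 1 2 15 # axe our temp file (signal 9 can't be trapped)
# Keep the <text> and <screen_name> lines, strip the tags and the leading
# @DaveTaylor, then join each message/user pair into one line of the form
# msg="..."; id=...
$curl -u "davetaylor:$pw" $inurl | \
grep -E '(<screen_name>|<text>)' | \
sed 's/@DaveTaylor //;s/ <text>//;s/<\/text>//' | \
sed 's/ <screen_name>//;s/<\/screen_name>//' | \
awk '{if (NR % 2 == 1) { printf ("msg=\"%s\"; ",$0) }
else { print "id="$0 }}' > $temp
while read buffer
do
  eval $buffer # sets msg and id
  echo Twitter user @$id sent message $msg
done < $temp
exit 0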
(Unfortunately, the script needs the Twitter account password hard-coded, which I've obviously redacted here. Change “davetaylor” to your own Twitter account name, and do the same for the @DaveTaylor that the first sed command strips from each message.)
This is a pretty tricky script, if I say so myself. It unwraps the XML sent by Twitter and uses a sequence of grep, sed and awk commands to turn each status into a single line of two name=value assignments (msg="..."; id=...), which the while loop then evals to instantiate msg and id.
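To see what the pipeline hands to the while loop without touching the network, here's a minimal sketch that runs the same grep, sed and awk stages over a tiny, made-up slice of mentions.xml. The sample XML, the two function names and the loosened sed patterns (which strip any amount of leading whitespace before a tag, because the real feed's indentation is an assumption here) are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: the article's parsing stages, fed canned XML.

# Read mentions XML on stdin; print one msg="..."; id=... line per status.
parse_mentions() {
  grep -E '(<screen_name>|<text>)' |
  sed 's/@DaveTaylor //;s/^[[:space:]]*<text>//;s/<\/text>//' |
  sed 's/^[[:space:]]*<screen_name>//;s/<\/screen_name>//' |
  awk '{ if (NR % 2 == 1) { printf("msg=\"%s\"; ", $0) }
         else { print "id=" $0 } }'
}

# A made-up, heavily trimmed stand-in for the real feed.
sample_xml() {
  cat <<'EOF'
<statuses type="array">
 <status>
  <text>@DaveTaylor hours?</text>
  <user>
   <screen_name>lizhamilton</screen_name>
  </user>
 </status>
 <status>
  <text>@DaveTaylor address</text>
  <user>
   <screen_name>MommyBrain</screen_name>
  </user>
 </status>
</statuses>
EOF
}

sample_xml | parse_mentions
```

This prints msg="hours?"; id=lizhamilton followed by msg="address"; id=MommyBrain, which is exactly the form that eval $buffer turns into the two shell variables.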
When I run the script, I see:
Twitter user @TedWahler sent message That sounds like a very interesting article. When and where can I read "When Not To Identify your Group Memberships" Dave?
Twitter user @naomimimi sent message i will send you some of my amazing restedness after sleeping for 20 hours yesterday. *bzzzt* feel better? :)
Twitter user @GaryBloomer sent message RE: Song. Dave, don't know if you have an answer yet, but: Supertramp: If Everyone Was Listening
A tiny tweak can show who sends you tweets (these are actually @ replies, which is what makes this work): simply change the echo in the final loop to just echo $id.
Want to find those shortened URLs and compile a list? That's a tiny bit more tricky, but you can use tr and grep to do the heavy lifting:
$ sh tweet-listen.sh | tr ' ' '\n' | grep 'http://'
http://twurl.nl/bco8tq
http://twurl.nl/bco8tq
http://bit.ly/12PvjV
Hey, someone must have retweeted or something for the same URL to show up twice!
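If you'd rather compile a list without the repeats, piping through sort -u is one option (an addition to the command above, not part of it). Here's a sketch that wraps the tr/grep step in a hypothetical list_urls function and feeds it made-up input:

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin, print each distinct URL once.
list_urls() {
  tr ' ' '\n' |      # one word per line
  grep 'http://' |   # keep only words containing http://
  sort -u            # drop repeats, such as a retweeted link
}

# Made-up sample lines in the listener's output format
printf '%s\n' \
  'Twitter user @a sent message see http://twurl.nl/bco8tq' \
  'Twitter user @b sent message RT http://twurl.nl/bco8tq' \
  'Twitter user @c sent message http://bit.ly/12PvjV neat' |
list_urls
```

Note that sort -u also reorders the list alphabetically; if arrival order matters, awk '!seen[$0]++' is a common order-preserving alternative.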
What we really want to do, though, is look for specific keywords within the stream, so let's do that instead.
The easy way is to change the while read buffer loop to do the parsing:
while read buffer
do
  eval $buffer
  # POSIX test compares strings with =; == fails in some /bin/sh shells
  if [ "$msg" = "hours" ] ; then
    echo "Twitter user @$id asked what our hours are"
  elif [ "$msg" = "address" ] ; then
    echo "Twitter user @$id asked for our address"
  # else
  #   echo Twitter user @$id sent message $msg
  fi
done < $temp
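The same keyword dispatch can also be written with a case statement, which tends to stay tidier as the list of canned queries grows. This is a swapped-in alternative to the if/elif chain, sketched with a hypothetical handle_query function and sample input lines in the listener's msg="..."; id=... format:

```shell
#!/bin/sh
# Hypothetical helper: $1 = Twitter user ID, $2 = one-word query.
handle_query() {
  case "$2" in
    hours)   echo "Twitter user @$1 asked what our hours are" ;;
    address) echo "Twitter user @$1 asked for our address" ;;
    *)       : ;;  # not a query we answer; ignore it
  esac
}

# Sample lines in the same format the listener writes to $temp
printf '%s\n' 'msg="hours"; id=lizhamilton' 'msg="hello there"; id=someone' |
while read -r buffer
do
  eval "$buffer"   # sets msg and id
  handle_query "$id" "$msg"
done
```

Only the first sample line produces output; the second falls through to the catch-all branch.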
Armed with that (and with some cooperative Twitter pals), I can now run the script and find out that:
Twitter user @MommyBrain asked for our address
Twitter user @lizhamilton asked what our hours are
Twitter user @valdezign asked what our hours are
Twitter user @bgindra asked what our hours are
Twitter user @MommyBrain asked what our hours are
Coolness, eh? Now, let's answer.
From an earlier column “Pushing Your Message Out to Twitter” in the November 2008 issue of LJ (www.linuxjournal.com/article/10222), we have a script already lying around that lets you specify what message you'd like to send out on Twitter, so it's just a matter of assembling the pieces (here, $tweet holds the path to that script):
while read buffer
do
  eval $buffer
  if [ "$msg" = "hours" ] ; then
    echo "Twitter user @$id asked what our hours are"
    $tweet "@$id our hours are Mon-Fri 9-5, Sat 10-4."
  elif [ "$msg" = "address" ] ; then
    echo "Twitter user @$id asked for our address"
    $tweet "@$id we're at 123 University Avenue, Anywhere USA"
  fi
done < $temp
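While you're testing, you may not want real replies going out, so one option is to point $tweet at a stub that just prints what it would send. fake_tweet, respond and the sample line below are hypothetical; in real use you'd set tweet to wherever you saved the tweet script.

```shell
#!/bin/sh
# Hypothetical dry-run stub standing in for the real tweet script.
fake_tweet() {
  echo "WOULD TWEET: $*"
}
tweet=fake_tweet

# The responder loop, reading msg="..."; id=... lines on stdin.
respond() {
  while read -r buffer
  do
    eval "$buffer"   # sets msg and id
    if [ "$msg" = "hours" ] ; then
      $tweet "@$id our hours are Mon-Fri 9-5, Sat 10-4."
    elif [ "$msg" = "address" ] ; then
      $tweet "@$id we're at 123 University Avenue, Anywhere USA"
    fi
  done
}

echo 'msg="hours"; id=valdezign' | respond
```

Swapping the stub for the real script is then a one-line change to the tweet= assignment.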
In this instance, I'll repeat the earlier tweet script because it's both so succinct and so darn useful:
#!/bin/sh
# Twitter command line interface: send the arguments as a status update
user="DaveTaylor" ; pass='PasswordGoesHere'
curl="/usr/bin/curl"
# --data-urlencode encodes the whole message, not just the spaces,
# so characters such as & and + arrive intact
$curl --silent --user "$user:$pass" \
  --data-urlencode "status=$*" \
  "http://twitter.com/statuses/update.json" > /dev/null
echo "(sent tweet $*)"
exit 0
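Form-encoding a status takes more than turning spaces into plus signs: an unencoded & would start a new form field, and a literal + would arrive as a space. To see what a fully encoded status looks like, here's a small sketch of a hypothetical urlencode function built on awk. It keeps letters, digits and . _ ~ -, turns spaces into +, and writes every other byte as %XX; it handles a single line of text only.

```shell
#!/bin/sh
# Hypothetical helper: form-encode its arguments (joined by spaces).
urlencode() {
  printf '%s\n' "$*" | LC_ALL=C awk '
    BEGIN {
      # Map each byte value to its character so we can look codes up
      for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
    }
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c ~ /[A-Za-z0-9._~-]/) out = out c
        else if (c == " ") out = out "+"
        else out = out sprintf("%%%02X", ord[c])
      }
      print out
    }'
}

urlencode "@valdezign our hours are Mon-Fri 9-5, Sat 10-4."
```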
The problem is a bit more complex than we've addressed so far, because when I asked people to send one-word queries, I also got things like “directions” (in quotation marks) and directions! rather than the bare word, unadorned by punctuation, quotation marks and so on.
This is something we'll need to deal with in the script, so we'll want to scrub the msg value to be just alphanumeric (or just alphabetic, if our set of canned response queries never includes a digit). This can be done with tr again, immediately after the eval $buffer statement:
msg="$(echo $msg | tr -cd '[:alpha:]')"
That's not quite right. When we get “directions”, the quotes actually arrive HTML-escaped as &quot; rather than as a plain " character. The result? quotdirectionsquot. Not good.
As with so much in the world of programming, things aren't as easy as you'd like them to be. Instead, we're going to have to strip out the &quot; entities explicitly as part of the scrubbing process. Now it looks like this:
msg="$(echo $msg | sed 's/&quot;//g' | tr -cd '[:alpha:]')"
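Here's the scrubbing step as a standalone function (scrub is a hypothetical name) so you can check it against the kinds of input people actually sent. It uses the same sed and tr commands as the line above, with printf in place of echo so a message beginning with a dash can't be mistaken for an echo option.

```shell
#!/bin/sh
# Hypothetical helper: drop &quot; entities, then keep letters only.
scrub() {
  printf '%s\n' "$1" | sed 's/&quot;//g' | tr -cd '[:alpha:]'
}

# Without the sed step, the letters of the entity survive:
printf '%s\n' '&quot;directions&quot;' | tr -cd '[:alpha:]'   # quotdirectionsquot
echo
scrub '&quot;directions&quot;'                                # directions
echo
```

If people capitalize their queries (Hours?), appending tr '[:upper:]' '[:lower:]' to the pipeline would fold them to lowercase; that's an extra step beyond what the column does.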
It's a bit more complicated, but not terribly so.
The bigger issue is recognizing when we've already responded to a Twitter query to the bot. I'm sure no one's going to appreciate it if a query for “hours?” results in an answer every ten minutes for the next two weeks!
There are two ways to address that particular problem. One is to add timestamps to each tweet and figure out when we last auto-responded, but that sounds suspiciously like work. Instead, we can simply remember the most recent tweet to which we responded, including the user ID, and use that as the starting point for subsequent auto-response parsing efforts.
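As a rough illustration of the second approach (a hypothetical sketch, not necessarily how the column will implement it), the functions below assume the temp file lists mentions newest-first, one msg="..."; id=... line each, and treat a line's full text as the tweet's identity. new_mentions prints only the lines newer than the one saved in a state file, and remember_newest saves the current newest line after a run.

```shell
#!/bin/sh
# Hypothetical sketch: skip mentions we've already answered.
# Assumes $1 lists mentions newest-first, one msg="..."; id=... line each.

# Print only the lines newer than the one saved in state file $2.
new_mentions() {
  last=""
  [ -f "$2" ] && last="$(cat "$2")"
  while IFS= read -r line
  do
    # Stop at the last line we handled; everything below it is older.
    [ -n "$last" ] && [ "$line" = "$last" ] && break
    printf '%s\n' "$line"
  done < "$1"
}

# After responding, save the newest line as the next starting point.
remember_newest() {
  head -n 1 "$1" > "$2"
}
```

One weakness of keying on text: if the same person sends the identical query twice in a row, the second copy looks already handled. A unique tweet ID, if the feed provides one, would be a sturdier key.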
I can't squeeze it in this month, but rest assured that next month we'll add this third piece and then talk about how to slip it into a cron job so that every N minutes our Twitter response bot answers any pending queries from the twitterverse.
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.