Work the Shell - Parsing Your Twitter Stream
Last month, we circled back to Twitter and started developing a shell script that lets you actually parse and respond to queries sent via Twitter. The idea was that if you were a store, for example, a tweet of “hours?” could be answered automatically with a response tweet of the store's hours—simple, but interesting nonetheless.
We ended last month with a script that does quite a bit in just a few lines:
#!/bin/sh
curl="/usr/bin/curl -s"
inurl="http://www.twitter.com/statuses/mentions.xml"
pw='PasswordGoesHere'
temp="/tmp/$(basename $0).$$"
trap "/bin/rm -f $temp" 0 1 9 15 # axe our temp file
$curl -u "davetaylor:$pw" $inurl | \
grep -E '(<screen_name>|<text>)' | \
sed 's/@DaveTaylor //;s/ <text>//;s/<\/text>//' | \
sed 's/ <screen_name>//;s/<\/screen_name>//' | \
awk '{if (NR % 2 == 1) { printf ("msg=\"%s\"; ",$0) }
      else { print "id="$0 }}' > $temp
while read buffer
do
  eval $buffer
  echo Twitter user @$id sent message $msg
done < $temp
exit 0
(Unfortunately, the script has to have the Twitter account password hard-coded, which I've obviously redacted here. Wherever “davetaylor” appears, substitute your own Twitter account name.)
This is a pretty tricky script, if I say so myself. Here you can see that we unwrap the XML sent by Twitter and use a complicated sequence of grep/sed/awk to turn it into two name=value pairs, instantiating msg and id.
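To see what each stage contributes, here's a minimal sketch that runs the same grep/sed/awk chain against a canned snippet rather than a live Twitter call. The sample XML, the exampleuser account and the function names sample_xml and parse_mentions are all made up for illustration, and the sed patterns use ' *' to tolerate whatever indentation the feed has, which is an assumption about the real output.

```shell
#!/bin/sh
# sample_xml: invented stand-in for one entry of the mentions feed
sample_xml() {
cat <<'EOF'
<status>
  <text>@DaveTaylor hours</text>
  <user>
    <screen_name>exampleuser</screen_name>
  </user>
</status>
EOF
}

# parse_mentions: hypothetical wrapper around the column's pipeline
parse_mentions() {
  # keep only the message and sender lines
  grep -E '(<screen_name>|<text>)' |
  # strip the @DaveTaylor prefix and the <text> tags
  sed 's/@DaveTaylor //;s/ *<text>//;s/<\/text>//' |
  # strip the <screen_name> tags
  sed 's/ *<screen_name>//;s/<\/screen_name>//' |
  # odd lines are messages, even lines are senders: join each pair
  awk '{ if (NR % 2 == 1) { printf("msg=\"%s\"; ", $0) }
         else { print "id=" $0 } }'
}

sample_xml | parse_mentions
```

With this sample, the result is a single line of the form msg="hours"; id=exampleuser, which is what the eval in the loop turns into shell variables. Keep in mind that eval runs whatever text lands in that line, so a message containing quotes or shell metacharacters could do more than set a variable.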
When I run the script, I see:
Twitter user @TedWahler sent message That sounds like a very interesting article. When and where can I read "When Not To Identify your Group Memberships" Dave?
Twitter user @naomimimi sent message i will send you some of my amazing restedness after sleeping for 20 hours yesterday. *bzzzt* feel better? :)
Twitter user @GaryBloomer sent message RE: Song. Dave, don't know if you have an answer yet, but: Supertramp: If Everyone Was Listening
A tiny tweak can show who sends you tweets (these are actually @ replies, which is what makes this work): simply change the echo in the final loop to just echo $id.
Want to find those shortened URLs and compile a list? That's a tiny bit more tricky, but you can use tr and grep to do the heavy lifting:
$ sh tweet-listen.sh | tr ' ' '\n' | grep 'http://'
http://twurl.nl/bco8tq
http://twurl.nl/bco8tq
http://bit.ly/12PvjV
Hey, someone must have retweeted or something for the same URL to show up twice!
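If duplicates like that are just noise, tacking sort -u on the end collapses them. Here's a rough sketch wrapped in a function; list_urls is a made-up name, and the sample text and example.com URLs are invented. It only catches http:// links that are set off by spaces.

```shell
#!/bin/sh
# list_urls: hypothetical helper; reads text on stdin and prints each
# distinct word containing http:// once, in sorted order
list_urls() {
  tr ' ' '\n' | grep 'http://' | sort -u
}

# Invented sample input standing in for the script's output
printf '%s\n' \
  'see http://example.com/a for details' \
  'RT see http://example.com/a for details' \
  'also http://example.com/b' | list_urls
```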
What we want to do though is look for a specific pattern within the stream, so let's do that instead.
The easy way is to change the while read buffer loop to do the parsing:
while read buffer
do
  eval $buffer
  if [ "$msg" = "hours" ] ; then
    echo "Twitter user @$id asked what our hours are"
  elif [ "$msg" = "address" ] ; then
    echo "Twitter user @$id asked for our address"
  # else
  #   echo Twitter user @$id sent message $msg
  fi
done < $temp
Armed with that (and with some cooperative Twitter pals), I can now run the script and find out that:
Twitter user @MommyBrain asked for our address
Twitter user @lizhamilton asked what our hours are
Twitter user @valdezign asked what our hours are
Twitter user @bgindra asked what our hours are
Twitter user @MommyBrain asked what our hours are
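As the list of recognized queries grows, a case statement can be easier to extend than a chain of if/elif tests. This is only a sketch of that alternative, not what the script above does: classify is a hypothetical helper name, and the id and msg values are invented.

```shell
#!/bin/sh
# classify: hypothetical helper that maps a one-word query to a
# description; returns non-zero for anything it does not recognize
classify() {
  case "$1" in
    hours)   echo "asked what our hours are" ;;
    address) echo "asked for our address" ;;
    *)       return 1 ;;
  esac
}

# Invented sample values standing in for what eval would set
id="exampleuser"; msg="hours"
if what=$(classify "$msg"); then
  echo "Twitter user @$id $what"
fi
```

Adding a new query then means adding one line to the case block rather than another elif branch.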
Coolness, eh? Now, let's answer.
From an earlier column “Pushing Your Message Out to Twitter” in the November 2008 issue of LJ (www.linuxjournal.com/article/10222), we have a script already lying around that lets you specify what message you'd like to send out on Twitter, so it's just a matter of assembling it properly:
while read buffer
do
  eval $buffer
  if [ "$msg" = "hours" ] ; then
    echo "Twitter user @$id asked what our hours are"
    $tweet "@$id our hours are Mon-Fri 9-5, Sat 10-4."
  elif [ "$msg" = "address" ] ; then
    echo "Twitter user @$id asked for our address"
    $tweet "@$id we're at 123 University Avenue, Anywhere USA"
  fi
done < $temp
In this instance, I'll repeat the earlier tweet script because it's both so succinct and so darn useful:
#!/bin/sh
# Twitter command line interface
user="DaveTaylor" ; pass='PasswordGoesHere'
curl="/usr/bin/curl"
$curl --silent --user "$user:$pass" --data-ascii \
"status=$(echo $@ | tr ' ' '+')" \
"http://twitter.com/statuses/update.json" > /dev/null
echo "(sent tweet $@)"
exit 0
The problem is a bit more complex than we've addressed so far, because when I asked people to send one-word queries, I also got things like “directions” and directions! rather than just the word by itself, unadorned by punctuation, quotation marks and so on.
This is something we'll need to deal with in the script, so we'll want to scrub the msg value to be just alphanumeric (or just alphabetic, if our set of canned response queries never includes a digit). This can be done with tr again, immediately after the eval $buffer statement:
msg="$(echo $msg | tr -cd '[:alpha:]')"
That's not quite right. When we get “directions”, it's actually with the quotes escaped by HTML so they're &quot; rather than just the " symbol. The result? quotdirectionsquot. Not good.
Just like so much in the world of programming, things aren't as easy as you'd like them to be. Instead, we're going to have to strip out quotes manually as part of the scrubbing process. Now it looks like this:
msg="$(echo $msg | sed 's/&quot;//g' | tr -cd '[:alpha:]')"
It's a bit more complicated, but not terribly so.
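Here's the scrubbing step pulled out into a function so it can be tried on its own. It's a sketch under a few assumptions: the quotes arrive as the HTML entity &quot; (which would explain the quotdirectionsquot result), scrub is a made-up name, and the final lowercasing step is my addition so that “Hours” and “hours” match the same canned response.

```shell
#!/bin/sh
# scrub: hypothetical helper that reduces a message to bare lowercase
# letters so it can be compared against the canned query words
scrub() {
  printf '%s\n' "$1" |
  sed 's/&quot;//g' |           # drop HTML-escaped double quotes
  tr -cd '[:alpha:]' |          # delete everything that is not a letter
  tr '[:upper:]' '[:lower:]'    # assumption: match case-insensitively
}

# Invented sample messages
for m in '&quot;directions&quot;' 'directions!' 'Hours?'
do
  echo "$(scrub "$m")"
done
```

Other entities, such as an escaped ampersand, would still leave their letters behind, so anything beyond quotes needs its own sed expression.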
The bigger issue is recognizing when we've already responded to a Twitter query to the bot. I'm sure no one's going to appreciate it if a query for “hours?” results in an answer every ten minutes for the next two weeks!
There are two ways to address that particular problem, one of which is to add timestamps to each tweet and figure out when we last auto-responded, but that sounds suspiciously like work. Instead, we simply can remember the most recent tweet to which we responded, including user ID, and use that as the starting point for subsequent auto-response parsing efforts.
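The remember-the-last-tweet idea might look something like the following. Everything here is hypothetical: the state file location, the function names new_queries and remember, and the input format of one "id msg" pair per line, newest first. It's a sketch of the approach, not the finished piece of the script.

```shell
#!/bin/sh
# Hypothetical state file; the location is an assumption
state="${TMPDIR:-/tmp}/tweetbot.last"

# new_queries: read "id msg" lines (newest first) on stdin and print
# only those that come before the entry remembered in $state
new_queries() {
  last=""
  [ -f "$state" ] && last=$(cat "$state")
  while read -r id msg
  do
    # stop as soon as we reach the entry we answered last time
    [ "$id $msg" = "$last" ] && break
    echo "$id $msg"
  done
}

# remember: record the newest entry we have responded to
remember() {
  echo "$1 $2" > "$state"
}
```

After a call such as remember bob hours, feeding new_queries the lines "carol address", "bob hours" and "alice hours" prints only the carol line. One weakness is that the same user sending the same word twice looks identical to the remembered entry, so a unique per-tweet identifier, if the feed provides one, would make a sturdier marker.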
I can't squeeze it in this month, but rest assured that next month we'll add this third piece and then talk about how to slip it into a cron job so that every N minutes our Twitter response bot answers any pending queries from the twitterverse.
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.