Work the Shell - Listening to Your Twitter Stream
Last month wrapped up with a problem so complex we had to delve into a different programming language to create a solution to the mathematics of calculating the distance between two lat/lon points on the globe. My head's still spinning. I long ago graduated computer science, so what the heck?
This month, I thought we should move back to something a bit more fun and perhaps a bit less complicated (well, maybe not, we'll see) and return to Twitter.
What I've been thinking about is how helpful it would be to have a bot that listened to my Twitter stream and answered simple queries directly without human intervention. Stores could have a bot respond to queries like “hours?” and “address?”, and students could have their schedule preprogrammed, and the bot could answer queries like “class?” by indicating what class students were in at that moment.
In fact, a local startup here in Boulder, Colorado, called Local Bunny (localbunny.com) is moving down this path, but it's building a real, fully thought-out solution. By comparison, I'm going to show you a bubblegum-and-baling-wire approach!
Tracking an individual's Twitter stream is quite easy; a call to the right URL with curl does the trick:
curl http://twitter.com/statuses/user_timeline/davetaylor.xml
That'll give you my last dozen tweets or so, along with a lot of additional information, all in XML format.
What we want, however, are mentions of an account or pattern, which require you to supply login credentials. This call is a bit more complicated, but you still can accomplish it with curl:
curl -u "davetaylor:$pw" http://www.twitter.com/statuses/mentions.xml
Here, I've set pw to my account password (you don't really want to know my password, do you?). The output, however, is something else. For an individual tweet, there are 42 lines of information that come back (for a 140-character tweet).
It's too much to show you here, but try the command yourself and be astonished at the output.
To trim it down, let's use grep with a regular expression to extract the Twitter ID of the person who sent the Tweet that mentions @DaveTaylor, and the tweet itself:
<text>@DaveTaylor Have them send the money in gold bullion.</text>
<screen_name>LenBailey</screen_name>
<text>@DaveTaylor Escrow.com</text>
<screen_name>Ed</screen_name>
You can see here that the first tweet is from @LenBailey, and the second from @Ed.
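The grep invocation itself isn't shown at this point; the full pipeline later in the column uses grep -E '(<screen_name>|<text>)'. As a minimal sketch, here is that filter run against a hand-made sample instead of a live API call. The sample XML is an assumption: it's heavily trimmed and only loosely modeled on what the mentions call returns.

```shell
# Hypothetical, heavily trimmed stand-in for one entry in mentions.xml.
sample_xml='<status>
  <id>1</id>
  <text>@DaveTaylor Escrow.com</text>
  <user>
    <name>Ed</name>
    <screen_name>Ed</screen_name>
  </user>
</status>'

# Keep only the lines holding the tweet text or the sender's screen name.
extract_fields() {
  grep -E '(<screen_name>|<text>)'
}

printf '%s\n' "$sample_xml" | extract_fields
```

Only the <text> and <screen_name> lines survive; the <name> line is dropped because it doesn't contain either tag.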
Turning this into coherent output is a tiny bit tricky, because we really want to merge line pairs into a single line that denotes message and ID. That's a job for awk:
awk '{if (NR % 2 == 1) { printf ("%s",$0) } else { print $0 }}'
Now, if we feed the curl output to this, we'll see:
<text>@DaveTaylor Have them send the money in gold bullion.</text> <screen_name>LenBailey</screen_name>
<text>@DaveTaylor Escrow.com</text> <screen_name>Ed</screen_name>
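To see the pair-joining in isolation, here's a small sketch that feeds the same awk one-liner four sample lines rather than live curl output. The indentation in the sample is an assumption about how the XML is laid out; that leading whitespace on the second line of each pair is what ends up separating the two fields once they're joined.

```shell
# Join each odd-numbered line with the even-numbered line that follows it:
# odd lines are printed without a newline, even lines end the output line.
join_pairs() {
  awk '{ if (NR % 2 == 1) { printf("%s", $0) } else { print $0 } }'
}

# Sample input (indentation is assumed, mimicking the XML layout).
printf '%s\n' \
  '  <text>@DaveTaylor Have them send the money in gold bullion.</text>' \
  '    <screen_name>LenBailey</screen_name>' \
  '  <text>@DaveTaylor Escrow.com</text>' \
  '    <screen_name>Ed</screen_name>' | join_pairs
```

This depends on the lines arriving in strict text/screen_name pairs; a stray extra line would shift every pair after it.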
Next step: let's get rid of the XML artifacts and reformat it to be a bit easier to parse. We also can axe @DaveTaylor, because we know it's to this account already (in the actual code, it's one invocation, but here it's easier to show it in two lines for legibility):
sed 's/@DaveTaylor //;s/<text>//;s/<\/text>//' | \
sed 's/ <screen_name>/ == /;s/<\/screen_name>//'
www.xetrade.com ? == kiasuchick
Have them send the money in gold bullion. == LenBailey
Escrow.com == Ed
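As a sketch of the single-invocation form mentioned above, the five substitutions can be handed to one sed process with -e. The function name clean_line is made up for illustration, and the sample line assumes exactly one space between the two tags.

```shell
# Strip the XML tags and the leading @DaveTaylor, and put " == " between
# the message and the sender. One sed process, five substitutions.
clean_line() {
  sed -e 's/@DaveTaylor //' \
      -e 's/<text>//' \
      -e 's/<\/text>//' \
      -e 's/ <screen_name>/ == /' \
      -e 's/<\/screen_name>//'
}

echo '<text>@DaveTaylor Escrow.com</text> <screen_name>Ed</screen_name>' | clean_line
# prints: Escrow.com == Ed
```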
That's more like it!
Let's start by doing something simple. If you “@” my Twitter account with the command date, it'll detect it, actually run the date command, and send out the results on my behalf.
To do this, we'll want to split the data stream into “tweet” and “tweeter”, but we can do this in a tricky way by tweaking the earlier awk string to create name=value pairs:
awk '{if (NR % 2 == 1) { printf ("msg=\"%s\"; ",$0) }
      else { print "id="$0 }}'
The result:
msg="escrow"; id=Stepan
msg="www.xetrade.com ?"; id=kiasuchick
msg=" Have them send the money in gold bullion. "; id=LenBailey
msg="Escrow.com"; id=Ed
Nice. Now we can use the underutilized eval command in the growing script to set the variables msg and id from each line, and then check msg for known values. Now, if you're sharp, you'll realize tweets that include double quotes are a bit of a problem, but fortunately, the Twitter API is smart too. All single quotes pass through as is, but double quotes are rewritten as the HTML entity &quot;.
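Here's a minimal sketch of what eval does with one of those lines. The sample line is made up, and the sed step that turns &quot; back into a double quote is an optional extra, not something the column's script does. Keep in mind that eval runs whatever the line contains, so this is only reasonable for input you trust.

```shell
# One line in the msg="..."; id=... format produced by the awk step above.
# (Made-up sample; &quot; stands in for a double quote in the tweet.)
line="msg=\"say &quot;hi&quot; now\"; id=Ed"

# eval executes the line as shell code, which assigns msg and id.
eval "$line"

# Optional: decode the entity once the value is safely inside a variable.
decoded=$(printf '%s\n' "$msg" | sed 's/&quot;/"/g')

echo "id is $id"
echo "msg is $decoded"
```

Because the double quotes arrive as &quot;, they can't close the msg="..." string early, which is what keeps the assignment intact.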
Let's pause for a second so I can show you what I've built so far:
$curl -u "davetaylor:$pw" $inurl | \
grep -E '(<screen_name>|<text>)' | \
sed 's/@DaveTaylor //;s/ <text>//;s/<\/text>//' | \
sed 's/ <screen_name>//;s/<\/screen_name>//' | \
awk '{if (NR % 2 == 1) { printf ("msg=\"%s\"; ",$0) }
      else { print "id="$0 }}' > $temp
That grabs the 20 most recent mentions of the specified user and converts each one into msg="message" and id=userid form. Fed to eval in a loop, this gives us a very easy way to parse things:
while read buffer
do
  eval $buffer
  echo Twitter user @$id sent message $msg
done < $temp
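One caveat with the eval loop: the tweet text sits inside double quotes, so a tweet containing $(...) or backticks would be run as a command. The following is a hypothetical alternative, not the column's method. It assumes the lines have already had their XML tags stripped, has awk emit the ID and message separated by a tab, and lets read split the fields so the text is never executed. The function names pairs_to_tsv and report are made up for illustration.

```shell
# Turn message/ID line pairs into "id<TAB>msg" lines.
pairs_to_tsv() {
  awk '{ if (NR % 2 == 1) { msg = $0 } else { printf("%s\t%s\n", $0, msg) } }'
}

# Read each tab-separated line into id and msg without evaluating it.
report() {
  while IFS="$(printf '\t')" read -r id msg
  do
    echo "Twitter user @$id sent message $msg"
  done
}

# The second sample message would run date under eval; here it stays text.
printf '%s\n' 'Escrow.com' 'Ed' 'run $(date) please' 'LenBailey' |
  pairs_to_tsv | report
```

The trade-off is that the variables only exist inside the loop body, which is fine for a bot that handles one message per iteration.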
Let's wrap up the column here for now, but next month, we'll take the next step and actually parse the Twitter “@” messages being sent to me, trying to find those that match the predefined queries we've set, act upon them and respond.
This is going to be a pretty cool project when we're done!
Dave Taylor has been involved with UNIX since he first logged in to the on-line network in 1980. That means that, yes, he's coming up to the 30-year mark now. You can find him just about everywhere on-line, but start here: www.DaveTaylorOnline.com. In addition to all his other projects, Dave is a film critic for a number of local publications. You can read his reviews at www.DaveOnFilm.com.
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
Comments
Argh! OAuth?
Ok, I didn't have a chance to try this when you wrote this, Dave, but I'm going to give it a shot, and read and hack out this whole article... I'm new to trying to do things in the shell. And I'm willing to make a spectacle of myself on Twitter doing so!
BOINK. I ran into an Authentication error. I forgot that OAuth is now being used by Twitter. Is there a way to get past this?
---
Randy