Work the Shell - Still Parsing the Twitter Stream
Last month, you'll hopefully remember that we took the big step in our Twitter stream parsing program of actually having it parse the incoming messages and strip out quotes and other HTML noise. I also republished the send-tweet script too, which we'll use this month.
The biggest challenge we face with the tweet-parser is knowing what messages we've already answered and which are new since the last time the program was run. The solution? To go back and tweak the original script a bit. It turns out that each and every tweet has a unique ID value, as you can see here:
<id>2541771</id>
You'll recall that early in the script we have this grep command:
grep -E '(<screen_name>|<text>)' | \
Simple enough. We'll tweak it to include |<id> and grab that value too. Except, of course, it's not that simple. It turns out that two <id> strings show up in the XML data from Twitter: one that's the ID of the account sending the message, and another that's the ID of the message itself—both conveniently labeled the same. Ugh!
I can kvetch and wish Twitter would fix its XML to have USERID or similar, but what's the point? They have the same thing with the overloaded <created_at> tag too, so we're going to have to bite the bullet and accept that we are now grabbing four data fields from the XML feed, only three of which we care about.
Once we know that we're going to have four lines of output, cyclically, we simply can decide which of those are actually important and tweak them in the awk statement:
$curl -u "davetaylor:$pw" $inurl | \
grep -E '(<screen_name>|<text>|<id>)' | \
sed 's/@DaveTaylor //;s/ <text>//;s/<\/text>//' | \
sed 's/ *<screen_name>//;s/<\/screen_name>//' | \
sed 's/ *<id>//;s/<\/id>//' | \
awk '{ if (NR % 4 == 0) {
printf ("name=%s; ", $0) }
else if (NR % 4 == 1) {
printf("id=%s; ",$0) }
else if (NR % 4 == 2) {
print "msg=\"" $0 "\"" }
}' > $temp
That's a pretty complicated sequence, so let's look at the awk conditional statement a little closer. We have four input records (lines) that we're stepping through. The value of NR is the number of records processed so far. So if NR mod 4 equals 0, it's the first of the four records (lines). The first record is the name value.
Did you see that two lines have printf, and the third uses a simpler print statement? Since we want each set of variables on a separate line, we use the print statement, because it automatically appends a newline to the output. Of course, the same effect could be achieved by putting the newline as a format string passed to printf. Example output follows:
name=thattalldude; id=6507045947; msg="Rates?" name=KateC; id=6507034680; msg="hours" name=pbarbanes; id=6507033698; msg="thanks" name=jodie_nodes; id=6507022063; msg=" $$?" name=KateC; id=6507019757; msg="price" name=tarahn; id=6507008559; msg="impact" name=GaryH2UK; id=6507004771; msg="directions"
We're going to hand these again, line by line, to the eval statement to set the three variables: name, id and msg. Then, it's a simple parsing problem, comparing msg to the known queries we have. Basically, it's what we did last month, except this time, every single tweet also has a unique ID value associated with it.
A typical test might now look like this:
if [ "$msg" == "hours" ] ; then echo "@$name asked what our hours are in tweet $id" fi
Nice! It's simple, straightforward and well worth the preprocessing hoops we've jumped through.
Indeed, I run that against my Twitter stream (after asking people to send me sample queries), and here's what I see:
@TheNose100 asked what our hours are in tweet 6507436100 @crepeauf asked what our hours are in tweet 6507187325 @jdscott asked what our hours are in tweet 6507087136 @KateC asked what our hours are in tweet 6507034680 @inspiremetoday asked what our hours are in tweet 6506966654
I bet you can see how to proceed from here. We write static responses, calculate values as needed and use send-tweet to respond to the user:
$tweet "@$name our hours are Mon-Fri 9-5, Sat 10-4."
For fun, I'll let people send the query “time” and get the current output of the date command too, just to demonstrate how that might work. Here's the code block:
if [ "$msg" == "time" ] ; then echo "@$id asked for the time" $tweet "@$name the local time on our server is $(date)" fi
Great. Got it all, except for where we started out. How do you track which tweets you've already answered?
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- Dynamic DNS—an Object Lesson in Problem Solving
- Designing Electronics with Linux
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- Web & UI Developer (JavaScript & j Query)
- Senior Perl Developer
- Technical Support Rep
- Symbolic Math with Python
- UX Designer
- Network Monitoring with Ethereal
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




3 min 15 sec ago
1 hour 56 min ago
3 hours 49 min ago
10 hours 43 min ago
10 hours 59 min ago
12 hours 51 min ago
18 hours 43 min ago
23 hours 14 min ago
23 hours 15 min ago
1 day 1 hour ago