Work the Shell - Still Parsing the Twitter Stream
Last month, you'll hopefully remember, we took the big step of making our Twitter stream parsing program actually parse the incoming messages and strip out quotes and other HTML noise. I also republished the send-tweet script, which we'll use this month.
The biggest challenge we face with the tweet-parser is knowing what messages we've already answered and which are new since the last time the program was run. The solution? To go back and tweak the original script a bit. It turns out that each and every tweet has a unique ID value, as you can see here:
<id>2541771</id>
You'll recall that early in the script we have this grep command:
grep -E '(<screen_name>|<text>)' | \
Simple enough. We'll tweak it to include |<id> and grab that value too. Except, of course, it's not that simple. It turns out that two <id> strings show up in the XML data from Twitter: one that's the ID of the account sending the message, and another that's the ID of the message itself, both conveniently labeled the same. Ugh!
I could kvetch and wish Twitter would fix its XML to use USERID or similar, but what's the point? It does the same thing with the overloaded <created_at> tag, so we're going to have to bite the bullet and accept that we're now grabbing four data fields from the XML feed, only three of which we care about.
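To see where the four lines come from, here's a sketch that runs the extended grep over a hand-made status entry. The entry is hypothetical and heavily trimmed (real feeds carry many more fields), the ID values are placeholders, and the element order, with the tweet's <id> ahead of <text> and the sender's <id> and <screen_name> inside a nested <user> block, is an assumption about Twitter's format.

```shell
#!/usr/bin/env bash
# Hypothetical, trimmed <status> entry. The IDs are placeholders, and the
# element order (tweet <id>, <text>, then the sender's <id> and
# <screen_name> inside <user>) is an assumption about the real feed.
sample_status() {
  cat <<'EOF'
<status>
  <created_at>placeholder timestamp</created_at>
  <id>1111111111</id>
  <text>@DaveTaylor hours</text>
  <user>
    <id>22222</id>
    <screen_name>KateC</screen_name>
  </user>
</status>
EOF
}

# The script's grep, extended with <id>: one tweet now yields four lines,
# because both the tweet and its sender have an <id> element.
sample_status | grep -E '(<screen_name>|<text>|<id>)'
```

The <created_at> line is dropped by the grep, while both <id> lines survive, which is exactly the duplication the awk step has to cope with.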
Once we know that we're going to get four lines of output per tweet, cyclically, we can simply decide which of those actually matter and handle them in the awk statement:
$curl -u "davetaylor:$pw" "$inurl" | \
  grep -E '(<screen_name>|<text>|<id>)' | \
  sed 's/@DaveTaylor //;s/ *<text>//;s/<\/text>//' | \
  sed 's/ *<screen_name>//;s/<\/screen_name>//' | \
  sed 's/ *<id>//;s/<\/id>//' | \
  awk '{ if (NR % 4 == 1) {          # tweet ID: save it for later
           id = $0 }
         else if (NR % 4 == 2) {     # message text: save it too
           msg = $0 }
         else if (NR % 4 == 0) {     # screen name closes the group: emit one line
           printf("name=%s; ", $0)
           printf("id=%s; ", id)
           print "msg=\"" msg "\"" }
         # NR % 4 == 3 is the sender account ID, which we skip
       }' > "$temp"
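That's a pretty complicated sequence, so let's look at the awk conditional statement a little closer. Each tweet produces four input records (lines), and NR is the number of records processed so far, so NR % 4 tells us where we are within the current group. Assuming Twitter lists the tweet's own <id> before its <text>, with the sender's <id> and <screen_name> in the nested <user> block, a remainder of 1 is the tweet ID, 2 is the message, 3 is the sender's account ID (which we ignore), and 0 is the fourth and final record of the group, the screen name.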
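If the modular arithmetic is hard to picture, a quick experiment helps. This sketch uses seq as a stand-in for the grep/sed output, feeding awk eight numbered lines (two tweets' worth) and printing NR alongside NR % 4. Because printf never adds a newline on its own, its format string ends with \n.

```shell
#!/usr/bin/env bash
# Show which slot of each four-line group awk's NR % 4 test lands on.
# seq stands in for the real grep/sed output; "%%" prints a literal %.
show_slots() {
  seq "$1" | awk '{ printf("NR=%d  NR %% 4 = %d\n", NR, NR % 4) }'
}

show_slots 8
```

A remainder of 0 shows up on lines 4 and 8, the last record of each four-line group rather than the first.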
Did you notice that the awk statement uses printf for two of the values and the simpler print statement for the third? We want each set of variables on its own line, and print automatically appends a newline to its output. The same effect could be achieved by ending the printf format string with \n. Here's some example output:
name=thattalldude; id=6507045947; msg="Rates?"
name=KateC; id=6507034680; msg="hours"
name=pbarbanes; id=6507033698; msg="thanks"
name=jodie_nodes; id=6507022063; msg=" $$?"
name=KateC; id=6507019757; msg="price"
name=tarahn; id=6507008559; msg="impact"
name=GaryH2UK; id=6507004771; msg="directions"
We're going to hand these lines, one at a time, to eval again to set the three variables name, id and msg. Then it's a simple parsing problem: compare msg to the known queries. Basically, it's what we did last month, except that this time every single tweet also has a unique ID value associated with it.
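Here's a minimal sketch of that eval loop. It assumes the temp file holds one name=...; id=...; msg="..." line per tweet, as the awk step writes it; the loop structure is an assumption, and two lines from the example output stand in for a live feed.

```shell
#!/usr/bin/env bash
# Two lines copied from the example output stand in for the real $temp file.
temp=$(mktemp)
cat > "$temp" <<'EOF'
name=KateC; id=6507034680; msg="hours"
name=tarahn; id=6507008559; msg="impact"
EOF

# Read the file without a pipe so the variables eval sets stay in this shell.
while IFS= read -r line; do
  eval "$line"    # sets name, id and msg
  echo "@$name sent \"$msg\" in tweet $id"
done < "$temp"

rm -f "$temp"
```

One caution: eval runs whatever arrives in the message text, so a tweet containing $(...) or backquotes would be executed, and even the msg=" $$?" line above would have $$ expanded to the shell's process ID. That's worth guarding against before pointing this at a public stream.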
A typical test might now look like this:
if [ "$msg" = "hours" ] ; then
  echo "@$name asked what our hours are in tweet $id"
fi
Nice! It's simple, straightforward and well worth the preprocessing hoops we've jumped through.
When I run that against my Twitter stream (after asking people to send me sample queries), here's what I see:
@TheNose100 asked what our hours are in tweet 6507436100
@crepeauf asked what our hours are in tweet 6507187325
@jdscott asked what our hours are in tweet 6507087136
@KateC asked what our hours are in tweet 6507034680
@inspiremetoday asked what our hours are in tweet 6506966654
I bet you can see how to proceed from here. We write static responses, calculate values as needed and use send-tweet to respond to the user:
$tweet "@$name our hours are Mon-Fri 9-5, Sat 10-4."
For fun, I'll also let people send the query “time” and get back the current output of the date command, just to demonstrate how that might work. Here's the code block:
if [ "$msg" = "time" ] ; then
  echo "@$name asked for the time"
  $tweet "@$name the local time on our server is $(date)"
fi
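As the list of queries grows, a case statement may read more cleanly than a chain of if tests. This sketch folds the two examples together. The tweet function is a hypothetical stub that only echoes, standing in for the send-tweet script the column calls through $tweet, and respond is likewise a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical stub for send-tweet: echoes instead of posting anything.
tweet() { echo "TWEET: $*"; }

# Dispatch one parsed tweet on its message text (bash, for "local").
respond() {
  local name=$1 id=$2 msg=$3
  case "$msg" in
    hours)
      echo "@$name asked what our hours are in tweet $id"
      tweet "@$name our hours are Mon-Fri 9-5, Sat 10-4." ;;
    time)
      echo "@$name asked for the time"
      tweet "@$name the local time on our server is $(date)" ;;
    *)
      echo "no canned reply for \"$msg\" from @$name (tweet $id)" ;;
  esac
}

respond KateC 6507034680 hours
```

Adding a new query then means adding one more pattern to the case, and the catch-all branch makes unanswered messages easy to spot in the log.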
Great. We've got it all, except for the problem we started out with: how do you track which tweets you've already answered?
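One possible approach builds on the unique IDs: append each answered tweet's ID to a file and skip any ID already listed there. This is a sketch under that assumption, not necessarily the column's own solution; the file path and the function names already_answered and mark_answered are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical file listing the IDs of tweets we've already replied to.
seen=${seen:-"$HOME/.tweet-parser-seen"}

# Succeeds if this tweet ID is already recorded (whole-line, fixed-string match).
already_answered() {
  [ -f "$seen" ] && grep -qxF -- "$1" "$seen"
}

# Records a tweet ID once we've responded to it.
mark_answered() {
  printf '%s\n' "$1" >> "$seen"
}

# Intended use inside the eval loop:
#   already_answered "$id" && continue
#   ...test $msg and send the reply...
#   mark_answered "$id"
```

The -x flag matters here: without it, an ID that happens to be a substring of a longer one would count as a match.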
Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.