At the Forge - Bloglines Web Services, Continued
I am writing this column a few days after the November 2, 2004, elections in the United States. As an admitted political junkie, I enjoy the modern era of computerized, always-on punditry. No longer must I switch TV stations or read several newspapers at the local library; now, I can follow the sound bites as they pass from the candidates to the press to the various partisan sites.
Keeping up with many different news and opinion sites can consume quite a bit of time. As we have seen over the last few months, everyone has benefited from the creation of news aggregators—programs that read the RSS and Atom syndication feeds produced by Weblogs, newspapers and other frequently updated sites. An aggregator, as its name suggests, takes these feeds and puts them into a single, easily accessible listing.
Bloglines.com is an Internet startup that provides a Web-based news aggregator. In and of itself, this should not surprise anyone; the combination of syndication, aggregation and the Web made this a natural idea. And, Bloglines isn't unique; there are other, perhaps lesser-known, Web-based news aggregators.
One unique service that Bloglines offers its subscribers, however, is the ability to use Bloglines' internal database to create their own news aggregators or their own applications built from the data Bloglines has collected. This information is available without charge, under a fairly unrestrictive license, to any programmer interested in harvesting the results of Bloglines' engine. The fact that Bloglines checks for updates on hundreds of thousands of blogs and sites approximately every hour means that someone using the Web services API can be assured of getting the most recent Weblog content.
Last time [LJ, January 2005], we looked at the Notifier API, which provides access to a particular user's available-but-unread feeds. We also discussed the Blogroll API, which allows users to determine and use programmatically, if they wish, a list of people who are pointing to a feed. As we saw, these APIs made it easy for us to find out that new Weblog entries were available or to create our own custom aggregation page listing Weblogs of interest.
Something was missing in the functionality that we exposed in that article, however. It's nice to know that new Weblog entries are among my Bloglines subscriptions, but it would be even nicer to know which blogs had been updated. And, it's nice to get a list of my current subscriptions, but I would be much happier to find out which of them have been updated—and to find out when they were most recently updated, how many new entries are in each Weblog and what those entries contain. In other words, I want to be able to replace the current Bloglines interface with one of my own, displaying new Weblog entries in a format that isn't dictated by the Bloglines.com Web site.
Luckily, the Web services developers at Bloglines have made it possible to do exactly this by way of the sync API. This month, we continue our exploration of Bloglines Web services, looking in detail at the sync API it provides. We also are going to create a simple news aggregator of our own, providing some of the same features as the Bloglines interface.
At the end of the day, a news aggregator such as Bloglines simply is a list of URLs. Indeed, the Python-based news aggregator we created two months ago using the Universal Feed Parser was precisely such a program—it looked at a set of URLs in a file and retrieved the most recent items associated with those URLs. Each individual Weblog posting must be associated with one of the URLs on a list. Removing a URL from the subscription lists makes its associated postings irrelevant to that user and invisible to them.
The fact that Bloglines has multiple users rather than a single user means it must keep track of not only a set of different URLs, but also which URL is associated with each user. Although this obviously complicates things somewhat, modern high-level languages make the difference between these two data structures easily understood. Rather than simply storing a list of URLs, we must create a hash table, in which the key is a user ID and the value is the list associated with that particular user. Once we have the user's unique ID, we easily can keep track of that particular user's subscriptions.
Of course, Bloglines is keeping track of subscriptions not for a few thousand users, but for many tens or hundreds of thousands of users. Thus, it is safe to assume they are not using such a naive implementation, which would suffice for a small experiment or an aggregator designed for a small number of people. Things get a bit trickier when you approach Bloglines' user load. Each user's list of subscriptions can't be a simple URL; it is more likely to be an ID number (or primary key, in database jargon) associated with a URL. Such a system gives multiple participants the chance to subscribe to a site's syndication feed and allows Bloglines to suggest new Weblogs that they might enjoy, based on their current subscriptions.
It thus should come as no surprise to learn that retrieving new Weblog postings from Bloglines is a two-step process, with the first step requiring us to retrieve a list of subscriptions. That is, we first ask Bloglines for a list of subscription IDs associated with a user. We then ask Bloglines to send us all of the new items for this user and this subscription ID.
Implementations of the Bloglines Web services API are available in several different languages. Because Perl is my default language for creating new applications, I am going to use the WebService::Bloglines module that has been uploaded to CPAN, the Comprehensive Perl Archive Network, a worldwide collection of Web and FTP servers from which Perl and its modules can be retrieved. For example, Listing 1 contains a simple program (bloglines-listsubs.pl) that displays the title, subscription ID and URL for each of a user's subscriptions. A number of additional values are available for each of the subscriptions; the documentation for WebService::Bloglines, as well as the Bloglines API documentation, lists these in detail.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- SUSE LLC's SUSE Manager
- My +1 Sword of Productivity
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- Non-Linux FOSS: Caffeine!
- Managing Linux Using Puppet
- Doing for User Space What We Did for Kernel Space
- SuperTuxKart 0.9.2 Released
- Parsing an RSS News Feed with a Bash Script
- Tech Tip: Really Simple HTTP Server with Python
- Rogue Wave Software's Zend Server
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide