At the Forge - Aggregating Syndication Feeds

So far, we have looked at ways in which people might create RSS and Atom feeds for a Web site. Of course, creating syndication feeds is only one half of the equation. Equally important, and perhaps even more useful, is understanding how we can retrieve and use syndication feeds, both from our own sites and from other sites of interest.

As you can see from looking at the code listing, creating such a news aggregator for personal use is fairly simple and straightforward. This is merely a skeletal application, however. To be more useful in the real world, we probably would want to move feeds.txt and myfeeds.html into a relational database, determine the feed URL automatically or semi-automatically based on a site URL and handle categories of feeds, so that multiple feeds can be read as if they were one.
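
In outline, such an aggregator can be as small as the sketch below. The file names match those mentioned above, but the HTML markup and overall structure are illustrative assumptions rather than the article's actual listing:

import feedparser

# Read one feed URL per line from feeds.txt, ignoring blank lines
feed_urls = [line.strip() for line in open("feeds.txt") if line.strip()]

aggregation_file = open("myfeeds.html", "w")
aggregation_file.write("<html><body>\n")

for url in feed_urls:
    feed = feedparser.parse(url)
    # Write the feed-level title as a heading, then one link per entry
    aggregation_file.write("<h2>%s</h2>\n" % feed.feed.title)
    for entry in feed.entries:
        aggregation_file.write('<p><a href="%s">%s</a></p>\n'
                               % (entry.link, entry.title))

aggregation_file.write("</body></html>\n")
aggregation_file.close()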

If the above description sounds familiar, then you might be a user of Bloglines.com, a Web-based blog aggregator that probably works in the above way. Obviously, Bloglines handles many more feeds and many more users than we had in this simple toy example. But, if you are interested in creating an internal version of Bloglines for your organization, the combination of the Universal Feed Parser with a relational database, such as PostgreSQL, and some personalization code is both easy to implement and quite useful.
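
As a rough sketch of that combination, the snippet below parses a single feed and stores its entries in PostgreSQL via psycopg2. The database name, table layout and ON CONFLICT clause (PostgreSQL 9.5 or later) are assumptions made for the example, not code from the article, and the personalization layer is left out entirely:

import feedparser
import psycopg2

conn = psycopg2.connect("dbname=aggregator")
cur = conn.cursor()

# One row per entry; the (feed_url, link) key keeps re-runs from duplicating items
cur.execute("""CREATE TABLE IF NOT EXISTS entries (
                   feed_url TEXT,
                   title    TEXT,
                   link     TEXT,
                   PRIMARY KEY (feed_url, link))""")

feed = feedparser.parse("http://www.linuxjournal.com/news.rss")
for entry in feed.entries:
    cur.execute("""INSERT INTO entries (feed_url, title, link)
                   VALUES (%s, %s, %s)
                   ON CONFLICT DO NOTHING""",
                (feed.href, entry.title, entry.link))

conn.commit()
conn.close()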

Conclusion

The tendency to reinvent the wheel often is cited as a widespread problem in the computer industry. Mark Pilgrim's Universal Feed Parser might fill only a small need in the world of software, but that need is almost certain to grow as the use of syndication increases for individuals and organizations alike. The bottom line is that if you are interested in reading and parsing syndication feeds, you should be using feedparser. It is heavily tested and documented, frequently updated and improved, and it does its job quickly and well.

Reuven M. Lerner, a longtime Web/database consultant and developer, now is a graduate student in the Learning Sciences program at Northwestern University. His Weblog is at altneuland.lerner.co.il, and you can reach him at reuven@lerner.co.il.

______________________

Comments


a small semantic error in aggregator.py


Hi,
In aggregator.py, instead of the feed's title, the code writes the title of the first entry:
aggregation_file.write('%s\n' % \
feed.entries[0].title)

I would suggest this instead:
aggregation_file.write('%s\n' % \
feed.channel.title)
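
For what it's worth, feedparser exposes the same feed-level value as feed.feed.title, with channel kept as an alias, while the entries list holds per-item titles. A quick check of the difference, using an example URL:

import feedparser

feed = feedparser.parse("http://www.linuxjournal.com/news.rss")
print(feed.channel.title)     # feed-level title (same as feed.feed.title)
print(feed.entries[0].title)  # title of the first entry only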

bye

Share what you learn what you don't

Install error


I came up with an error also. I'm running SUSE 9.1, and installing as per the instructions produced this error:
No module named distutils.core
I've been trying to work with Linux for many years, and it's getting much more user friendly, but coming up with errors like this only leads to frustration.

Not so simple install


I can't speak for other distros, but on Mandrake 10.1, and likely previous versions, libpython2.x-devel must be installed, not just python.

Installing feedparser is extremely simple. Download the latest version, move into its distribution directory and type
python setup.py install.
This activates Python's standard installation utility, placing feedparser in your Python site-packages directory. Once you have installed feedparser, you can test it using Python interactively, from a shell window:

This will quickly result in feedback of:

error: invalid Python installation: unable to open
/usr/lib/python2.3/config/Makefile (No such file or directory)

or similar, unless libpythonX.x-devel is installed.
Apparently this applies to Red Hat Fedora also.
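
Once the -devel package is in place, the install completes, and a quick interactive session (the feed URL is only an example) shows that feedparser is usable:

>>> import feedparser
>>> d = feedparser.parse("http://www.linuxjournal.com/news.rss")
>>> d.feed.title    # prints the feed's title if the install and fetch worked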

Other than that (I haven't checked the code sample from the first reply), this is a fine article that I hope will get me started on my own personal aggregator, so I can replace Knewsticker with a robust and site-friendly aggregator and not get banned at /. again (-:

Download link, and example code typo


The article doesn't seem to actually say where feedparser can be downloaded from (and there is no "resources" link for this article). Presumably this is the site in question:
http://www.feedparser.org/

Also, in the How New Is that News? section, it looks like the code snippet is actually missing the "modified" parameter in the function call. I think those lines should be:


last_retrieval = (2004, 9, 1, 0, 0, 0, 0, 0, 0)
ljfeed = feedparser.parse("http://www.linuxjournal.com/news.rss",
                          modified=last_retrieval)
