At the Forge - Aggregating Syndication Feeds

So far, we have looked at ways in which people might create RSS and Atom feeds for a Web site. Of course, creating syndication feeds is only half of the equation. Equally important, and perhaps even more useful, is understanding how we can retrieve and use syndication feeds, both from our own sites and from other sites of interest.

As you can see from looking at the code listing, creating such a news aggregator for personal use is fairly simple and straightforward. This is merely a skeletal application, however. To be more useful in the real world, we probably would want to move feeds.txt and myfeeds.html into a relational database, determine the feed URL automatically or semi-automatically based on a site URL and handle categories of feeds, so that multiple feeds can be read as if they were one.

If the above description sounds familiar, then you might be a user of Bloglines.com, a Web-based blog aggregator that probably works in the above way. Obviously, Bloglines handles many more feeds and many more users than we had in this simple toy example. But, if you are interested in creating an internal version of Bloglines for your organization, the combination of the Universal Feed Parser with a relational database, such as PostgreSQL, and some personalization code is both easy to implement and quite useful.
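The database-backed feed list described above might be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module in place of PostgreSQL so that it runs without a server, and the table names, function names and example.com/example.org URLs are hypothetical, not part of the original aggregator.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables that replace feeds.txt."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS categories (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS feeds (
            id          INTEGER PRIMARY KEY,
            url         TEXT UNIQUE NOT NULL,
            category_id INTEGER NOT NULL REFERENCES categories(id)
        );
    """)


def add_feed(conn, url, category):
    """Store a feed URL under a category, creating the category if needed."""
    conn.execute("INSERT OR IGNORE INTO categories (name) VALUES (?)",
                 (category,))
    (category_id,) = conn.execute(
        "SELECT id FROM categories WHERE name = ?", (category,)).fetchone()
    # A URL that is already stored is silently skipped.
    conn.execute(
        "INSERT OR IGNORE INTO feeds (url, category_id) VALUES (?, ?)",
        (url, category_id))


def feeds_in_category(conn, category):
    """Return every feed URL in a category, sorted by URL."""
    rows = conn.execute(
        "SELECT feeds.url FROM feeds "
        "JOIN categories ON feeds.category_id = categories.id "
        "WHERE categories.name = ? ORDER BY feeds.url", (category,))
    return [url for (url,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    create_schema(conn)
    add_feed(conn, "http://www.linuxjournal.com/news.rss", "linux")
    add_feed(conn, "http://example.com/a.xml", "linux")
    add_feed(conn, "http://example.org/b.xml", "misc")
    print(feeds_in_category(conn, "linux"))
```

Each URL returned by feeds_in_category could then be handed to feedparser.parse in turn, and the resulting entries merged, so that a whole category reads as if it were a single feed.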

Conclusion

The tendency to reinvent the wheel often is cited as a widespread problem in the computer industry. Mark Pilgrim's Universal Feed Parser might fill only a small need in the world of software, but that need is almost certain to grow as the use of syndication increases for individuals and organizations alike. The bottom line is that if you are interested in reading and parsing syndication feeds, you should be using feedparser. It is heavily tested and documented, often updated and improved, and it does its job quickly and well.

Reuven M. Lerner, a longtime Web/database consultant and developer, now is a graduate student in the Learning Sciences program at Northwestern University. His Weblog is at altneuland.lerner.co.il, and you can reach him at reuven@lerner.co.il.

______________________

Comments


a small semantic error in aggregator.py

zied

Hi,
In aggregator.py, the code writes the title of the feed's first entry instead of the feed's own title:
aggregation_file.write('%s\n' % \
    feed.entries[0].title)

I would suggest this instead:
aggregation_file.write('%s\n' % \
    feed.channel.title)

bye

Share what you learn what you don't

Install error

midijery

I came up with an error also. I'm running SUSE 9.1, and installing as per the instructions produced this error:
No module named distutils.core
I've been trying to work with Linux for many years, and it's getting much more user friendly, but coming up with errors like this only leads to frustration.

Not so simple install

maskedfrog

I can't speak for other distros, but on Mandrake 10.1, and likely on previous versions, libpython2.x-devel must be installed, not just python.

Installing feedparser is extremely simple. Download the latest version, move into its distribution directory and type
python setup.py install.
This activates Python's standard installation utility, placing feedparser in your Python site-packages directory. Once you have installed feedparser, you can test it using Python interactively, from a shell window:

This will quickly result in feedback of:

error: invalid Python installation: unable to open
/usr/lib/python2.3/config/Makefile (No such file or directory)

or similar, unless libpythonX.x-devel is installed.
Apparently this applies to Red Hat Fedora also.

Other than that (I haven't checked into the code sample from the first reply), this is a fine article that I hope will get me started on my own personal aggregator, so I can replace Knewsticker with a robust and site-friendly aggregator. And not get banned at /. again (-:

Download link, and example code typo

nathanst

The article doesn't seem to actually say where feedparser can be downloaded from (and there is no "resources" link for this article). Presumably this is the site in question:
http://www.feedparser.org/

Also, in the How New Is that News? section, it looks like the code snippet is actually missing the "modified" parameter in the function call. I think those lines should be:


last_retrieval = (2004, 9, 1, 0, 0, 0, 0, 0, 0)
ljfeed = feedparser.parse("http://www.linuxjournal.com/news.rss",
                          modified=last_retrieval)
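For readers wondering what the "modified" parameter accomplishes, here is a minimal sketch of the underlying idea, an HTTP conditional GET, written with only the Python standard library. It is not feedparser's own code: the helper names http_date and conditional_request are hypothetical, and the request is only built, never sent. It assumes, as in the snippet above, that the time tuple is in UTC.

```python
import calendar
import email.utils
import urllib.request


def http_date(time_tuple):
    """Format a UTC time tuple as an HTTP date string."""
    # calendar.timegm treats the tuple as UTC and returns epoch seconds.
    return email.utils.formatdate(calendar.timegm(time_tuple), usegmt=True)


def conditional_request(url, last_retrieval):
    """Build (but do not send) a GET request with If-Modified-Since set."""
    return urllib.request.Request(
        url, headers={"If-Modified-Since": http_date(last_retrieval)})


if __name__ == "__main__":
    last_retrieval = (2004, 9, 1, 0, 0, 0, 0, 0, 0)
    request = conditional_request(
        "http://www.linuxjournal.com/news.rss", last_retrieval)
    # urllib stores header names capitalized, hence the spelling below.
    print(request.get_header("If-modified-since"))
```

A server that honors this header can answer "304 Not Modified" with an empty body when the feed has not changed since the given date, which saves bandwidth for both sides.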
