At the Forge - Aggregating with Atom

Want to give everyone a polite reminder when you have new content on your Web site? Give your site the latest syndication standard and you'll have a new tool to keep visitors coming back.
Parsing an Atom Feed

To parse an Atom feed, either because we are writing an aggregator or because we want to create an Atom-powered application, we have several options. The easiest way is to continue to use XML::Atom::Feed to discover and retrieve feeds, for example:

#!/usr/bin/perl

use strict;
use diagnostics;
use warnings;

use XML::Atom::Feed;

# Get the Atom feeds for www.diveintomark.org
my @uris =
    XML::Atom::Feed->find_feeds(
        "http://www.diveintomark.org/");

    # Print each Atom feed URI
    foreach my $uri (@uris)
    {
    print "uri = '$uri'\n";
    }

In the above example, we see a single URI printed. Now that we know where the feed is, we can get a list of links in it, turning those links into XML:

#!/usr/bin/perl

use strict;
use diagnostics;
use warnings;

use XML::Atom::Feed;

# Get an Atom feed
my @uris = XML::Atom::Feed->find_feeds("http://www.diveintomark.org/");

foreach my $uri (@uris)
{
my $feed = XML::Atom::Feed->new(URI->new($uri));

my @links = $feed->link();

foreach my $link (@links)
{
    my $link_xml = $link->as_xml();
    print "link = '$link_xml\n";
}
}

Of course, we don't have to produce or display XML; we can parse the link information, sending new links to subscribers by e-mail, adding them to a database or ignoring those that fail to meet certain criteria.

Because Atom feeds are so regular, and because they operate using Internet standards such as XML, Unicode and MIME, we can be confident that the content our feed parses can be handled in straightforward ways. We can farm out different content types to different handlers, parse them in different ways and even (as in the newspaper example above) place them onto new feeds, becoming a super-aggregator.

If you are interested in creating an aggregator or in understanding how to work with the different myriad versions of RSS and Atom, it also is worth looking at Mark Pilgrim's feed aggregator. Written in Python and constantly updated, this is probably the best-documented piece of open-source engine for working with syndication feeds.

RSS or Atom?

So, should your Web site (or Weblog) provide syndication feeds in RSS, in Atom or in both? It is clear to me that Atom is the best of the two (or three) syndication format families produced to date. Dave Winer's RSS formats were groundbreaking when they were released, but they have too many problems to form the basis of full-fledged, enterprise-ready standards. We have seen the agony that results from half-baked standards, such as early versions of HTML and JavaScript, and given that syndication stands a good chance of becoming an important communication mechanism, completeness and unambiguity are important factors to consider.

It is similarly important to consider the growing international use of the Internet and that people want to syndicate media other than text. Atom's lack of ambiguity regarding special characters is another big step forward, ensuring that we can include < and > in our Weblog entries without having to worry about the implications for syndication. Most important, the planned provisions for extensions will make it possible for Atom to meet the needs of specific groups and applications without opening the entire specification anew.

Although Atom is remarkably complete, it is also straightforward to use. A great deal of time and energy clearly have been put into making Atom as easy to use as possible. Creating a new API is not a simple task, particularly when it is meant to be as general as possible.

Finally, the mess of RSS version numbers that resulted in (and from) petty and political arguments has served no one very well. Because Atom has a different name, although literally an issue of semantics, it reduces the confusion that developers and users alike face when working with RSS.

Conclusion

Atom is an attempt to solve many of the problems associated with RSS and to turn syndication into a building block for new types of high-level communication across Internet applications. Atom is slightly more complicated than Dave Winer's versions of RSS, but it is less complicated (in its initial version) than RSS 1.0, which used RDF to describe and summarize Web sites. The combination of easy-to-use software tools for working with Atom feeds, its extensibility and the authors' commitment to being a part of the Internet standards community, makes it clear that Atom will play a key role in the future of Web communication.

Resources for this article: /article/7751.

Reuven M. Lerner, a longtime Web/database consultant and developer, now is a graduate student in the Learning Sciences program at Northwestern University. His Weblog is at altneuland.lerner.co.il, and you can reach him at reuven@lerner.co.il.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

ys

Anonymous's picture

tnks

The server firewall allows

Tahha's picture

The server firewall allows incoming SSH traffic from anywhere. It then performs IP address filtering to allow only evden eve nakliyat certain IP addresses access to more open ankara evden evenakliyat resources, such as NFS, LDAP, CUPS and the FlexLM license server. The Web server uses a slightly different setup to allow only incoming SSH and HTTP traffic.

Atom feed does not validate

Harish's picture

Hi Reuven,

I regularly read your articles,they are informative and I put them to use at work!

I not able to get the atom feed validated (feedvalidator.org) using exactly the information given in your aticle for creating a feed. I tried passing your feed through the validator and it would not validate!,The validator complains about missing version, author etc. Can you please guide me on this.

Good day
Harish

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState