At the Forge - Aggregating with Atom

Want to give everyone a polite reminder when you have new content on your Web site? Give your site the latest syndication standard and you'll have a new tool to keep visitors coming back.

Parsing an Atom Feed

To parse an Atom feed, either because we are writing an aggregator or because we want to create an Atom-powered application, we have several options. The easiest way is to continue to use XML::Atom::Feed to discover and retrieve feeds, for example:

#!/usr/bin/perl

use strict;
use diagnostics;
use warnings;

use XML::Atom::Feed;

# Get the Atom feeds for www.diveintomark.org
my @uris =
    XML::Atom::Feed->find_feeds(
        "http://www.diveintomark.org/");

# Print each Atom feed URI
foreach my $uri (@uris)
{
    print "uri = '$uri'\n";
}

In the above example, we see a single URI printed. Now that we know where the feed is, we can get a list of links in it, turning those links into XML:

#!/usr/bin/perl

use strict;
use diagnostics;
use warnings;

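use URI;
use XML::Atom::Feed;

# Find the Atom feeds advertised by the site
my @uris = XML::Atom::Feed->find_feeds("http://www.diveintomark.org/");

foreach my $uri (@uris)
{
    # Retrieve and parse the feed at this URI
    my $feed = XML::Atom::Feed->new(URI->new($uri));

    # Skip any feed that could not be retrieved or parsed
    next unless $feed;

    # Get the feed's links and print each one as XML
    my @links = $feed->link();

    foreach my $link (@links)
    {
        my $link_xml = $link->as_xml();
        print "link = '$link_xml'\n";
    }
}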

Of course, we don't have to produce or display XML; we can parse the link information, sending new links to subscribers by e-mail, adding them to a database or ignoring those that fail to meet certain criteria.
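
The filtering step might look something like the following minimal sketch. It is not tied to XML::Atom: the link records, the %seen hash and the new_links subroutine are all hypothetical, standing in for values an aggregator would pull from a parsed feed and a database of links it already has processed.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical link records; in a real aggregator these values would
# come from the link objects in a parsed feed.
my @links = (
    { href => "http://example.com/entry/1",
      type => "text/html", rel => "alternate" },
    { href => "http://example.com/atom.xml",
      type => "application/atom+xml", rel => "service.feed" },
    { href => "http://example.com/entry/2",
      type => "text/html", rel => "alternate" },
);

# Links we already have processed (a database table in practice)
my %seen = ("http://example.com/entry/1" => 1);

# Return only the HTML "alternate" links that we have not seen before
sub new_links
{
    my ($links, $seen) = @_;

    return grep {
        $_->{rel} eq "alternate"
            && $_->{type} eq "text/html"
            && !$seen->{ $_->{href} }
    } @$links;
}

my @fresh = new_links(\@links, \%seen);

# Report each new link and remember it for next time
foreach my $link (@fresh)
{
    print "new link = '$link->{href}'\n";
    $seen{ $link->{href} } = 1;
}
```

The same loop that prints each new link could just as easily send mail or insert a database row; the criteria inside grep are the only part that needs to change.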

Because Atom feeds are so regular, and because they operate using Internet standards such as XML, Unicode and MIME, we can be confident that the content our parser retrieves from a feed can be handled in straightforward ways. We can farm out different content types to different handlers, parse them in different ways and even (as in the newspaper example above) place them onto new feeds, becoming a super-aggregator.
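
One way to farm out content types is a dispatch table keyed by MIME type. The sketch below is only an illustration: the %handler_for table, the handle_content subroutine and the two handlers are hypothetical names, and a real aggregator would do something more useful than return a description.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical handlers, one per MIME type we know how to process
my %handler_for = (
    "text/plain" => sub {
        my ($body) = @_;
        return "plain text: $body";
    },
    "text/html" => sub {
        my ($body) = @_;
        return "HTML, " . length($body) . " characters";
    },
);

# Pass the content to the handler registered for its type, if any
sub handle_content
{
    my ($type, $body) = @_;

    # MIME types are case-insensitive, so normalize before the lookup
    my $handler = $handler_for{ lc $type }
        or return "skipped: no handler for $type";

    return $handler->($body);
}

print handle_content("text/plain", "hello"), "\n";
print handle_content("text/html", "<p>Hi</p>"), "\n";
print handle_content("image/png", "..."), "\n";
```

Supporting a new content type then means adding one entry to the table, without touching the code that walks the feed.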

If you are interested in creating an aggregator or in understanding how to work with the myriad versions of RSS and Atom, it also is worth looking at Mark Pilgrim's feed aggregator. Written in Python and constantly updated, this is probably the best-documented open-source engine for working with syndication feeds.

RSS or Atom?

So, should your Web site (or Weblog) provide syndication feeds in RSS, in Atom or in both? It is clear to me that Atom is the best of the two (or three) syndication format families produced to date. Dave Winer's RSS formats were groundbreaking when they were released, but they have too many problems to form the basis of full-fledged, enterprise-ready standards. We have seen the agony that results from half-baked standards, such as early versions of HTML and JavaScript, and given that syndication stands a good chance of becoming an important communication mechanism, completeness and unambiguity are important factors to consider.

It is similarly important to consider the growing international use of the Internet and the fact that people want to syndicate media other than text. Atom's lack of ambiguity regarding special characters is another big step forward, ensuring that we can include < and > in our Weblog entries without having to worry about the implications for syndication. Most important, the planned provisions for extensions will make it possible for Atom to meet the needs of specific groups and applications without opening the entire specification anew.
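
To see why special characters matter, consider what has to happen before text containing < and > can sit safely inside an XML document. One common approach is to escape those characters as entities. The following is a minimal sketch, assuming a hypothetical helper named escape_xml; it is not part of XML::Atom, and it does not claim to show how any particular Atom tool encodes content.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: escape the characters that XML treats as markup
sub escape_xml
{
    my ($text) = @_;

    # The ampersand must be escaped first, so that the entities
    # produced by the next two substitutions are left alone
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    return $text;
}

my $entry = "if (a < b && b > c)";
print escape_xml($entry), "\n";
```

The important point is that the producer and the consumer of the feed agree on whether the content has been escaped, which is exactly the kind of question a precise specification settles.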

Although Atom is remarkably complete, it is also straightforward to use. A great deal of time and energy clearly has been put into making Atom as easy to use as possible. Creating a new API is not a simple task, particularly when it is meant to be as general as possible.

Finally, the mess of RSS version numbers that resulted in (and from) petty and political arguments has served no one very well. Atom's different name, although literally an issue of semantics, reduces the confusion that developers and users alike face when working with RSS.

Conclusion

Atom is an attempt to solve many of the problems associated with RSS and to turn syndication into a building block for new types of high-level communication across Internet applications. Atom is slightly more complicated than Dave Winer's versions of RSS, but it is less complicated (in its initial version) than RSS 1.0, which used RDF to describe and summarize Web sites. The combination of easy-to-use software tools for working with Atom feeds, its extensibility and the authors' commitment to being a part of the Internet standards community makes it clear that Atom will play a key role in the future of Web communication.

Resources for this article: /article/7751.

Reuven M. Lerner, a longtime Web/database consultant and developer, now is a graduate student in the Learning Sciences program at Northwestern University. His Weblog is at altneuland.lerner.co.il, and you can reach him at reuven@lerner.co.il.

______________________

Comments

Atom feed does not validate

Hi Reuven,

I regularly read your articles; they are informative, and I put them to use at work!

I am not able to get the Atom feed validated (feedvalidator.org) using exactly the information given in your article for creating a feed. I tried passing your feed through the validator, and it would not validate! The validator complains about a missing version, author, etc. Can you please guide me on this?

Good day
Harish
