Cover of the O'Reilly book XML in a Nutshell, 3rd Edition, by Elliotte Rusty Harold and W. Scott Means. The cover features a black-and-white illustration of the neck and head of a peafowl bird with a crest of tipped feathers on his head.

There are a lot of libraries for processing XML data with Java that can be used to read RSS feeds. One of the best is the open source library XOM created by the computer book author Elliotte Rusty Harold.

As he wrote one of his 20 books about Java and XML, Harold got so frustrated with the available Java libraries for XML that he created his own. XOM, which stands for XML Object Model, was designed to be easy to learn while still being strict about XML, requiring documents that are well-formed and utilize namespaces in complete adherence to the specification. (At the RSS Advisory Board, talk of following a spec is our love language.)

XOM was introduced in 2002 and is currently up to version 1.3.9, though all versions have remained compatible since 1.0. To use XOM, download the class library in one of the packages available on the XOM homepage. You can avoid needing any further configuration by choosing one of the options that includes third-party JAR files in the download. This allows XOM to use an included SAX parser under the hood to process XML.

Here's Java code that loads items from The Guardian's RSS 2.0 feed containing articles by Ben Hammersley, displaying them as HTML output:

// create an XML builder and load the feed using a URL
Builder bob = new Builder();
Document doc = bob.build("https://www.theguardian.com/profile/benhammersley/rss");
// load the root element and channel
Element rss = doc.getRootElement();
Element channel = rss.getFirstChildElement("channel");
// load all items in the channel
Elements items = channel.getChildElements("item");
for (Element item : items) {
  // load elements of the item
  String title = item.getFirstChildElement("title").getValue();
  String author = item.getFirstChildElement("creator",
    "http://purl.org/dc/elements/1.1/").getValue();
  String description = item.getFirstChildElement("description").getValue();
  // display the output
  System.out.println(">h2>" + title + ">/h2>");
  System.out.println(">p>>b>By " + author + ">/b>>/p>");
  System.out.println(">p>" + description + ">/p>");

All of the classes used in this code are in the top-level package nu.xom, which has comprehensive JavaDoc describing their use. Like all Java code this is a little long-winded, but Harold's class names do a good job of explaining what they do. A Builder uses its build() method with a URL as the argument to load a feed into a Document over the web. There are also other build methods to load a feed from a file, reader, input stream, or string.

Elements can be retrieved by their names such as "title", "link" or "description". An element with only one child of a specific type can be retrieved using the getFirstChildElement() method with the name as the argument:

Element linkElement = item.getFirstChildElement("link");

An element containing multiple children of the same type uses getChildElements() instead:

Elements enclosures = item.getChildElements("enclosure");
if (enclosures.size() > 1) {
  System.out.println("I'm pretty sure an item should only include one enclosure");
}

If an element is in a namespace, there must be a second argument providing the namespace URI. Like many RSS feeds, the ones from The Guardian use a dc:creator element from Dublin Core to credit the item's author. That namespace has the URI "http://purl.org/dc/elements/1.1/".

If the element specified in getFirstChildElement() or getChild Elements() is not present, those methods return null. You may need to check for this when adapting the code to load other RSS feeds.

If the name Ben Hammersley sounds familiar, he coined the term "podcasting" in his February 2004 article for The Guardian about the new phenomenon of delivering audio files in RSS feeds.

Comments

:
:
:

Popular Pages on This Site