Using RSS News Feeds | 5 | WebReference

The XML::RSS Module

Now that you've had a chance to glance at two RSS examples, it's time to introduce the XML::RSS module. XML::RSS is a subclass of XML::Parser, a Perl module maintained by Clark Cooper that utilizes James Clark's Expat C library. XML::RSS was developed to simplify the task of manipulating and parsing RSS files. A deep understanding of XML is not a prerequisite for using XML::RSS, since the XML details are hidden inside the class interface.

While XML::RSS is capable of creating RSS files, this column focuses on parsing existing RSS files. You can read more about the capabilities of XML::RSS in the module's documentation or by typing:
perldoc XML::RSS

The Code

Let's look at the code, shall we? Lines 16-17 load the XML::RSS and LWP::Simple modules. We've already talked about XML::RSS in brief, but what does LWP::Simple do? Good question! The answer is simple (pun intended): it's a procedural interface for interacting with a Web server. It's also the little cousin of LWP::UserAgent, a fuller object-oriented interface. We'll be using one of the library's subroutines later in the code to fetch an RSS file from the Web.

In lines 20-21 we initialize two variables that we're going to use later.

Line 25 starts the main code body. The first thing we do is verify that the user typed exactly one command-line parameter. This parameter is then assigned to the $arg variable in line 28.
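The argument check described above might look something like the following. This is only a minimal sketch: the helper name single_arg and the wording of the usage message are assumptions made for illustration, not taken from the column's listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the column's listing): return the single
# command-line argument, or die with a usage message.
sub single_arg {
    my @args = @_;
    die "Usage: $0 (<RSS file> | <URL>)\n" unless @args == 1;
    return $args[0];
}

# A real script would call single_arg(@ARGV); a fixed list is used here
# so the sketch runs without any command-line input.
my $arg = single_arg('news.rss');
print "Will process: $arg\n";
```

Dying early with a usage message keeps the rest of the script free to assume that $arg holds exactly one value.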

Next we create a new instance of the XML::RSS class and assign the reference to the $rss variable on line 31.

Now we must determine whether the command-line parameter the user entered is an HTTP URL or a file on the local file system (lines 34-46). On line 34, we use a regular expression to look for the characters http:.

If the command-line argument matches, we can safely assume that the user intends to retrieve an RSS file from a Web server. On line 35 we pass the argument to the get() function, which is part of LWP::Simple, and assign the result to the $content variable. On line 36 we call die() if $content is empty, which means there was an error retrieving the RSS file. If the RSS file was downloaded successfully, $rss->parse($content) is called, which parses the RSS file and stores the results in the object's internal structure (line 38).
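To make the URL test concrete, here is a small sketch of one way to write it. The helper name looks_like_http_url is hypothetical, and the pattern is an assumption: it is anchored to the start of the argument and ignores case, which may be stricter than the expression used in the column's script.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the argument looks like an HTTP URL.
# Assumption: the match is anchored at the start and is case-insensitive.
sub looks_like_http_url {
    my ($arg) = @_;
    return $arg =~ /^http:/i ? 1 : 0;
}

# Example arguments (made up for illustration).
for my $arg ('http://www.example.com/news.rss', 'news.rss') {
    my $kind = looks_like_http_url($arg) ? 'URL' : 'local file';
    print "$arg is treated as a $kind\n";
}
```

Anchoring the pattern means a local file whose name merely contains http: somewhere in the middle is still treated as a file.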

If the command-line argument does not contain the http: characters, lines 41-46 treat it as a file rather than a URL. The first thing we do is assign the value of $arg to the $file variable and test for the existence of the file (lines 42-43).

Then we call $rss->parsefile($file) (line 45), which parses the RSS file and stores the results in the object's internal structure. The parsefile() method parses a file, whereas the parse() method parses the string that's passed to it.
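The two branches can be pulled together in a short sketch. The helper name load_rss, the error messages, and the sample feed are all assumptions for illustration, and unlike the script described here, the sketch loads LWP::Simple only when a URL is actually requested so that the file branch runs without network access.

```perl
use strict;
use warnings;
use XML::RSS;
use File::Temp qw(tempfile);

# Hypothetical helper: return a parsed XML::RSS object for a URL or a file.
sub load_rss {
    my ($source) = @_;
    my $rss = XML::RSS->new;

    if ($source =~ /^http:/i) {
        # Assumption: load LWP::Simple lazily, only for the URL branch.
        require LWP::Simple;
        my $content = LWP::Simple::get($source);
        die "Could not retrieve $source\n" unless $content;
        $rss->parse($content);       # parse() takes the document as a string
    }
    else {
        die "File \"$source\" does not exist.\n" unless -e $source;
        $rss->parsefile($source);    # parsefile() takes a path on disk
    }
    return $rss;
}

# A small made-up RSS 0.91 document, written to a temporary file so the
# sketch can be run as is.
my $sample = <<'END_RSS';
<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.com/</link>
    <description>A hypothetical feed</description>
    <language>en-us</language>
    <item>
      <title>First story</title>
      <link>http://www.example.com/1.html</link>
    </item>
  </channel>
</rss>
END_RSS

my ($fh, $path) = tempfile(SUFFIX => '.rss', UNLINK => 1);
print {$fh} $sample;
close $fh or die "Could not close $path: $!\n";

my $rss = load_rss($path);
print "Channel title: $rss->{channel}{title}\n";
```

Either way, the caller ends up with the same kind of object, so the code that follows does not need to know where the feed came from.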

Lastly, we call the print_html subroutine on line 49, which converts the RSS object into nicely formatted HTML.


As you examine this subroutine, you will begin to understand the internal structure of the XML::RSS object. The critical portion of the subroutine is contained on lines 76-79. In this foreach loop, we iterate over each of the RSS items.
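As a rough illustration of that loop, the sketch below walks the items of a parsed feed and builds a simple HTML list. It is not the column's print_html subroutine: the helper names items_to_html and escape_html, the list markup, and the sample feed are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::RSS;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical stand-in for print_html: build an HTML list from the items.
sub items_to_html {
    my ($rss) = @_;
    my $html = "<ul>\n";
    # Each element of @{ $rss->{items} } is a hash reference with keys
    # such as title and link.
    foreach my $item (@{ $rss->{items} }) {
        next unless defined $item->{title} && defined $item->{link};
        $html .= sprintf qq{<li><a href="%s">%s</a></li>\n},
            escape_html($item->{link}), escape_html($item->{title});
    }
    $html .= "</ul>\n";
    return $html;
}

# A small made-up RSS 0.91 document, parsed from a string.
my $rss = XML::RSS->new;
$rss->parse(<<'END_RSS');
<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.com/</link>
    <description>A hypothetical feed</description>
    <language>en-us</language>
    <item>
      <title>First story</title>
      <link>http://www.example.com/1.html</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.example.com/2.html</link>
    </item>
  </channel>
</rss>
END_RSS

print items_to_html($rss);
```

Returning the HTML as a string rather than printing inside the loop makes the helper easier to reuse, for example when the output is written to a file instead of standard output.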

Next, let's take a look at it in action.

Produced by Jonathan Eisenzopf and
Created: September 1, 1999
Revised: September 1, 1999