June 9th, 2007 by Chad

So, I have this site I’ve been working on for what seems like forever: Sceneseed.com. I wanted to add to the “News” section a list of the latest stories from certain RSS feeds of my choosing. So my challenge was finding a way to parse RSS feeds and display them properly on the page.

I found a great little snippet of code called “RssParser” on DZone Snippets:

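# lib/rss_parser.rb

require 'rexml/document'
require 'net/http'
require 'uri'

class RssParser
  # Fetch the RSS feed at +url+ and return its channel info and items as a hash.
  def self.run(url)
    xml = REXML::Document.new(Net::HTTP.get(URI.parse(url)))
    data = {
      :title    => xml.root.elements['channel/title'].text,
      :home_url => xml.root.elements['channel/link'].text,
      :rss_url  => url,
      :items    => []
    }
    # turn each <item> into a hash keyed by its child element names
    xml.elements.each('//item') do |item|
      new_items = {}
      item.elements.each do |e|
        # use the element name as the key (e.g. <dc:creator> becomes :creator);
        # '\1' must be single-quoted, since "\1" is an octal escape, not a backreference
        new_items[e.name.gsub(/^dc:(\w)/, '\1').to_sym] = e.text
      end
      data[:items] << new_items
    end
    data
  end
end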

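That class fetches the feed at the given URL and returns a hash with the channel’s title and link, the feed URL, and an :items array holding one hash per item, keyed by the item’s element names.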
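To see the shape of that hash without fetching anything over the network, the same parsing steps can be run on an inline RSS string. This is a sketch, assuming the rexml library that ships with Ruby (as a bundled gem since Ruby 3.0); `parse_rss` and the sample feed are hypothetical. It also shows that REXML’s `Element#name` is the element’s local name, so `<dc:creator>` already comes through as `:creator`:

```ruby
require 'rexml/document'

# Hypothetical helper: same parsing logic as RssParser.run, but it takes the
# XML as a string so it can be tried offline.
def parse_rss(xml_string, url)
  xml = REXML::Document.new(xml_string)
  data = {
    :title    => xml.root.elements['channel/title'].text,
    :home_url => xml.root.elements['channel/link'].text,
    :rss_url  => url,
    :items    => []
  }
  xml.elements.each('//item') do |item|
    entry = {}
    # Element#name is the local name, so <dc:creator> becomes :creator
    item.elements.each { |e| entry[e.name.to_sym] = e.text }
    data[:items] << entry
  end
  data
end

# Made-up two-item feed for illustration
SAMPLE = <<~XML
  <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
      <title>Example Feed</title>
      <link>http://example.com</link>
      <item>
        <title>Newer post</title>
        <link>http://example.com/2</link>
        <pubDate>Sat, 02 Dec 2006 22:46:54 +0000</pubDate>
        <dc:creator>Chad</dc:creator>
      </item>
      <item>
        <title>Older post</title>
        <link>http://example.com/1</link>
        <pubDate>Fri, 04 Aug 2006 01:18:02 +0000</pubDate>
        <dc:creator>Chad</dc:creator>
      </item>
    </channel>
  </rss>
XML

feed = parse_rss(SAMPLE, "http://example.com/feed/")
p feed[:title]               # prints "Example Feed"
p feed[:items].first.keys    # prints [:title, :link, :pubDate, :creator]
```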

To use it, in my controller, I do something like this:

feed = RssParser.run("http://www.superwick.com/feed/")

The next step was to figure out what I needed from the parsed RSS feed for display purposes. Let’s take a look at what the hash “feed” contains after the above call (using script/console):

CWs-MacBook-Pro:$ script/console
Loading development environment.
>> feed = RssParser.run("http://www.superwick.com/feed/")
=> {:home_url=>"http://www.superwick.com", :title=>"superwick.com", :items=>[{:creator=>"Chad", :pubDate=>"Sat, 02 Dec 2006 22:46:54 +0000", :category=>"Personal", :guid=>"http://www.superwick.com/archives/2006/12/02/broken-leg-better-job-good-times/", :link=>"http://www.superwick.com/archives/2006/12/02/broken-leg-better-job-good-times/", :title=>"Broken Leg + Better Job = Good Times.", :commentRss=>"http://www.superwick.com/archives/2006/12/02/broken-leg-better-job-good-times/feed/", :description=>"Hi.\nSo, most people who read this probably know I broke my leg on September 8th, while visiting some friends in Oklahoma. I&#8217;ve told the story too many times. You&#8217;re gonna have to read it over and see the pics yourselves.\nAnyway, I&#8217;m on the tail-end of my non-weight-bearing crutch-fu. Thank goodness! I can&#8217;t wait for the [...]", :comments=>"http://www.superwick.com/archives/2006/12/02/broken-leg-better-job-good-times/#comments"}, {:creator=>"Chad", :pubDate=>"Fri, 04 Aug 2006 01:18:02 +0000" ..... etc ...... }], :rss_url=>"http://www.superwick.com/feed/"}

There’s a bunch of info there I don’t need. What I want to do in my final feed output is to just show the latest entry. Thankfully, the RSS feed I chose is listed in descending date order, so all I need to do is grab the first entry in :items.
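If a feed isn’t guaranteed to list its entries newest first, a safer way to get the latest one is to parse each item’s :pubDate and take the maximum. A minimal sketch in plain Ruby, assuming each item hash carries a :pubDate string in the RFC 822 style shown above; the `latest_item` name and the sample items are made up for illustration:

```ruby
require 'date'

# Hypothetical helper: return the item with the newest :pubDate,
# skipping items that have no :pubDate at all.
def latest_item(items)
  items.select { |item| item[:pubDate] }
       .max_by { |item| DateTime.parse(item[:pubDate]) }
end

items = [
  { :title => "Older post", :pubDate => "Fri, 04 Aug 2006 01:18:02 +0000" },
  { :title => "Newer post", :pubDate => "Sat, 02 Dec 2006 22:46:54 +0000" },
  { :title => "Undated post" }
]

puts latest_item(items)[:title]  # prints Newer post
```

For an empty feed, `latest_item` returns nil, so the caller can skip that feed instead of raising.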

>> feed1 = feed[:items][0]
=> {:creator=>"Chad", :pubDate=>"Sat, 02 Dec 2006 22:46:54 +0000", :category=>"Personal", :guid=>"http://www.superwick.com/archives/2006/12/02/broken-leg-better-job-good-times/", :link=>"http://www.superwick.com/archives/2006/12/02/broken-leg-better-job-good-times/", :title=>"Broken Leg + Better Job = Good Times.", :commentRss=>"http://www.superwick.com/archives/2006/12/02/broken-leg-better-job-good-times/feed/", :description=>"Hi.\nSo, most people who read this probably know I broke my leg on September 8th, while visiting some friends in Oklahoma. I&#8217;ve told the story too many times. You&#8217;re gonna have to read it over and see the pics yourselves.\nAnyway, I&#8217;m on the tail-end of my non-weight-bearing crutch-fu. Thank goodness! I can&#8217;t wait for the [...]", :comments=>"http://www.superwick.com/archives/2006/12/02/broken-leg-better-job-good-times/#comments"}

Great! Now I want to add a couple more RSS feeds. So, in my controller I add:

feed = RssParser.run("http://myoldkyhome.blogspot.com/feeds/posts/default?alt=rss")
feed2 = feed[:items][0]

feed = RssParser.run("http://blog.myspace.com/blog/rss.cfm?friendID=7064743")
feed3 = feed[:items][0]
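Each extra feed repeats the same two lines. One way to tidy that up is to keep the feed URLs in a list and map over it. This is a sketch, assuming a parser that behaves like RssParser.run (it returns a hash with an :items array); a stub lambda stands in for it here so the example runs offline, and `FEED_URLS` and `first_items` are hypothetical names:

```ruby
# Hypothetical list of the three feed URLs used above
FEED_URLS = [
  "http://www.superwick.com/feed/",
  "http://myoldkyhome.blogspot.com/feeds/posts/default?alt=rss",
  "http://blog.myspace.com/blog/rss.cfm?friendID=7064743"
]

# Take the first item from each feed; feeds with no items are skipped.
# +parser+ is anything that responds to call(url), like RssParser.method(:run).
def first_items(urls, parser)
  urls.map { |url| parser.call(url)[:items].first }.compact
end

# Stub standing in for RssParser.method(:run), so no network is needed
stub = lambda do |url|
  { :rss_url => url, :items => [{ :title => "Latest from #{url}" }] }
end

first_items(FEED_URLS, stub).each { |item| puts item[:title] }
```

In the controller the call would become something like `first_items(FEED_URLS, RssParser.method(:run))`, and adding a feed means adding one URL to the list.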

Now I have three hashes with the info I want to display. How do I sort these three entries by date, newest first? It turns out I need to put them into an array and sort on their common key, :pubDate (most feeds include it, although every item element, pubDate included, is optional in RSS 2.0). But I also need to convert :pubDate, which is still a string after parsing, into a Ruby DateTime object, so the entries sort chronologically rather than alphabetically. Here’s the code:

# combine the entries into an array
@feeds = [feed1, feed2, feed3]

# parse the pubDate strings into DateTime objects
@feeds.each {|x| x[:pubDate] = DateTime.parse(x[:pubDate].to_s)}

# sort the entries by pubDate, oldest first
@feeds.sort! {|a,b| a[:pubDate] <=> b[:pubDate]}

# reverse the array so the newest entry comes first
@feeds.reverse!
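The parse, sort and reverse steps can also be folded into one chain with `sort_by`. A sketch in plain Ruby (outside Rails, `require 'date'` is needed for `DateTime`); the sample entries and the `sort_newest_first` name are made up. Dropping entries without a :pubDate is an assumption of this sketch: in the code above, a missing pubDate turns into `DateTime.parse("")`, which raises an ArgumentError.

```ruby
require 'date'

# Return new entry hashes with :pubDate converted to DateTime, newest first.
# Entries with no :pubDate are dropped instead of crashing DateTime.parse.
def sort_newest_first(entries)
  entries.select { |e| e[:pubDate] }
         .map { |e| e.merge(:pubDate => DateTime.parse(e[:pubDate])) }
         .sort_by { |e| e[:pubDate] }
         .reverse
end

feeds = [
  { :title => "A", :pubDate => "Fri, 04 Aug 2006 01:18:02 +0000" },
  { :title => "B", :pubDate => "Sat, 02 Dec 2006 22:46:54 +0000" },
  { :title => "C", :pubDate => "Mon, 01 Jan 2007 12:00:00 +0000" }
]

sort_newest_first(feeds).each { |e| puts e[:title] }  # prints C, then B, then A
```

Because `merge` returns new hashes, the original entries keep their string dates, which can be handy if the same feed data is reused elsewhere.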

…now in my view, I can just do this:

<% @feeds.each do |feed| %>
<tr valign="top">
  <td class="list-item">
    <p><%= link_to(image_tag("news_thumb.jpg", :class => "floatleft"), feed[:link], :title => "click here to read this story", :target => "_blank") %>
    <%= link_to(feed[:title], feed[:link], :title => "click here to read this story", :target => "_blank") %>
    <br />
    <%= feed[:pubDate].to_s(:not_as_short) %> // by <%= link_to(feed[:creator], feed[:link], :target => "_blank") %>
    </p>
  </td>
</tr>
<% end %>
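`:not_as_short` isn’t one of Rails’ built-in date formats, so the app must register it somewhere; its definition isn’t shown here. As a rough plain-Ruby equivalent, `DateTime#strftime` produces the same kind of string. The format string below is a guess for illustration, not the author’s actual format, and `not_as_short` is a hypothetical helper name:

```ruby
require 'date'

# Hypothetical stand-in for the app's custom :not_as_short format
NOT_AS_SHORT = "%B %-d, %Y %I:%M %p"

def not_as_short(date_time)
  date_time.strftime(NOT_AS_SHORT)
end

puts not_as_short(DateTime.parse("Sat, 02 Dec 2006 22:46:54 +0000"))
# prints December 2, 2006 10:46 PM
```

In Rails 2.x and later, a named format like this is usually registered by adding an entry to `Time::DATE_FORMATS`, which `to_s(:name)` then looks up.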

Here’s what it looks like on my site: http://sceneseed.com/story/list

Another thing I wanted to make sure of was that I was caching this info, so I wasn’t retrieving and processing these feeds every time the page loaded. I needed a time-based caching solution, so that cached information would expire after a certain time (in order to pick up any updates in each feed). I found a nice little Rails plugin from Richard Livsey (timed_fragment_cache), which was based on code posted at tourb.us. After I installed it, it only took a couple of lines of code to cache the feeds, and to fetch and process them only when the cache has expired.
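The idea behind time-based caching is simple: store a value together with an expiry time, and only recompute it once that time has passed. The class below is a minimal in-memory sketch of that idea in plain Ruby, not the plugin’s implementation (which caches rendered view fragments); `TimedCache` and its `fetch` method are hypothetical names, and the clock is injectable so expiry can be checked without waiting an hour:

```ruby
# Minimal in-memory time-based cache (illustration only, not thread-safe).
class TimedCache
  def initialize(ttl_seconds, clock = -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}  # key => [value, expires_at]
  end

  # Return the cached value for +key+ if it hasn't expired yet; otherwise
  # run the block, cache its result for ttl_seconds, and return it.
  def fetch(key)
    value, expires_at = @entries[key]
    return value if expires_at && @clock.call < expires_at
    value = yield
    @entries[key] = [value, @clock.call + @ttl]
    value
  end
end

# Usage with a fake clock: the expensive block only runs again after an hour.
now = Time.at(0)
cache = TimedCache.new(3600, -> { now })
fetches = 0
3.times { cache.fetch("news_feeds") { fetches += 1; "feeds" } }
now += 3601
cache.fetch("news_feeds") { fetches += 1; "fresh feeds" }
puts fetches  # prints 2
```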

So now, finally in my view:


<% cache "news_feeds" do %>
  <table class="list-table">
    <% @feeds.each do |feed| %>
      <tr valign="top">
        <td class="list-item">
          <p><%= link_to(image_tag("news_thumb.jpg", :class => "floatleft"), feed[:link], :title => "click here to read this story", :target => "_blank") %>
          <%= link_to(feed[:title], feed[:link], :title => "click here to read this story", :target => "_blank") %>
          <br />
          <%= feed[:pubDate].to_s(:not_as_short) %> // by <%= link_to(feed[:creator], feed[:link], :target => "_blank") %>
          </p>
        </td>
      </tr>
    <% end %>

    <tr valign="top" class="<%= cycle('list-line-odd', 'list-line-even') %>">
      <td class="list-item">
        <p><%= link_to(image_tag("news_thumb.jpg", :class => "floatleft"), @mftvb_feed[:link], :title => "click here to read this story", :target => "_blank") %>
        <%= link_to(@mftvb_feed[:title], @mftvb_feed[:link], :title => "click here to read this story", :target => "_blank") %>
        <br />
        from <%= link_to("Musical Family Tree", @mftvb_feed[:link], :target => "_blank") %>
        </p>
      </td>
    </tr>
  </table>
<% end %>

and in my controller:

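def get_news_feeds
  # timed_fragment_cache: the block only runs when the 'news_feeds' fragment
  # has expired (or doesn't exist yet); the next expiry is an hour from now
  when_fragment_expired 'news_feeds', 1.hour.from_now do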
    feed = RssParser.run("http://myoldkyhome.blogspot.com/feeds/posts/default?alt=rss")
    feed1 = feed[:items][0]

    feed = RssParser.run("http://blog.myspace.com/blog/rss.cfm?friendID=16510346")
    feed2 = feed[:items][0]

    feed = RssParser.run("http://blog.myspace.com/blog/rss.cfm?friendID=7064743")
    feed3 = feed[:items][0]

    @mftvb_feed = RssParser.run("http://www.musicalfamilytree.com/rss.php")
    @mftvb_feed = @mftvb_feed[:items][0]

    @feeds = [feed1, feed2, feed3]
    @feeds.each {|x| x[:pubDate] = DateTime.parse(x[:pubDate].to_s)}
    @feeds.sort! {|a,b| a[:pubDate] <=> b[:pubDate]}
    @feeds.reverse!
  end
end

This will automagically expire the news feed cache every hour. Huzzah!

Hit me up on Twitter, Facebook, Flickr, or comment below!

17 Responses to “RSS feed parsing in Ruby on Rails (with time-based fragment caching)”

  1. This is exactly what I was looking for for my site, thanks for this write-up. I’ll toss this link up in the credits on my FAQ page once I completely roll this code in. Let me know if you want me to point to this page or your homepage.

  2. Thanks Thomas! I’m glad it helped you out. If you want, you can just link to http://superwick.com

    Thanks again!

  3. [...] RSS feed parsing in Ruby on Rails (with time-based fragment caching) [...]

  4. Good stuff. feedtools has caching built in, but it’s not well documented, and I don’t think it’s even maintained anymore. Thanks for the resolution to a future problem =)

  5. Just a note there’s a syntax error in the rss_parser.rb file. The line:

    data[:items] < < new_items

    Should actually read:

    data[:items] << new_items

    Jason

  6. How can I modify this to pull all the posts from only one feed?

    feed1 = feed[:items] just doesn’t work.

    Thx.

  7. Duh!

    @feeds = feed[:items]

    These aren’t the droids you’re looking for. Move along.

  8. Another question: How would I change the code to have a list of feeds, like you have above, but one of the feeds would show all the posts or a range of posts?

    feed1 = feed[:items][0..4] doesn’t work.

    Thanx.

  9. @Nate:

    You could do something like:

    feed1 = []
    feed[:items].each {|item| feed1 << item}

  10. How can I check that the feed is valid before I process it? I am having trouble with a sketchy feed crashing my site.

  11. The link to the plugin is moved to github: http://github.com/rlivsey/timed_fragment_cache/tree/master

  12. This article helped me a lot, thanks for the contribution!!

  13. This is exactly what I am looking for, but it does not work for me, maybe because I am working behind a proxy. So I want to know whether it is possible to access RSS feeds from behind a proxy. Please point me to some docs that would help me understand more.

    Thanks very much in advance

  14. [...] superwick.com » RSS feed parsing in Ruby on Rails (with time-based fragment caching) [...]

  15. Where did the “feed_date” come from? That causes an error when I run it.

  16. Joe, actually, thanks: you caught an error I made during a quick copy/paste. In my original app, I rendered each feed item with a partial, passing it some local variables built from elements of the feed hash:

    <% @feeds.each do |feed| %>
    	<%= render(:partial => 'shared/news_feed_item', :locals => {:feed => feed, :feed_date => feed[:pubDate], :link_name => get_feed_author(feed)}) %>
    <% end %>

    I’ll change the code in this posting. We regret the error.

  17. [...] RSS feed parsing in Ruby on Rails (with time-based fragment caching) (tags: ruby rails atom rss programming) [...]
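On Nate’s question in comments 8 and 9: `feed[:items][0..4]` does return the first five items, but placing that array inside `[feed1, feed2, feed3]` nests it, so `@feeds.each` hands the pubDate line an Array instead of a Hash, and `x[:pubDate]` fails (the array built in comment 9 ends up the same way). Concatenating keeps `@feeds` a flat list of item hashes. A sketch with made-up items standing in for parsed feeds:

```ruby
# Made-up items standing in for RssParser.run(...)[:items]
blog_items  = (1..8).map { |n| { :title => "Blog post #{n}" } }
other_first = { :title => "Other feed, latest post" }
third_first = { :title => "Third feed, latest post" }

# first(5) returns up to five items; + appends the single entries,
# so @feeds stays a flat array of hashes that the sort code can handle.
@feeds = blog_items.first(5) + [other_first, third_first]

puts @feeds.size            # prints 7
puts @feeds.first[:title]   # prints Blog post 1
```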
