PHP Universal Feed Parser – lightweight PHP class for parsing RSS and ATOM feeds.

After the PHP Universal Feed Generator, I’ve written the PHP Universal Feed Parser for Orchid Framework. It’s an RSS and ATOM parser written in PHP5. Though there are many feed parsers on the Internet, none of them met the basic goals of Orchid: pure object orientation, being lightweight and so on. So, I had to write a new one.

UPDATE (15th May, 2008): cURL support added. Where opening URLs with fopen() is disabled (PHP’s allow_url_fopen setting), the class uses cURL to load the RSS/ATOM content.
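
To illustrate the fallback described in the update, here is a minimal sketch of how such loading logic could look. It is not the class’s actual code: the function name loadFeedContent() and the 10-second timeout are assumptions made for this example, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of FeedParser): load raw feed content,
// preferring PHP's URL-aware file functions and falling back to cURL.
function loadFeedContent(string $url): string
{
    // ini_get() returns a string; normalise it to a real boolean.
    $urlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

    if ($urlFopen) {
        $content = @file_get_contents($url);
    } elseif (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // seconds; arbitrary value for this sketch
        $content = curl_exec($ch);
    } else {
        throw new RuntimeException('Neither allow_url_fopen nor cURL is available.');
    }

    if ($content === false || $content === '') {
        throw new RuntimeException("Could not load feed from $url");
    }

    return $content;
}
```

Throwing an exception on failure is a design choice of this sketch; it lets the caller decide how to report a feed that cannot be loaded.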

Features:

  • Parses all channel and feed item tags and sub-tags.
  • Serves the parsed data as an associative array.
  • Well-documented, easy-to-understand code.
  • Many ways to get parsed information.
  • Parsing includes attributes too.
  • No regular expressions used.
  • Parsed by the XML Parser extension of PHP.
  • Pure PHP5, object-oriented.
  • Able to parse all commonly used feed versions.
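
For readers unfamiliar with the XML Parser extension mentioned in the list above, the following sketch shows the event-based style it uses. It is only an illustration, not the FeedParser source: collectTitles() is a hypothetical function that gathers the text of every title element, and it assumes PHP 7 or later. The extension folds tag names to uppercase by default, so the handler checks for 'TITLE'.

```php
<?php
// Hypothetical helper (not part of FeedParser): collect the text of every
// <title> element using PHP's event-based XML Parser extension.
function collectTitles(string $xml): array
{
    $titles  = [];
    $current = null; // index into $titles while inside a TITLE element, otherwise null

    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Called for every opening tag; names arrive uppercased (case folding is on by default).
        function ($p, string $name, array $attrs) use (&$titles, &$current) {
            if ($name === 'TITLE') {
                $titles[] = '';
                $current  = count($titles) - 1;
            }
        },
        // Called for every closing tag.
        function ($p, string $name) use (&$current) {
            if ($name === 'TITLE') {
                $current = null;
            }
        }
    );

    // Character data may arrive in several chunks, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$titles, &$current) {
            if ($current !== null) {
                $titles[$current] .= $data;
            }
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }

    return $titles;
}
```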

Supported versions: I tried to include all stable and commonly used feed versions. Currently it’s being used to parse the following versions:

  • RSS 1.0
  • RSS 2.0
  • ATOM 1.0

Download:

  • Click Here to get the class file with example.
  • Download from phpclasses.org.

How to use:

It’s dead simple to use this class. Just follow these 3 steps:

1. Include the file

include('FeedParser.php');

2. Create an object of FeedParser class

$Parser = new FeedParser();

3. Parse the URL you want to fetch

$Parser->parse('http://www.sitepoint.com/rss.php');

Done.

Now you can use these methods to get information about the parsed feed:

  • $Parser->getChannels() – To get all channel elements as array
  • $Parser->getItems() – To get all feed elements as array
  • $Parser->getChannel($name) – To get a channel element by name
  • $Parser->getItem($index) – To get a feed element as array by its index
  • $Parser->getTotalItems() – To get the number of total feed elements
  • $Parser->getFeedVersion() – To get the detected version of parsed feed
  • $Parser->getParsedUrl() – To get the parsed feed URL
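
As a rough illustration of how the methods listed above fit together, here is a small sketch. summarizeFeed() is a hypothetical helper, not part of the class. It assumes the parser has already parsed a feed, that element names are uppercase keys such as 'TITLE' (as in the example further down), and that item indexes start at 0; none of this is confirmed beyond the method list itself.

```php
<?php
// Hypothetical helper (not part of FeedParser): build a short text summary
// from an object that exposes the methods listed above.
function summarizeFeed($parser): string
{
    $lines   = [];
    $lines[] = 'Version: ' . $parser->getFeedVersion();
    $lines[] = 'Source:  ' . $parser->getParsedUrl();
    $lines[] = 'Title:   ' . $parser->getChannel('TITLE'); // uppercase key is an assumption

    $total   = $parser->getTotalItems();
    $lines[] = 'Items:   ' . $total;

    if ($total > 0) {
        $first   = $parser->getItem(0); // assumes a zero-based index
        $lines[] = 'First:   ' . (isset($first['TITLE']) ? $first['TITLE'] : '(no title)');
    }

    return implode("\n", $lines);
}
```

After $Parser->parse(...) has run, the summary could be printed with echo summarizeFeed($Parser);.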

A simple example:

Here is a simple example of using this Feed Parser class. Click here to see the output of this example.

<?php
include('FeedParser.php');
$Parser     = new FeedParser();
$Parser->parse('http://www.sitepoint.com/rss.php');

$channels   = $Parser->getChannels();
$items      = $Parser->getItems();
?>
<h1 id="title"><a href="<?php echo $channels['LINK']; ?>"><?php echo $channels['TITLE']; ?></a></h1>
<p id="description"><?php echo $channels['DESCRIPTION']; ?> </p>

<?php foreach($items as $item): ?>

    <a class="feed-title" href="<?php echo $item['LINK']; ?>"><?php echo $item['TITLE']; ?></a>
    <p class="feed-description"><?php echo $item['DESCRIPTION']; ?></p>
<?php endforeach;?>
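
The example above echoes feed values straight into the page. Since feed content comes from a third party, a more defensive variant might escape each value first. The sketch below is one possible way to do that and is not part of the class: renderFeedItems() is a hypothetical function, it assumes the same uppercase keys ('LINK', 'TITLE', 'DESCRIPTION') as the example, and it assumes PHP 7 or later. Missing keys fall back to empty values.

```php
<?php
// Hypothetical helper (not part of FeedParser): render items as HTML,
// escaping every value taken from the feed.
function renderFeedItems(array $items): string
{
    $html = '';
    foreach ($items as $item) {
        $link  = isset($item['LINK']) ? $item['LINK'] : '#';
        $title = isset($item['TITLE']) ? $item['TITLE'] : '';
        $desc  = isset($item['DESCRIPTION']) ? $item['DESCRIPTION'] : '';

        $html .= '<a class="feed-title" href="' . htmlspecialchars($link, ENT_QUOTES, 'UTF-8') . '">'
               . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</a>\n"
               . '<p class="feed-description">' . htmlspecialchars($desc, ENT_QUOTES, 'UTF-8') . "</p>\n";
    }

    return $html;
}
```

With the variables from the example, this would be used as echo renderFeedItems($items);. Escaping displays any HTML in a description as literal text, which may or may not be what you want.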

I hope this class is easy enough that anyone with general knowledge of PHP5 can use it. In any case, feel free to ask me anything, anytime.


Note: Hi all, I’ve received reports from users about situations where this class does not work properly, so I’ve decided to rewrite it ASAP. I hope the next version will be more powerful, smaller in size and easier to use.
— Thanks.

53 Comments

  1. Have you applied a license to this class? I need to use it for a commercial website… is that OK? 😛

  2. Hi Anis, I use your code but I have a problem: in the description of the feeds there is HTML code and the tag for p>

    Sorry, my English is so bad. Thanks for helping me.

  3. Hello
    2 issues:
    1. Doesn’t recognise all feed versions e.g ‘ATOM 1’,
    ‘0.90’ => ‘RSS 1.0’,
    ‘0.91’ => ‘RSS 2.0’,
    ‘0.92’ => ‘RSS 2.0’,
    ‘0.93’ => ‘RSS 2.0’,
    ‘0.94’ => ‘RSS 2.0’,
    ‘1.0’ => ‘RSS 1.0’,
    ‘2.0’ => ‘RSS 2.0’,

    2: There’s an issue in parsing htmlentities. Any feed with e.g <table will have the < character stripped out producing invalid html.

    Any solutions?

    Thanks

  4. Hi, I’m keen to know how to parse more than one feed on a site, too…

    Wanna show my delicious links by tag… so my idea was to fetch the tags and then fetch the items for each tag….

    Can anyone tell me how I do that?

    Thanks a lot in advance 🙂

  5. @StephenB, @Omar

    Hope to solve the html entity problem whenever I get some free time.
    Thanks a lot for informing about this problem.
    If someone has already solved it, please let me know. We will all be grateful to you.

  6. Hi Livia, Deamon, Sprdnja and others

    Thanks a lot for asking.
    Sorry for being late to reply. Actually I’m very busy 🙁

    It’s very easy to parse multiple RSS feeds. Just create a new instance of the FeedParser class for each URL.

    You can do it this way:

    $Parser1 = new FeedParser();
    $Parser1->parse('http://my-first-link/rss');

    $Parser2 = new FeedParser();
    $Parser2->parse('http://my-second-link/rss');
    …………

    If you want to keep all parsed feeds together in an array:

    $myFeeds = array();

    $myFeeds['tag1'] = new FeedParser();
    $myFeeds['tag2'] = new FeedParser();
    …..
    $myFeeds['tag1']->parse('http://tag1-link/rss');
    $myFeeds['tag2']->parse('http://tag2-link/rss');
    …..

    Thanks again

  7. OK, Lets try the HTML Code tag!

    Great Parser..
    How can I use it to parse embedded content (from WordPress RSS2 feed) like;


    <![CDATA[ .... ]]>

    Thanks

  8. OK, The PRE tag!

    Great Parser..
    How can I use it to parse embedded content (from WordPress RSS2 feed) like;

    <![CDATA[ …. ]]>

    Thanks

  9. Rad tool, thanks. One thing though: your script doesn’t handle atom formats that don’t have closing elements. For example, if there is a in the feed it doesn’t find its way into the array. Friendfeed uses this extensively. See http://friendfeed.com/calmebob?format=atom . Any ideas how I can quickly hack this in? I’m not a PHP expert but willing to try. Thanks again!

  10. hi, i tried your class for parsing rss. It can handle some rss. But i found that it can’t retrieve rss from many sites like punbb, twitter (that i tried). It show Sorry! cannot detect the feed version.

  11. @Tareq, @dave and more….

    Hi all, I’ve received reports from users about situations where this class does not work properly, so I’ve decided to rewrite it ASAP. I hope the next version will be more powerful, smaller in size and easier to use.

    Thanks for informing me about your problems.

  12. Assalam Alaykom Anis. I want to thank you for this class. . its the only class through the whole web I could use..
    but the only problem with it. . it only shows the 1st item in the rss & the loop doesn’t work to show the rest of the rss items.. I wonder if any1 have this problem or I’m doing something wrong. .

    otherwise; waiting for your modified version..

    thank you so much
    ____________
    asalam 3laykom

  13. This is really awesome. people search for this code everywhere and find only talks but no working code. Thanks for putting up comprehensive code.

  14. Thanks for the awesome parser.
    Is there a way to set a filter in the parser?
    (like parsing links with match certain key words)

  15. hi..i think that have a bug…

    if the rss has more than 1 element the script put all the values at the same line

    like this: [CATEGORY] => delicinhasimagens

    and the correct is:
    delcinhas
    imagens

  16. Ok I love this script for it’s simplicity I had only one problem with it, and that is that attributes of tags without a value will not work.

    This is caused by the use of the characterData function for the attributes which is only called when a tag has a value.

    I quickly hacked up a version where the attribute functionality has been moved to a separate function which will always be called on every tag.

    Enjoy: http://pastebin.com/f79ac9041

  17. @StephenB, @Omar

    I solved this for myself by changing all lines that had
    = strip_tags($this->unhtmlentities((trim($data))));

    to
    = $data; //strip_tags($this->unhtmlentities((trim($data))));

    @admin sorry, i’m not a php pro, but what are you trying to accomplish with unhtmlentities?

    Im likely missing something. Maybe the php strip_tags() with allowable_tags is applicable?

    🙂

    Thanks,
    Awesome time saver!

  18. Hello, I am having trouble displaying other feeds besides 'http://www.sitepoint.com/rss.php' (the one in your example). When I try to put another rss.php or xml or whatever RSS feed link there, I either get a blank page, an error, or nothing happens. Also, how do I add multiple links to feeds as well as limit the number of results?

    Thank you kindly

  19. Thanks for good software!

    I just added FeedReader to my homepage.

    I hope UFP is still supported. Is a newer version available?

  20. Hi,
    Iam getting this error other than parsing feeds some new sites.
    Error:Sorry! cannot detect the feed version.

    How to handle this error.
    suggest me with a solution

    Thanks in advance

    regards,
    Ars

  21. Just wanted to say thanks for the great php class for rss parsing.

    Made mince meat of my problems and even helped me reduce my coding by about 15/20 lines 🙂

    Cheers!

  22. hi anis,
    i modified your class to handle much more feed versions…
    i just see that other people had the same problem…

    i think it is elegant too…

    send me a PM if you want it.

    gilles

  23. @Shannon Thrasher, @StephenB, @Omar

    The problem lies with the usage of the following line:

    strip_tags($this->unhtmlentities((trim($data))));

    you can replace it with

    $this->unhtmlentities($data);

    The unhtmlentities method can be optimized when using PHP5 to:

    private function unhtmlentities($string)
    {
    return html_entity_decode( $string );
    }

  24. Hi,

    Feed Parser works fine so far – thanx a lot!

    There’s one exception i found:
    If i try to parse a friend feed’s (Atom) CONTENT-Tag (has attributes and contains HTML), it only puts out “Array”.

    What can i do?

    Cheers
    Ralf

  25. Hi,

    I’ve been hacking for a long time with your code until I could have it to work properly.

    What troubled me:
    1 – You assume that if a tag has one character it is empty, though empty would be 0 characters. Fixing this will fix the bug that makes ‘getParentTag() == ‘ENTRY’ && $tagName == ‘LINK’ && $attrs[‘REL’]==’alternate’) {
    $this->items[$this->itemIndex][$tagName] = $attrs[‘HREF’];
    }

    Since you say you want to rewrite it from scratch, my recommendation is that you make your class fully abstract the source XML. If I decide to use a helper class, it is because I don’t want to get into the details of how atom feeds and RSS feeds are constructed, but with your code I had to analyze the XML to find out that in an atom feed the ‘DESCRIPTION’ is called ‘CONTENT’. I would also use lowercase attributes in your arrays as a subtle way to show people that you are abstracting the data, instead of just replicating some parts.

  26. Hi,

    Notice: Undefined offset: -1 in /home/ali/public_html/php-test/FeedParser.php on line 345 Sorry! cannot detect the feed version.

    Solution:

    $this->currentTag = $this->insideItem[count($this->insideItem)-1];

    Replace it with:

    $this->currentTag = end($this->insideItem);

  27. As it stands, the code does not allow for multiple instances of the same tag. Some tags (like CATEGORY) often have multiple values. The way the code is now they just get squashed together into one line.

    I modified the characterData() function to convert the $items entry into an array if there are multiple values for the tag:

    ===
    Original (line 425):

    $this->items[$this->itemIndex][$this->currentTag] .= strip_tags($this->unhtmlentities((trim($data))));

    ===
    Replace With:

    if(is_array($this->items[$this->itemIndex][$this->currentTag])){
    $this->items[$this->itemIndex][$this->currentTag][] = strip_tags($this->unhtmlentities((trim($data))));
    }
    elseif(isset($this->items[$this->itemIndex][$this->currentTag])){
    $this->items[$this->itemIndex][$this->currentTag] = array($this->items[$this->itemIndex][$this->currentTag], strip_tags($this->unhtmlentities((trim($data)))));
    }
    else{
    $this->items[$this->itemIndex][$this->currentTag] .= strip_tags($this->unhtmlentities((trim($data))));
    }
    ===

    There are neater ways to do this. I just threw this together real quick because I needed it to work this way.

    Hope that helps somebody.

    Jon

  28. Great plugin – only one problem.

    When I try to parse ‘http://feeds.bbci.co.uk/news/technology/rss.xml’ it doesn’t return anything, but doesn’t show any errors!?

    Any ideas?
