After the PHP Universal Feed Generator, I’ve written the PHP Universal Feed Parser for the Orchid Framework. It’s an RSS and ATOM parser written in PHP5. Though there are many feed parsers on the Internet, none of them served the basic goals of Orchid: pure object orientation, being lightweight, etc. So I had to write a new one.
UPDATE (15th May, 2008): cURL support added. Where URL access through fopen() is disabled, the class will use cURL to load the RSS/ATOM content.
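The fallback described in the update can be sketched roughly as follows. This is a minimal illustration and not the code from FeedParser.php: the function names `chooseTransport` and `loadFeedContent` are hypothetical, and it assumes the relevant PHP setting is `allow_url_fopen`.

```php
<?php
// Hypothetical helpers; these names do not come from FeedParser.php.

// Decide which transport to use for a remote feed.
// Returns 'fopen', 'curl', or null when neither is usable.
function chooseTransport($allowUrlFopen, $hasCurl)
{
    if ($allowUrlFopen) {
        return 'fopen';
    }
    return $hasCurl ? 'curl' : null;
}

// Load the raw feed XML, or return false on failure.
function loadFeedContent($url)
{
    $transport = chooseTransport(
        // ini_get() returns a string, so normalise it to a real boolean
        filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        function_exists('curl_init')
    );

    if ($transport === 'fopen') {
        return file_get_contents($url);
    }
    if ($transport === 'curl') {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        return curl_exec($ch);                          // false on failure
    }
    return false;
}
```

Keeping the decision in a separate function makes it possible to check the fallback logic without a network connection.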
Features:
- Parses all channel and feed item tags and sub-tags.
- Serves the parsed data as an associative array.
- Well-documented, easy-to-understand code.
- Many ways to get the parsed information.
- Parsing includes attributes too.
- No regular expressions used.
- Parsed by PHP’s XML Parser extension.
- Pure PHP5, object-oriented.
- Able to parse all commonly used feed versions.
Supported versions: I tried to include all stable and commonly used feed versions. Currently it’s being used to parse the following versions:
- RSS 1.0
- RSS 2.0
- ATOM 1.0
Download:
- Click Here to get the class file with example.
- Download from phpclasses.org.
How to use:
It’s dead simple to use this class. Just follow these 3 steps:
1. Include the file
include('FeedParser.php');
2. Create an object of the FeedParser class
$Parser = new FeedParser();
3. Parse the URL you want to fetch
$Parser->parse('http://www.sitepoint.com/rss.php');
Done.
Now you can use these functions to get various information about the parsed feed:
- $Parser->getChannels() – To get all channel elements as an array
- $Parser->getItems() – To get all feed elements as an array
- $Parser->getChannel($name) – To get a channel element by name
- $Parser->getItem($index) – To get a feed element as an array by its index
- $Parser->getTotalItems() – To get the total number of feed elements
- $Parser->getFeedVersion() – To get the detected version of the parsed feed
- $Parser->getParsedUrl() – To get the parsed feed URL
A simple example:
Here is a simple example of using this Feed Parser class. Click here to see the output of this example.
<?php
include('FeedParser.php');
$Parser = new FeedParser();
$Parser->parse('http://www.sitepoint.com/rss.php');
$channels = $Parser->getChannels();
$items = $Parser->getItems();
?>
<h1 id="title"><a href="<?php echo $channels['LINK']; ?>"><?php echo $channels['TITLE']; ?></a></h1>
<p id="description"><?php echo $channels['DESCRIPTION']; ?> </p>
<?php foreach($items as $item): ?>
<a class="feed-title" href="<?php echo $item['LINK']; ?>"><?php echo $item['TITLE']; ?></a>
<p class="feed-description"><?php echo $item['DESCRIPTION']; ?></p>
<?php endforeach;?>
I hope this class is easy enough that anyone with general knowledge of PHP5 can use it. In any case, feel free to ask me anything, anytime.
Note: Hi all, I’ve received reports from users about some situations where this class does not work properly. So I’ve decided to rewrite it ASAP. I hope the next one will be more powerful, smaller in size and easier to use.
— Thanks.
Have you applied a license to this class? I need to use it for a commercial website… is that OK? 😛
@David
It’s distributed under the GPL. So, no problem… you can use it for your commercial website 🙂
Hi Anis, I use your code but I have a problem: the feed descriptions contain HTML code and the tag for p>
Sorry, my English is so bad. Thanks for helping me.
Hello
2 issues:
1. Doesn’t recognise all feed versions e.g ‘ATOM 1’,
'0.90' => 'RSS 1.0',
'0.91' => 'RSS 2.0',
'0.92' => 'RSS 2.0',
'0.93' => 'RSS 2.0',
'0.94' => 'RSS 2.0',
'1.0' => 'RSS 1.0',
'2.0' => 'RSS 2.0',
2: There’s an issue in parsing htmlentities. Any feed with e.g <table will have the < character stripped out producing invalid html.
Any solutions?
Thanks
that should be < (less than & greater than)
and doesn’t recognise http://purl.org/atom/ns# for ATOM
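A possible way to recognise more versions, including the older ATOM namespace mentioned above, is to look at the root element and its attributes. The sketch below is an assumption about how detection could work, not the class’s actual logic. `detectFeedVersion` is a hypothetical name, and the inputs are what a start-element handler of PHP’s XML Parser extension receives (upper-cased, since case folding is on by default).

```php
<?php
// Hypothetical detection helper, not taken from FeedParser.php.
// $rootName and $attrs describe the root element of the feed.
function detectFeedVersion($rootName, array $attrs)
{
    if ($rootName === 'RSS') {
        // <rss version="0.91"> ... <rss version="2.0">
        $version = isset($attrs['VERSION']) ? $attrs['VERSION'] : '';
        return $version === '' ? 'RSS' : 'RSS ' . $version;
    }
    if ($rootName === 'RDF:RDF') {
        // RDF-based feeds; treated as RSS 1.0 in this sketch
        return 'RSS 1.0';
    }
    if ($rootName === 'FEED') {
        $ns = isset($attrs['XMLNS']) ? $attrs['XMLNS'] : '';
        if ($ns === 'http://www.w3.org/2005/Atom') {
            return 'ATOM 1.0';
        }
        if ($ns === 'http://purl.org/atom/ns#') {
            return 'ATOM 0.3'; // the older namespace
        }
        return 'ATOM';
    }
    return null; // unknown root element
}
```

A feed that declares its namespace with a prefix (for example `<atom:feed>`) would not match here and would need extra handling.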
Has anyone tried parsing a website that requires login?
Hi,
This script is great.
How can I use it to parse multiple rss feeds on one page?
Tell us how to parse more than one feed ?!!??
Hi, I’m keen to know how to parse more than one feed on a site, too…
Wanna show my delicious links by tag… so my idea was to fetch the tags and then fetch the items for each tag….
Can anyone tell me how I do that?
Thanks a lot in advance 🙂
@StephenB, @Omar
Hope to solve the html entity problem whenever I get some free time.
Thanks a lot for informing about this problem.
If someone has already solved it, please let me know. We will all be grateful to you.
Hi Livia, Deamon, Sprdnja and others
Thanks a lot for asking.
Sorry for being late to reply. Actually I’m very busy 🙁
It’s very easy to parse multiple RSS feeds. Just create a new instance of the FeedParser class for each URL.
You can do it this way:
$Parser1 = new FeedParser();
$Parser1->parse('http://my-first-link/rss');
$Parser2 = new FeedParser();
$Parser2->parse('http://my-second-link/rss');
…………
If you want to keep all parsed rss together in an array:
$myFeeds = array();
$myFeeds['tag1'] = new FeedParser();
$myFeeds['tag2'] = new FeedParser();
…..
$myFeeds['tag1']->parse('http://tag1-link/rss');
$myFeeds['tag2']->parse('http://tag2-link/rss');
…..
Thanks again
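The same idea can be wrapped in a loop when the list of feeds is not fixed. This is only a sketch: `parseAllFeeds` is a hypothetical helper, and it takes a factory callable so that it works with any object offering `parse()` and `getItems()`.

```php
<?php
// Hypothetical helper: parse several feeds and key the items by label.
// $newParser is any callable returning an object with parse() and
// getItems(), e.g. function () { return new FeedParser(); }
function parseAllFeeds(array $urlsByLabel, $newParser)
{
    $itemsByLabel = array();
    foreach ($urlsByLabel as $label => $url) {
        $parser = call_user_func($newParser); // fresh instance per URL
        $parser->parse($url);
        $itemsByLabel[$label] = $parser->getItems();
    }
    return $itemsByLabel;
}
```

With the class from this post it might be called as `parseAllFeeds(array('tag1' => 'http://tag1-link/rss'), function () { return new FeedParser(); });`.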
Thank you so very much for your help 🙂
OK, Lets try the HTML Code tag!
Great Parser..
How can I use it to parse embedded content (from WordPress RSS2 feed) like;
<![CDATA[ .... ]]>
Thanks
Thanks, it’s a great parser, but I have one problem.
The parser stripped the < from the feed description. Is there any solution for this problem?
I would like to parse yahoo’s media tags.. ex. or
Is it possible?
Rad tool, thanks. One thing though: your script doesn’t handle atom formats that don’t have closing elements. For example, if there is a in the feed it doesn’t find its way into the array. Friendfeed uses this extensively. See http://friendfeed.com/calmebob?format=atom . Any ideas how I can quickly hack this in? I’m not a PHP expert but willing to try. Thanks again!
Can’t figure out why I get undefined index errors. Can anyone help?
DM
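One likely cause (an assumption, since the comment does not include the failing code) is reading a key such as 'DESCRIPTION' from an item whose feed never supplied that tag. A small guard avoids the notice. `feedField` is a hypothetical helper, not part of the class.

```php
<?php
// Hypothetical helper: read a field from a channel or item array
// without triggering "Undefined index" when the tag is missing.
function feedField(array $row, $key, $default = '')
{
    return isset($row[$key]) ? $row[$key] : $default;
}
```

In a template this would be used as `echo feedField($item, 'DESCRIPTION');` instead of `echo $item['DESCRIPTION'];`.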
Hi, I tried your class for parsing RSS. It can handle some feeds, but I found that it can’t retrieve RSS from many sites, like punbb and twitter (the ones I tried). It shows: Sorry! cannot detect the feed version.
@Tareq, @dave and more….
Hi all, I’ve received reports from users about some situations where this class does not work properly. So I’ve decided to rewrite it ASAP. I hope the next one will be more powerful, smaller in size and easier to use.
Thanks for informing me about your problems.
Assalam Alaykom Anis. I want to thank you for this class. It’s the only class on the whole web I could use.
But there is one problem with it: it only shows the 1st item in the RSS, and the loop doesn’t work to show the rest of the RSS items. I wonder if anyone else has this problem or if I’m doing something wrong.
Otherwise, waiting for your modified version.
thank you so much
____________
asalam 3laykom
This is really awesome. people search for this code everywhere and find only talks but no working code. Thanks for putting up comprehensive code.
Thanks for the awesome parser.
Is there a way to set a filter in the parser?
(like parsing only links that match certain keywords)
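The class does not appear to offer a filter itself, but the array returned by getItems() can be filtered afterwards. A minimal sketch, assuming items carry an upper-case 'TITLE' key as in the example in the post; `filterItemsByKeywords` is a hypothetical name.

```php
<?php
// Hypothetical filter: keep items whose TITLE contains any of the
// keywords (case-insensitive). Items are assumed to look like the
// arrays returned by getItems().
function filterItemsByKeywords(array $items, array $keywords)
{
    $matches = array();
    foreach ($items as $item) {
        $title = (isset($item['TITLE']) && is_string($item['TITLE'])) ? $item['TITLE'] : '';
        foreach ($keywords as $keyword) {
            if ($keyword !== '' && stripos($title, $keyword) !== false) {
                $matches[] = $item;
                break; // one matching keyword is enough
            }
        }
    }
    return $matches;
}
```

The same loop could check 'LINK' or 'DESCRIPTION' instead of 'TITLE'.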
How would I use this to display maybe the most recent three entries in the feed?
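One simple approach is to cut the items array down before looping over it. The sketch below assumes the feed already lists its newest entries first, which is common but not guaranteed; `latestItems` is a hypothetical helper.

```php
<?php
// Hypothetical helper: return the first $limit items of a feed.
// Assumes the feed lists its newest entries first.
function latestItems(array $items, $limit = 3)
{
    return array_slice($items, 0, $limit);
}
```

In the example from the post, `foreach($items as $item)` would become `foreach(latestItems($items) as $item)`.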
Cool parser… Thank You…!
Hi, I think there is a bug…
If the RSS has more than one element with the same tag, the script puts all the values on the same line,
like this: [CATEGORY] => delicinhasimagens
and the correct result is:
delicinhas
imagens
There is also a problem with CDATA.
OK, I love this script for its simplicity. I had only one problem with it: attributes of tags without a value will not work.
This is caused by the use of the characterData function for the attributes which is only called when a tag has a value.
I quickly hacked up a version where the attribute functionality has been moved to a separate function which will always be called on every tag.
Enjoy: http://pastebin.com/f79ac9041
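The behaviour described here can be seen with PHP’s XML Parser extension directly: attributes arrive with the start tag, so a start-element handler sees them even for an empty element that never triggers the character-data handler. The following is a standalone sketch, not the pastebin version; `collectLinkAttributes` is a hypothetical name.

```php
<?php
// Collect the attributes of every <link> element, including
// self-closing ones that contain no character data at all.
function collectLinkAttributes($xml)
{
    $links = array();
    $parser = xml_parser_create(); // case folding is on by default

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$links) {
            if ($name === 'LINK') {
                $links[] = $attrs; // attributes are delivered with the start tag
            }
        },
        function ($parser, $name) {
            // nothing to do on end tags in this sketch
        }
    );

    xml_parse($parser, $xml, true);
    return $links;
}
```

Because names are upper-cased by default, the attributes show up under keys such as 'REL' and 'HREF'.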
@StephenB, @Omar
I solved this for myself by changing all lines that had
= strip_tags($this->unhtmlentities((trim($data))));
to
= $data; //strip_tags($this->unhtmlentities((trim($data))));
@admin sorry, I’m not a PHP pro, but what are you trying to accomplish with unhtmlentities?
I’m likely missing something. Maybe PHP’s strip_tags() with allowable_tags is applicable?
🙂
Thanks,
Awesome time saver!
Hi!
Great class. I have successfully used it to parse RSS feeds, but I cannot parse ATOM feeds from hi5
Any idea?
Hi!
i have this error:
Notice: Undefined offset: -1 in D:\SERVEUR\EasyPHP5.3.0\www\rss\FeedParser.php on line 345
Sorry! cannot detect the feed version.
for this feed:
http://compiegne.cyber-base.org/cyberbase/rss/minisite/ateliersRSSAction.do?id=682
Any idea?
thank you.
how do i display the image of a item(feed item)?
Hello, I am having trouble displaying feeds other than 'http://www.sitepoint.com/rss.php' (the one in your example). When I try to put another rss.php or xml or whatever RSS feed link there, I either get a blank page, an error, or nothing happens. Also, how do I add multiple links to feeds as well as limit the number of results?
Thank you kindly
Thanks for good software!
I just added FeedReader to my homepage.
I hope UFP is still supported. Is there a newer version available?
Hi,
I am getting this error when parsing feeds from some new sites.
Error: Sorry! cannot detect the feed version.
How do I handle this error?
Please suggest a solution.
Thanks in advance
regards,
Ars
Just wanted to say thanks for the great php class for rss parsing.
Made mince meat of my problems and even helped me reduce my coding by about 15/20 lines 🙂
Cheers!
Hi I just found a great free parser http://bncscripts.com/free-php-rss-parser/
Very useful and easy to use
hi anis,
i modified your class to handle much more feed versions…
i just see that other people had the same problem…
i think it is elegant too…
send me a PM if you want it.
gilles
@Shannon Thrasher, @StephenB, @Omar
The problem lies with the usage of the following line:
strip_tags($this->unhtmlentities((trim($data))));
you can replace it with
$this->unhtmlentities($data);
The unhtmlentities method can be simplified when using PHP5 to:
private function unhtmlentities($string)
{
return html_entity_decode( $string );
}
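Combining this with the strip_tags() allowable-tags idea raised earlier in the thread, a replacement for the whole strip_tags/unhtmlentities chain might look like the sketch below. `cleanFeedText` and its default tag list are assumptions for illustration, not part of the class.

```php
<?php
// Hypothetical replacement for strip_tags(unhtmlentities(trim($data))):
// decode entities first, then keep a short list of tags instead of
// dropping every tag.
function cleanFeedText($data, $allowedTags = '<p><a><br><table><tr><td>')
{
    $decoded = html_entity_decode($data, ENT_QUOTES, 'UTF-8');
    return strip_tags($decoded, $allowedTags);
}
```

Note that strip_tags() leaves the attributes of allowed tags untouched, so output from untrusted feeds may still need further sanitising before it is echoed into a page.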
Hi,
Feed Parser works fine so far – thanx a lot!
There’s one exception i found:
If i try to parse a friend feed’s (Atom) CONTENT-Tag (has attributes and contains HTML), it only puts out “Array”.
What can i do?
Cheers
Ralf
Hi,
I hacked on your code for a long time until I got it to work properly.
What troubled me:
1 – You assume that if a tag has one character it is empty, though empty would be 0 characters. Fixing this will fix the bug that makes 'getParentTag() == 'ENTRY' && $tagName == 'LINK' && $attrs['REL']=='alternate') {
$this->items[$this->itemIndex][$tagName] = $attrs['HREF'];
}
Since you say you want to rewrite it from scratch, my recommendation is that you make your class fully abstract the source XML. If I decide to use a helper class, it is because I don’t want to get into the details of how ATOM feeds and RSS feeds are constructed, but with your code I had to analyze the XML to find out that in an ATOM feed the 'DESCRIPTION' is called 'CONTENT'. I would also use lowercase keys in your arrays as a subtle way to show people that you are abstracting the data, instead of just replicating some parts.
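Until such a rewrite exists, the suggestion can be approximated on the caller’s side. A minimal sketch, assuming items are flat arrays with upper-case keys; `normalizeItem` is a hypothetical helper and only covers the CONTENT/DESCRIPTION difference mentioned above.

```php
<?php
// Hypothetical helper: give RSS and ATOM items the same shape.
// Keys are lower-cased, and ATOM's "content" is exposed as
// "description" when no description is present.
function normalizeItem(array $item)
{
    $normalized = array_change_key_case($item, CASE_LOWER);
    if (!isset($normalized['description']) && isset($normalized['content'])) {
        $normalized['description'] = $normalized['content'];
    }
    return $normalized;
}
```

Templates can then read `$item['description']` regardless of the feed type.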
this very awesome…
I was greatly helped by your article
keep work this..
Hi,
Notice: Undefined offset: -1 in /home/ali/public_html/php-test/FeedParser.php on line 345 Sorry! cannot detect the feed version.
Solution:
$this->currentTag = $this->insideItem[count($this->insideItem)-1];
replace it with
$this->currentTag = end($this->insideItem);
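The difference is easiest to see on an empty stack: indexing with count() - 1 asks for offset -1 and raises the notice, while end() quietly returns false. A small standalone sketch follows; `currentTag` here is a hypothetical function standing in for the class property.

```php
<?php
// Hypothetical stand-in for the parser's tag stack lookup.
function currentTag(array $stack)
{
    // end() returns false for an empty array, whereas
    // $stack[count($stack) - 1] would read offset -1.
    $last = end($stack);
    return $last === false ? null : $last;
}
```

This only silences the notice. If the "cannot detect the feed version" message still appears, the feed format itself is probably not recognised.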
As it stands, the code does not allow for multiple instances of the same tag. Some tags (like CATEGORY) often have multiple values. The way the code is now they just get squashed together into one line.
I modified the characterData() function to convert the $items entry into an array if there are multiple values for the tag:
===
Original (line 425):
$this->items[$this->itemIndex][$this->currentTag] .= strip_tags($this->unhtmlentities((trim($data))));
===
Replace With:
if(isset($this->items[$this->itemIndex][$this->currentTag]) && is_array($this->items[$this->itemIndex][$this->currentTag])){
// already a list: append the new value
$this->items[$this->itemIndex][$this->currentTag][] = strip_tags($this->unhtmlentities((trim($data))));
}
elseif(isset($this->items[$this->itemIndex][$this->currentTag])){
// second value for this tag: convert the string into a list
$this->items[$this->itemIndex][$this->currentTag] = array($this->items[$this->itemIndex][$this->currentTag], strip_tags($this->unhtmlentities((trim($data)))));
}
else{
// first value for this tag: store it as a plain string
$this->items[$this->itemIndex][$this->currentTag] = strip_tags($this->unhtmlentities((trim($data))));
}
===
There are neater ways to do this. I just threw this together real quick because I needed it to work this way.
Hope that helps somebody.
Jon
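One of the neater variants could pull the logic into a small helper. The sketch below is an illustration, not a drop-in patch: `appendTagValue` is a hypothetical function, and it assumes it is called once per element with that element’s complete text. The XML parser may deliver one text node to the character-data handler in several chunks, so in practice the text would be buffered until the end tag.

```php
<?php
// Hypothetical helper: store repeated tags (e.g. CATEGORY) as a list.
// Assumes one call per element, with the element's full text.
function appendTagValue(array &$item, $tag, $value)
{
    if (!isset($item[$tag])) {
        $item[$tag] = $value;                     // first occurrence: plain string
    } elseif (is_array($item[$tag])) {
        $item[$tag][] = $value;                   // third and later occurrences
    } else {
        $item[$tag] = array($item[$tag], $value); // second occurrence: make a list
    }
}
```

Tags that occur once stay plain strings, so existing templates that echo 'TITLE' or 'LINK' keep working.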
Thanks a lot for using and improving it!
Great plugin – only one problem.
When I try to parse ‘http://feeds.bbci.co.uk/news/technology/rss.xml’ it doesn’t return anything, but doesn’t show any errors!?
Any ideas?