Add XML News Feeds to Your Site

Posted by martin on 4 Sep 2002, last updated on 6 Sep 2002.

A lot of sites now offer news feeds in XML format that you can use to fetch their news. The most common format is RSS (Rich Site Summary). With the help of PHP you can parse such a feed, even without XSLT support, and display it on your site.

How to get the feed

To get the feed, you can use PHP's fopen() function with a URL. PHP will internally do a GET request using HTTP 1.0; it will also send the Host header needed by name-based virtual hosts.
<?php
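$feed = 'http://slashdot.org/slashdot.rdf';
// On current PHP versions allow_url_fopen can only be changed in php.ini,
// so this call has no effect there.
ini_set('allow_url_fopen', true);
$fp = fopen($feed, 'r');
if (!$fp) {
    die("Could not open $feed");
}
$xml = '';
// Read the feed in small chunks until the end of the stream.
while (!feof($fp)) {
    $xml .= fread($fp, 128);
}
fclose($fp);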
This example shows how to get the Slashdot news feed. We have explicitly set allow_url_fopen so that fopen() is allowed to open URLs. A much better idea, in terms of improved error handling, would be to use fsockopen():
<?php
$host = 'slashdot.org';
$uri = 'slashdot.rdf';
// The last argument is the connection timeout in seconds.
$fp = fsockopen($host, 80, $errno, $errstr, 20);
if (!$fp) {
    die("Network error: $errstr ($errno)");
} else {
    $xml = '';
    // Send a minimal HTTP/1.0 request by hand.
    fputs($fp, "GET /$uri HTTP/1.0\r\nHost: $host\r\n\r\n");
    while (!feof($fp)) {
        $xml .= fgets($fp, 128);
    }
    fclose($fp);
}
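
Because the request in the fsockopen() example is written by hand, the string collected in $xml starts with the HTTP status line and headers. They do not disturb the tag matching used in this article, but if you would rather drop them, a minimal sketch could look like the following. The helper name strip_http_headers() and the sample response are assumptions made for illustration, not part of the original script.

```php
<?php
// Hypothetical helper (not from the original article): returns the body of
// a raw HTTP response by cutting everything up to the first blank line.
function strip_http_headers($response) {
    // Headers and body are separated by an empty line (CRLF CRLF).
    $pos = strpos($response, "\r\n\r\n");
    if ($pos === false) {
        // No separator found; return the input unchanged.
        return $response;
    }
    return substr($response, $pos + 4);
}

// Made-up response, standing in for what the fsockopen() loop collects.
$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/xml\r\n\r\n<rss><item></item></rss>";
$body = strip_http_headers($raw);
```

In the fsockopen() example you would call it as $xml = strip_http_headers($xml); right after the read loop.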
This does exactly the same thing as the fopen() example, although this time we can choose the timeout with the last parameter to fsockopen().

Note: We actually get the HTTP headers as well, but that's not an issue.

The other advantage is that we can print nicer error messages than the ugly ones produced by PHP.

Parsing the XML feed

Now that we have the data available as a PHP string, the next step is to parse it. As I mentioned in the introduction, this method allows us to handle the news feed even without XSL support built into PHP. I have used a slightly modified version of the

Caveats for the
I still haven't seen a feed that uses something like that, so I hope it is OK. Of course, if you know about such a feed, it would help if you shared it with me.
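function untag($string, $tag) {
    $tmpval = array();
    // Non-greedy match of everything between <tag> and </tag>;
    // the "s" modifier lets "." match newlines as well.
    $preg = "|<$tag>(.*?)</$tag>|s";
    preg_match_all($preg, $string, $tags);
    // $tags[1] holds the captured contents of each matched element.
    foreach ($tags[1] as $tmpcont) {
        $tmpval[] = $tmpcont;
    }
    return $tmpval;
}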
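
The pattern in untag() only matches opening tags written exactly as <tag>, so an element whose opening tag carries attributes is skipped. A commenter below reports such a feed, with items opened as <item rdf:about="...">. A minimal sketch of a more tolerant variant follows. The name untag_attr() and the sample string are assumptions for illustration, and this is still plain regular-expression matching, not a real XML parser.

```php
<?php
// Hypothetical variant of untag() (an assumption, not the author's code):
// also matches opening tags that carry attributes, e.g. <item rdf:about="...">.
function untag_attr($string, $tag) {
    // preg_quote() escapes characters in the tag name that are special in a
    // regular expression, including the "|" delimiter used here.
    $name = preg_quote($tag, '|');
    // (?:\s[^>]*)? allows optional attributes after the tag name.
    $preg = "|<$name(?:\s[^>]*)?>(.*?)</$name>|s";
    preg_match_all($preg, $string, $tags);
    // $tags[1] is already the array of captured element contents.
    return $tags[1];
}

// Made-up sample: one item with an attribute, one without.
$sample = '<item rdf:about="http://www.example.com/stories/001.html"><title>A</title></item>'
        . '<item><title>B</title></item>';
$found = untag_attr($sample, 'item');
```

Because the attribute part is optional, the variant behaves like untag() on feeds that use bare tags.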
The untag() function simply uses Perl Compatible Regular Expressions to extract the contents of all XML elements with the given name and return them as an array.

Sample from Slashdot's feed

<item>
<title>Do Cell Phones Make Us Stupid?</title>
<link>http://slashdot.org/article.pl?sid=02/09/03/1429222</link>
</item>
<item>
<title>Slashback: Google, Prince, Bayesian</title>
<link>http://slashdot.org/article.pl?sid=02/09/03/0138216</link>
</item>

As you can see, the XML code is very simple. Of course there is some additional data, but it is meta information about the feed, like the title, description, logo, etc.

Transform it into HTML

Now we get to the point where we need to extract that information from the feed and transform it into HTML, which browsers can handle.
$items = untag($xml, 'item');
$html = '<p>';
foreach ($items as $item) {
    $title = untag($item, 'title');
    $link = untag($item, 'link');
    $html .= '<a href="' . $link[0] . '">' . $title[0] . "</a><br />\n";
}
$html .= '</p>';
echo $html;
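
To show only the first few headlines, one option is to cut the item list down with array_slice() before looping. The sketch below wraps the same steps in a function; the name render_items(), the $limit parameter and the sample data are assumptions for illustration. untag() is repeated in compact form only so that the block runs on its own.

```php
<?php
// Compact copy of the article's untag(), repeated so this sketch is
// self-contained.
function untag($string, $tag) {
    preg_match_all("|<$tag>(.*?)</$tag>|s", $string, $tags);
    return $tags[1];
}

// Hypothetical helper: builds the link list for at most $limit items.
function render_items($xml, $limit) {
    // array_slice() keeps only the first $limit entries.
    $items = array_slice(untag($xml, 'item'), 0, $limit);
    $html = '<p>';
    foreach ($items as $item) {
        $title = untag($item, 'title');
        $link = untag($item, 'link');
        // Skip items that lack a title or a link.
        if (!isset($title[0], $link[0])) {
            continue;
        }
        $html .= '<a href="' . $link[0] . '">' . $title[0] . "</a><br />\n";
    }
    return $html . '</p>';
}

// Made-up feed fragment with three items.
$sample = '<item><title>One</title><link>http://example.com/1</link></item>'
        . '<item><title>Two</title><link>http://example.com/2</link></item>'
        . '<item><title>Three</title><link>http://example.com/3</link></item>';
$out = render_items($sample, 2);
echo $out;
```

Slicing first keeps the loop body unchanged, which is why it is used here instead of a counter with break; both approaches work.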
We are using untag() twice: first to split the feed into its items, and then to extract the title and link of each item. After we have the title and link we simply create an HTML link to the article using the title text we have. Of course, if you don't like the generated HTML you can use your own.

Some feeds also have further elements for each item (commenters below mention description, for example), which untag() can extract in the same way. You can also display, for example, only 3 news items if you replace the foreach with a different loop construct.

Comments

Newbie help by MattD (matt@tunnelamerica.com) on 1 Aug 2003 10:45pm GMT
>>You can also display, for example only 3 news items, if you change the foreach with a different loop construct.
How exactly would you write that? Please help. - Matt

RE: Newbie help by snowy (snowy@nospam.net) on 11 Sep 2003 7:36pm GMT
There are loads of ways of only showing a limited number of items, e.g.:
$id = 0;
foreach ($items as $item) {
    if ($id == "5") {
        break;
    }
    $id++;
    $title = untag($item, 'title');
    $link = untag($item, 'link');
    // rest of code....
}

link by Matthew (matthew@zontheweb.com) on 16 Sep 2003 5:19am GMT
Hi there, thanks for the use of this script. The problem I have is that the XML feed I've got has links that look like this:
http%3A%2F%2Fwww%2Enzherald%2Eco%2Enz%2Fstorydisplay%2Ecfm%3FstoryID%3D3523776%26thesection%3Dtechnology%26thesubsection%3Dgeneral
Can the parser be modified to deal with this?

xml news feed by sam (samplatt@commspeed.net) on 23 Oct 2003 5:17am GMT
Works great except it wipes out the code that comes after it. I used an include function to include the PHP script into my HTML; when I view the page, everything below the PHP is not seen by the browser. Huh. If you can help, thanks.

news feed by sam (samplatt@commspeed.net) on 23 Oct 2003 5:31am GMT
Another quick question: how can I make the headlines open a new browser window for the whole story when clicked?

unwanted tags by John (jsasser@sasserized.com) on 3 Nov 2003 8:25pm GMT
I am wanting to do more than just show the links and title.... I want to display the title as a link and then the description underneath it. This is completely allowed by the site feeding the RSS.
My problem lies in the fact that some descriptions will have a <p> tag in them.... This causes that description to not be displayed.... How do I correct this?

re: news feed by steve (none@none.com) on 6 Nov 2003 2:01am GMT
sam: set up your link tags as:
a target=_blank href=[html link here]
The target=_blank causes the link to open in a new window when clicked.
john: unwanted tags. Just process the descriptions prior to displaying them. Copy all the XML into a char string until you encounter the tag, skip past it, continue copying.

XML news feed by Andy (andy@peakfinder.co.uk) on 19 Nov 2003 4:28pm GMT
It don't work :-) I get the error ...
Warning: fopen(http://slashdot.org/slashdot.rdf): failed to open stream: Bad file descriptor in C:\htdocs\test.php on line 6

What about removing duplicates? by Adrian () on 3 Jan 2004 4:03pm GMT
Moreover has a big list of news headline feeds: http://w.moreover.com/categories/category_list_xml.html
Unfortunately Moreover is atrocious at weeding out identical stories. Using the feeds they provide, how might I go about discarding any <article> where the <headline_text> is not unique? Here's an example of one article:
<article id="_113573634">
<url>http://c.moreover.com/click/here.pl?x113573634</url>
<headline_text>British Airways to Resume Flight to D.C</headline_text>
<source>AP via New York Post</source>
<media_type>text</media_type>
<cluster>UK business news</cluster>
<tagline />
<document_url>http://breakingnews.nypost.com</document_url>
<harvest_time>Jan 3 2004 2:34PM</harvest_time>
<access_registration />
<access_status />
</article>

Thanks by Daniel Coe () on 21 Jan 2004 9:12am GMT
What a great article, it really helped me get an XML document into a CSV file.

thanks by Billy () on 1 Mar 2004 12:36pm GMT
Hey, thanks for the awesome code. If anyone is interested, here's mine with some small changes (messy but works). It grabs 3 random headlines, puts the full headline in the title attribute of the a tag and shortens the headline if necessary (i.e. length is > 20) with '...' on the end.
$feed = 'http://www.some.com.au/FEED/xml.html';
ini_set('allow_url_fopen', true);
$fp = fopen($feed, 'r');
$xml = '';
while (!feof($fp)) {
    $xml .= fread($fp, 128);
}
fclose($fp);
function untag($string, $tag) {
    $tmpval = array();
    $preg = "|<$tag>(.*?)</$tag>|s";
    preg_match_all($preg, $string, $tags);
    foreach ($tags[1] as $tmpcont) {
        $tmpval[] = $tmpcont;
    }
    return $tmpval;
}
$items = untag($xml, 'resource');
$max = count($items) - 3;
$start = rand(0, $max);
$html = '';
for ($i = $start; $i < $start + 3; $i++) {
    $title = untag($items[$i], 'headline');
    $link = untag($items[$i], 'linkto');
    $html .= '<a href="' . $link[0] . '" title="' . $title[0] . '">';
    $len = strlen($title[0]);
    if ($len > 20) {
        $words = explode(' ', $title[0]);
        $title[0] = '';
        for ($w = 0; $w < count($words); $w++) {
            $title[0] .= $words[$w] . ' ';
            if (strlen($title[0]) > 20) {
                $title[0] .= '...';
                break;
            }
        }
    }
    $html .= $title[0] . "</a><br />\n";
}
echo $html;
cheers

A particular subject by Alf (adpaster@hotmail.com) on 9 Mar 2004 2:28pm GMT
Is there any way to get a particular area, like all the articles on the job market or the housing market?

adding a date by v () on 30 Apr 2004 8:34am GMT
I'm fiddling with it, but I'm having a hard time trying to get the date inserted in the HTML string as well... any ideas?

limiting results? by () on 3 May 2004 5:01pm GMT
Is there a way to limit the number of items that come through, e.g. I only want the first 5 news headlines?

limited number of items by Elcio Figueiredo (elcio.figueiredo@atento.com.br) on 21 May 2004 1:35pm GMT
$a = 0;
foreach ($items as $item) {
    $description = untag($item, 'description');
    $title = untag($item, 'title');
    $link = untag($item, 'link');
    $html .= '<a href="' . $link[0] . '">' . $title[0] . "</a><br />\n";
    $html2 .= strip_tags($description[0]);
    $descricao = explode('(<i>', $description[0]);
    // limited number of items
    $a++;
    if ($a == 4) {
        break; // stop the loop; the rest of the page still renders
    }
    // end limited number of items
    print $descricao[0];
    print '<br><a href="' . $link[0] . '">' . $title[0] . "</a><br />\n";
}

Making the HTML part from the XML Code by Everett (eelintz2@yahoo.com) on 21 Jun 2004 12:05am GMT
I want to know what the HTML code would be for the XML code at http://rss.groups.yahoo.com/group/nascar_pitstop/rss. How would you make the code?

W00t by Jon () on 22 Jun 2004 7:06pm GMT
Very nice work. I've been toying with this for hours here at work in TextPad (it's so difficult to pull these things together when you have no place to debug it either!). This article has really helped me out. Thanks :)

Limited items and description by SteveT (admin@hostform.com) on 30 Jun 2004 4:45am GMT
I was grabbing the top five stories from Yahoo and wanted to show the first 100 characters of the description. I wrote a quick function that starts at the 100th character and loops until it finds a space and a suitable break point.
function strclnup($description) {
    for ($i = 100; $i <= 150; $i++):
        if (substr($description, $i, 1) == " ") {
            return substr($description, 0, $i);
            break;
        }
    endfor;
}
$html = '<p>';
$id = 0;
foreach ($items as $item) {
    if ($id == "5") {
        break;
    }
    $id++;
    $title = untag($item, 'title');
    $link = untag($item, 'link');
    $descr = untag($item, 'description');
    $html .= "<a href='" . $link[0] . "' target='_new'>" . $title[0] . "</a><br>" . strclnup($descr[0]) . "...<br>\n";
}
$html .= "</p>";

Yahoo News Search by Daniele Leone (info@danieleleone.com) on 13 Jul 2004 9:51am GMT
Hi all, my name is Daniele Leone. I read that someone asked about reading news from Yahoo!. I just built a simple but powerful script that lets you search in Yahoo news feeds. Try the demo here http://www.danieleleone.com and download it for free if you like!
;-) Bye, Daniele

Another way Shorten Description by David R (david@dradept.com) on 27 Jul 2004 10:06am GMT
What a great XML feed routine, thanks! I use it successfully taking feeds from Silicon.com for my site www.FirstAidforComputers.com. By the way SteveT, I use a routine to shorten descriptions as follows:
$sh = array_slice(explode(" ", $description), 0, 100);
$shortened = implode(" ", $sh);
Hope it comes in useful.

An alternative shorten function by mattmook () on 21 Aug 2004 12:47am GMT
Instead of exploding the string into an array just to convert it back, a similar thing can be done using a regular expression, all in one line:
$title = preg_replace("/(.{20}.)\s.$/","\${1}...",$title);
Here the code trims the title to 20 characters but keeps the last word as a complete word and places an ellipsis ('...') at the end if the length is over this.

An alternative shorten function... again by mattmook () on 21 Aug 2004 12:51am GMT
Um... seems as though the server decided to chew my code up! Let's try that again:
$title = preg_replace("/(.{20}._ASTERIX_)\s._ASTERIX_$/","\${1}...",$title);
where _ASTERIX_ is an asterisk symbol (as in one of those star characters, shift-8 on a Mac). Hope that helps!

password protected feeds by mark (mark@stocktrak.com) on 26 Aug 2004 4:25pm GMT
How do I grab a password protected feed if I know the password?

Another cock teaser by Hennie () on 7 Oct 2004 12:52pm GMT
If it seems too good to be true, it probably isn't. If this script worked, it would have been great... I just get a blank page. I'm using PHP5. Does anybody know why it's not working?

Thanks Martin by Hennie () on 7 Oct 2004 6:46pm GMT
I used your second method and it gives the links to the newsfeeds now. Thanks a million. I stand corrected. This is the real thing.

pubDate better way?
by Dave A () on 20 Nov 2004 12:07am GMT
I used this code to display the pubDate in local time, but there must be a better way:
$pubDate = untag($item, 'pubDate');
//Tue, 02-11-2004 02:10:49 GMT
//Thu, 14 Oct 2004 08:21:00 GMT
$Section = substr($pubDate[0], 5);
$day = substr($Section, 0, 2);
$mth = substr($Section, 3, 2);
$match = preg_replace("/.[0-9]/", "match", $mth);
if ($match == "match") {
    $yrs = substr($Section, 6, 4);
    $hrs = substr($Section, 11, 2);
    $min = substr($Section, 14, 2);
    $sec = substr($Section, 17, 2);
    $newpubDate = mktime($hrs, $min, $sec, $mth, $day, $yrs, 1);
    $newpubDate = date("D, j M \a\\t g:i a", $newpubDate);
} else {
    $mth = substr($Section, 3, 3);
    $mth = strtotime("10 $mth 2000");
    $mth = date("m", $mth);
    $yrs = substr($Section, 7, 4);
    $hrs = substr($Section, 12, 2);
    $min = substr($Section, 15, 2);
    $sec = substr($Section, 18, 2);
    $newpubDate = mktime($hrs, $min, $sec, $mth, $day, $yrs, 1);
    $newpubDate = date("D, j M \a\\t g:i a", $newpubDate);
}
BTW excellent script, easy to implement, and no php_extensions required!

XML ID by wasabi (mclellan@adam.com.au) on 14 Jan 2005 3:59am GMT
http://xoap.weather.com/weather/local/ASXX0083?cc=*&unit=m&par=1004027791&key=b046c68db097b8d1
How do I read tags with ID numbers?

Tags with additional content by drewprops (php@drewprops.com) on 23 Jan 2005 3:56am GMT
As Wasabi mentions in the post above, tags with content inside of them do in fact exist now. I've encountered one at this feed (http://www.core77.com/corehome/index.xml). Specifically they embed some RDF data in the XML tag for Item. The item tag reads like this:
<item rdf:about="http://www.example.com/stories/001.html">
The easiest answer would seem that I should write a custom function JUST to handle the parsing of the $xml variable to extract the information into the $items array. The problem is, each of the articles listed has a different URL embedded inside its <item> tag... however I don't know how to write wildcards in PHP!!
Golly gee whillikers it would sure be neat to know how to handle this....

Tags with additional content (hack) by drewprops (php@drewprops.com) on 23 Jan 2005 7:07am GMT
Well, I "kind of" solved this problem. First off, I made a duplicate of the untag function and gave it a new name (let's call it 'workaround_link'). Turns out that you really don't NEED to include the ENTIRE opening tag for the variable $preg, just the closing carat of the tag. So it could look like this:
$preg = "|>(.*?)</$tag>|s"
All I know is that it WORKS!!!! (and that I can now rest)