Posted to user@nutch.apache.org by Bud Witney <wi...@osu.edu> on 2006/10/27 23:50:35 UTC
Magpie RSS parsing of OpenSearch Format
Does anyone know of a way to use PHP and Magpie to fetch an RSS feed that
contains OpenSearch-format tags and data?
I'm not sure how to get at the nutch:site info, for example.
Any example code would be much appreciated.
I did see this on the website but don't know how to implement it:
_________________________________________________________
"$rss->items is an array of associative arrays, each one describing a
single item. An example that looks like:
<item rdf:about="http://protest.net/NorthEast/calendrome.cgi?span=event&ID=210257">
<title>Weekly Peace Vigil</title>
<link>http://protest.net/NorthEast/calendrome.cgi?span=event&ID=210257</link>
<description>Wear a white ribbon</description>
<dc:subject>Peace</dc:subject>
<ev:startdate>2002-06-01T11:00:00</ev:startdate>
<ev:location>Northampton, MA</ev:location>
<ev:enddate>2002-06-01T12:00:00</ev:enddate>
<ev:type>Protest</ev:type>
</item>
Is parsed, and pushed on the $rss->items array as:
array(
title => 'Weekly Peace Vigil',
link => 'http://protest.net/NorthEast/calendrome.cgi?span=event&ID=210257',
description => 'Wear a white ribbon',
dc => array (
subject => 'Peace'
),
ev => array (
startdate => '2002-06-01T11:00:00',
enddate => '2002-06-01T12:00:00',
type => 'Protest',
location => 'Northampton, MA'
)
);
________________________________________________________"
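Going by the array layout in that quote, Magpie appears to file a namespaced element under a sub-array keyed by its prefix (dc:subject ends up at ['dc']['subject']). Assuming it treats nutch:site the same way, each item's site would be reachable as $item['nutch']['site']. A minimal sketch that collects those values from a Magpie-style items array; nutch_sites() is a hypothetical helper, not part of Magpie, and the fetch_rss() call in the comment is Magpie's usual entry point, not something tested against this feed:

```php
<?php
// Assumption: MagpieRSS groups <nutch:site> under $item['nutch']['site'],
// the same way the quoted docs show dc:subject grouped under $item['dc'].
// nutch_sites() is a hypothetical helper, not a Magpie function.
function nutch_sites(array $items)
{
    $sites = array();
    foreach ($items as $item) {
        // Skip items that carry no nutch:site element.
        if (isset($item['nutch']['site'])) {
            $sites[] = $item['nutch']['site'];
        }
    }
    return $sites;
}

// With Magpie installed, the items would typically come from fetch_rss():
//   require_once 'rss_fetch.inc';
//   $rss = fetch_rss('http://140.254.84.215:8080/opensearch?query=rose');
//   print_r(nutch_sites($rss->items));
```

Because the helper only walks plain arrays, it can be tried offline; if the 'nutch' key never shows up in real Magpie output, that would support the view that Magpie isn't picking up these tags.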
Bud
Re: Magpie RSS parsing of OpenSearch Format
Posted by Bud Witney <wi...@osu.edu>.
Thanks for the tip; I'm looking it over now. Do I need to use something
like this to handle namespaces?
$s->register_ns('nutch', 'http://www.nutch.org/opensearchrss/1.0/');
$s->register_ns('opensearch', 'http://a9.com/-/spec/opensearchrss/1.0/');
I tried this, but it did not pull any data; I have to comment out the
namespace lines. And the XPath part via xsearch() is not showing up either.
<?php
$contents = file_get_contents('http://140.254.84.215:8080/opensearch?query=rose&hitsPerSite=2&hitsPerPage=10'); // fetch RSS feed
$s = new SimpleXMLElement($contents);
//$s->register_ns('nutch', 'http://www.nutch.org/opensearchrss/1.0/');
//$s->register_ns('opensearch', 'http://a9.com/-/spec/opensearchrss/1.0/');
//$s->register_ns('rss', 'http://purl.org/rss/1.0/');
print $s->channel->title . "\n";
foreach ($s->xsearch('//title') as $title) {
    print "$title\n";
}
$sites = $s->xsearch('//item/nutch:site');
foreach ($sites as $site) {
    print "$site\n";
}
?>
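The method names are a likely culprit: register_ns() and xsearch() are not SimpleXMLElement methods in released PHP. The documented equivalents are registerXPathNamespace() (PHP 5.1 and later) and xpath(). A minimal sketch of the same nutch:site lookup with those calls, parsing an XML string so it can be tried offline; nutch_sites_xpath() is a hypothetical name, and it assumes the item elements themselves are not in a default namespace, as in plain RSS 2.0:

```php
<?php
// Sketch: pull nutch:site values out of a Nutch OpenSearch RSS document.
// Uses SimpleXMLElement::registerXPathNamespace() and ::xpath().
function nutch_sites_xpath($xml)
{
    $s = new SimpleXMLElement($xml);
    // Bind the 'nutch' prefix used in the XPath query to the feed's namespace URI.
    $s->registerXPathNamespace('nutch', 'http://www.nutch.org/opensearchrss/1.0/');
    $sites = array();
    // xpath() returns false on error, so fall back to an empty list.
    $nodes = $s->xpath('//item/nutch:site');
    foreach ($nodes ?: array() as $site) {
        $sites[] = (string) $site;
    }
    return $sites;
}

// Typical use against the server from the message above:
//   $xml = file_get_contents('http://140.254.84.215:8080/opensearch?query=rose&hitsPerSite=2&hitsPerPage=10');
//   foreach (nutch_sites_xpath($xml) as $site) { print "$site\n"; }
```

The prefix passed to registerXPathNamespace() only has to match the one used in the XPath expression; what must match the feed is the namespace URI.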
On Oct 30, 2006, at 4:28 AM, Jacob Brunson wrote:
> MagpieRSS is fairly limited in the information it can pull out of a
> feed, so I suspect it won't be able to extract the OpenSearch tags.
> If you are using PHP 5, I would suggest using the SimpleXML extension
> to get at all the data instead of Magpie.
Re: Magpie RSS parsing of OpenSearch Format
Posted by Jacob Brunson <ja...@gmail.com>.
MagpieRSS is fairly limited in the information it can pull out of a
feed, so I suspect it won't be able to extract the OpenSearch tags.
If you are using PHP 5, I would suggest using the SimpleXML extension
to get at all the data instead of Magpie.
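SimpleXML can also reach namespaced elements without XPath: children() called with a namespace URI returns only the child elements in that namespace. A short sketch, assuming the feed carries opensearch:totalResults on the channel and nutch:site on each item, under the namespace URIs quoted earlier in the thread; summarize_feed() and the two constant names are hypothetical:

```php
<?php
// Namespace URIs as quoted in this thread (assumed to match the feed).
define('OPENSEARCH_NS', 'http://a9.com/-/spec/opensearchrss/1.0/');
define('NUTCH_NS', 'http://www.nutch.org/opensearchrss/1.0/');

// Hypothetical helper: read channel and item data, including namespaced elements.
function summarize_feed($xml)
{
    $s = new SimpleXMLElement($xml);
    $channel = $s->channel;
    // children(URI) selects only the channel's children in the OpenSearch namespace.
    $os = $channel->children(OPENSEARCH_NS);
    $result = array(
        'title' => (string) $channel->title,
        'totalResults' => (string) $os->totalResults,
        'items' => array(),
    );
    foreach ($channel->item as $item) {
        // Same idea per item: look only at children in the Nutch namespace.
        $nutch = $item->children(NUTCH_NS);
        $result['items'][] = array(
            'title' => (string) $item->title,
            'site' => (string) $nutch->site, // '' when the item has no nutch:site
        );
    }
    return $result;
}
```

Plain RSS elements such as title and item stay reachable as ordinary properties, while anything under a prefix goes through children() with its URI.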
--
http://JacobBrunson.com