Posted to user@nutch.apache.org by Bud Witney <wi...@osu.edu> on 2006/10/27 23:50:35 UTC

Magpie RSS parsing of OpenSearch Format

Does anyone know of a way, using PHP and Magpie, to read an RSS feed
that contains OpenSearch format tags and info?

I am not sure how to get at the nutch:site info, for example.

Any example code would be much appreciated.

I did see this on the website but don't know how to implement it.
_________________________________________________________
"$rss->items is an array of associative arrays, each one describing a  
single item. An example that looks like:

<item rdf:about="http://protest.net/NorthEast/calendrome.cgi?span=event&ID=210257">
<title>Weekly Peace Vigil</title>
<link>http://protest.net/NorthEast/calendrome.cgi?span=event&ID=210257</link>
<description>Wear a white ribbon</description>
<dc:subject>Peace</dc:subject>
<ev:startdate>2002-06-01T11:00:00</ev:startdate>
<ev:location>Northampton, MA</ev:location>
<ev:enddate>2002-06-01T12:00:00</ev:enddate>
<ev:type>Protest</ev:type>
</item>

Is parsed, and pushed on the $rss->items array as:

array(
	title => 'Weekly Peace Vigil',
	link => 'http://protest.net/NorthEast/calendrome.cgi?span=event&ID=210257',
	description => 'Wear a white ribbon',
	dc => array (
			subject => 'Peace'
		),
	ev => array (
		startdate => '2002-06-01T11:00:00',
		enddate => '2002-06-01T12:00:00',
		type => 'Protest',
		location => 'Northampton, MA'
	)
);

________________________________________________________"
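
A minimal sketch of reading those nested arrays, using a hand-built copy of the item quoted above rather than a live Magpie parse. Whether Magpie exposes Nutch's tags the same way (for example as $item['nutch']['site']) is an assumption I have not verified, and the field() helper is hypothetical, not part of Magpie.

```php
<?php
// Hand-built copy of the parsed item from the documentation quoted above.
$items = array(
    array(
        'title'       => 'Weekly Peace Vigil',
        'link'        => 'http://protest.net/NorthEast/calendrome.cgi?span=event&ID=210257',
        'description' => 'Wear a white ribbon',
        'dc'          => array('subject' => 'Peace'),
        'ev'          => array(
            'startdate' => '2002-06-01T11:00:00',
            'enddate'   => '2002-06-01T12:00:00',
            'type'      => 'Protest',
            'location'  => 'Northampton, MA',
        ),
    ),
);

// Hypothetical helper: return $item[$prefix][$name], or null when it is absent.
function field(array $item, $prefix, $name)
{
    return isset($item[$prefix][$name]) ? $item[$prefix][$name] : null;
}

foreach ($items as $item) {
    print $item['title'] . "\n";
    print field($item, 'ev', 'location') . "\n";
    // Assumption: a Nutch field would sit under a 'nutch' key in the same way.
    // This sample item has no such key, so the helper returns null here.
    var_dump(field($item, 'nutch', 'site'));
}
```

Checking with isset() first avoids notices when a feed item lacks the namespace in question.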

Bud

Re: Magpie RSS parsing of OpenSearch Format

Posted by Bud Witney <wi...@osu.edu>.
Thanks for the tip.

I'm looking it over now. Do I need to use something like this to handle
namespaces?

$s->register_ns('nutch', 'http://www.nutch.org/opensearchrss/1.0/');
$s->register_ns('opensearch', 'http://a9.com/-/spec/opensearchrss/1.0/');

I tried this, but it did not pull any data. I have to comment out the
namespace lines, and the XPath part via xsearch is not showing up.

<?php

$contents=file_get_contents('http://140.254.84.215:8080/opensearch?query=rose&hitsPerSite=2&hitsPerPage=10'); //fetch RSS feed

$s = new SimpleXMLElement($contents);
//$s->register_ns('nutch', 'http://www.nutch.org/opensearchrss/1.0/');
//$s->register_ns('opensearch', 'http://a9.com/-/spec/opensearchrss/1.0/');
//$s->register_ns('rss', 'http://purl.org/rss/1.0/');

print $s->channel->title . "\n";
foreach ($s->xsearch('//title') as $title) {
     print "$title\n";
}
$sites = $s->xsearch('//item/nutch:site');

foreach ($sites as $site) {
     print "$site\n";
}

?>
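
One possible reason the script above pulls nothing: as far as I know, released PHP 5 versions of SimpleXMLElement do not have register_ns() or xsearch(); the documented methods are registerXPathNamespace() and xpath(). Below is a minimal sketch using those two calls. The inline feed is a made-up stand-in for the real Nutch response (so it runs without the server), and it assumes the feed is RSS 2.0 with no default namespace, so that '//item/title' matches.

```php
<?php
// Hypothetical sample feed; element values are invented for illustration.
$contents = <<<'XML'
<rss version="2.0"
     xmlns:nutch="http://www.nutch.org/opensearchrss/1.0/"
     xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0/">
  <channel>
    <title>Nutch: rose</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <item>
      <title>First hit</title>
      <nutch:site>example.org</nutch:site>
    </item>
    <item>
      <title>Second hit</title>
      <nutch:site>example.com</nutch:site>
    </item>
  </channel>
</rss>
XML;

$s = new SimpleXMLElement($contents);

// Bind the prefixes used in the XPath expressions to their namespace URIs.
$s->registerXPathNamespace('nutch', 'http://www.nutch.org/opensearchrss/1.0/');
$s->registerXPathNamespace('opensearch', 'http://a9.com/-/spec/opensearchrss/1.0/');

print $s->channel->title . "\n";

// Collect item titles (these elements are not namespaced).
$titles = array();
foreach ($s->xpath('//item/title') as $title) {
    $titles[] = (string) $title;
}

// Collect the namespaced nutch:site elements.
$sites = array();
foreach ($s->xpath('//item/nutch:site') as $site) {
    $sites[] = (string) $site;
}

print implode("\n", $titles) . "\n";
print implode("\n", $sites) . "\n";
```

To run this against the live server, $contents would come from file_get_contents() as in the script above; the rest should stay the same if the feed declares those namespace URIs.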

On Oct 30, 2006, at 4:28 AM, Jacob Brunson wrote:

> MagpieRSS is fairly limited in the information it can pull out of a
> feed, so I suspect that it won't be able to pull out the opensearch
> tags.  I would suggest if you are using PHP5 to use the SimpleXML
> extensions to get at all the data instead of using Magpie.
>
> On 10/27/06, Bud Witney <wi...@osu.edu> wrote:
>> Does anyone know of a way to get via php and magpie a RSS feed which
>> contains Opensearch Format tags and info.
>>
>> Not sure how to get to the Nutch:site info for example.
>>
>> Any example code would be much appreciated
>>
>> I did see this on website but don't know how to implement.
>> _________________________________________________________


Re: Magpie RSS parsing of OpenSearch Format

Posted by Jacob Brunson <ja...@gmail.com>.
MagpieRSS is fairly limited in the information it can pull out of a
feed, so I suspect that it won't be able to pull out the opensearch
tags.  I would suggest if you are using PHP5 to use the SimpleXML
extensions to get at all the data instead of using Magpie.
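
A rough sketch of what that could look like without XPath, using SimpleXMLElement::children() with a namespace URI to reach the prefixed elements of each item. The sample feed and its values are hypothetical; the namespace URI is the one Nutch's feed is assumed to declare for the nutch prefix.

```php
<?php
// Hypothetical sample standing in for a Nutch OpenSearch response.
$xml = <<<'XML'
<rss version="2.0" xmlns:nutch="http://www.nutch.org/opensearchrss/1.0/">
  <channel>
    <title>Nutch: rose</title>
    <item>
      <title>First hit</title>
      <nutch:site>example.org</nutch:site>
    </item>
  </channel>
</rss>
XML;

// Assumed namespace URI for the nutch prefix.
$nutchNs = 'http://www.nutch.org/opensearchrss/1.0/';
$rss = new SimpleXMLElement($xml);

$rows = array();
foreach ($rss->channel->item as $item) {
    // children($ns) returns only the child elements in that namespace.
    $nutch = $item->children($nutchNs);
    $rows[] = array(
        'title' => (string) $item->title,
        'site'  => (string) $nutch->site,
    );
}

print_r($rows);
```

Looking elements up by namespace URI rather than by prefix keeps the code working even if the feed chooses a different prefix for the same namespace.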

On 10/27/06, Bud Witney <wi...@osu.edu> wrote:
> Does anyone know of a way to get via php and magpie a RSS feed which
> contains Opensearch Format tags and info.
>
> Not sure how to get to the Nutch:site info for example.
>
> Any example code would be much appreciated
>
> I did see this on website but don't know how to implement.
> _________________________________________________________
> "$rss->items is an array of associative arrays, each one describing a
> single item. An example that looks like:
>
> <item rdf:about="http://protest.net/NorthEast/calendrome.cgi?
> span=event&ID=210257">
> <title>Weekly Peace Vigil</title>
> <link>http://protest.net/NorthEast/calendrome.cgi?
> span=event&ID=210257</link>
> <description>Wear a white ribbon</description>
> <dc:subject>Peace</dc:subject>
> <ev:startdate>2002-06-01T11:00:00</ev:startdate>
> <ev:location>Northampton, MA</ev:location>
> <ev:enddate>2002-06-01T12:00:00</ev:enddate>
> <ev:type>Protest</ev:type>
> </item>
>
> Is parsed, and pushed on the $rss->items array as:
>
> array(
>         title => 'Weekly Peace Vigil',
>         link => 'http://protest.net/NorthEast/calendrome.cgi?
> span=event&ID=210257',
>         description => 'Wear a white ribbon',
>         dc => array (
>                         subject => 'Peace'
>                 ),
>         ev => array (
>                 startdate => '2002-06-01T11:00:00',
>                 enddate => '2002-06-01T12:00:00',
>                 type => 'Protest',
>                 location => 'Northampton, MA'
>         )
> );
>
> ________________________________________________________"
>
> Bud
>


-- 
http://JacobBrunson.com