Posted to user@nutch.apache.org by Max S <ma...@googlemail.com> on 2009/09/02 22:33:23 UTC

Customise scoring

Hi all,

I have installed / imported an XML and EXIF parser plugin into Nutch to
parse XML files and EXIF metadata from JPG images.

The idea would be to:
1. Fetch and extract data and links from the XML file
	NB: The XML file contains geo coordinates (latitude and longitude),
title and image links.
2. Fetch each image and extract its EXIF metadata
3. Store the extracted data from both parsers in the index.

I would like to customise search so the results are ordered by the following
priority:
1. Proximity to location
2. Keywords from EXIF metadata
3. Keywords from XML title
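
To make the intended ordering concrete, here is a minimal sketch in plain Java. It assumes each result already carries a distance from the searcher's location and a count of matched keywords per field. The names Hit, distanceKm, exifMatches, titleMatches and PriorityOrder are hypothetical and are not part of Nutch; the sketch shows only the strict priority order, not how Nutch would compute or apply it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical result type: none of these names come from Nutch.
final class Hit {
    final String url;
    final double distanceKm; // distance from the searcher's location
    final int exifMatches;   // query keywords found in the EXIF metadata
    final int titleMatches;  // query keywords found in the XML title

    Hit(String url, double distanceKm, int exifMatches, int titleMatches) {
        this.url = url;
        this.distanceKm = distanceKm;
        this.exifMatches = exifMatches;
        this.titleMatches = titleMatches;
    }
}

final class PriorityOrder {
    // Nearest first; ties broken by more EXIF matches, then more title matches.
    static final Comparator<Hit> BY_PRIORITY =
        Comparator.comparingDouble((Hit h) -> h.distanceKm)
            .thenComparing(Comparator.comparingInt((Hit h) -> h.exifMatches).reversed())
            .thenComparing(Comparator.comparingInt((Hit h) -> h.titleMatches).reversed());

    // Returns a sorted copy and leaves the input list untouched.
    static List<Hit> sort(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(BY_PRIORITY);
        return sorted;
    }
}
```

A strict ordering like this only consults the keyword counts when two distances are exactly equal, so in practice a weighted combination of the three signals may be closer to what is wanted.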

From what I can see at the moment, I will need to:
1. Give the fields a higher score according to the priority above
2. Repurpose the algorithm within GeoPosition plugin
(http://wiki.apache.org/nutch/GeoPosition)
3. Update ScoringFilter logic to include Geo Position algorithm?


The question is: is the last item correct, or is there another approach?
Where should I start looking? I would appreciate any suggestions.

Regards
Max S




RE: Customise scoring

Posted by Max S <ma...@googlemail.com>.
Thanks MilleBii,

That sounds logical. I'll look at a query plugin instead.

Regards
Max S

 

-----Original Message-----
From: MilleBii [mailto:millebii@gmail.com] 
Sent: Thursday, September 03, 2009 8:04 AM
To: nutch-user@lucene.apache.org
Subject: Re: Customise scoring

I think the scoring filter has more to do with crawling and with how you
want to search the webgraph (crawldb).

Since you are talking about search, you need to write a query plug-in instead
that implements your algorithm and sets the document boost accordingly.

Having said that, I vote for having an XML/EXIF parser as standard in a future
Nutch build...





--
-MilleBii-


Re: Customise scoring

Posted by MilleBii <mi...@gmail.com>.
I think the scoring filter has more to do with crawling and with how you
want to search the webgraph (crawldb).

Since you are talking about search, you need to write a query plug-in instead
that implements your algorithm and sets the document boost accordingly.
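
As a rough illustration of what "your algorithm" could compute, here is a minimal sketch in plain Java of a proximity-based boost. It is an assumption-laden example, not the GeoPosition plugin's code and not a Nutch API: the class GeoBoost, its methods and the scaleKm parameter are hypothetical, the distance uses the standard haversine formula, and the decay 1 / (1 + d / scale) is just one simple choice. Wiring the resulting value into a Nutch query plug-in is not shown.

```java
// Hypothetical helper; not part of Nutch or the GeoPosition plugin.
final class GeoBoost {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Great-circle distance in km between two lat/lon points (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Assumed decay: boost is 1.0 at distance 0 and 0.5 at scaleKm, then tails off.
    static float proximityBoost(double distanceKm, double scaleKm) {
        return (float) (1.0 / (1.0 + distanceKm / scaleKm));
    }
}
```

For example, GeoBoost.proximityBoost(GeoBoost.haversineKm(userLat, userLon, docLat, docLon), 10.0) gives nearby documents a boost close to 1 and halves it at 10 km; the variable names here are placeholders for the searcher's position and the coordinates stored in the index.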

Having said that, I vote for having an XML/EXIF parser as standard in a future
Nutch build...





-- 
-MilleBii-