Posted to user@nutch.apache.org by Daniel Naber <lu...@danielnaber.de> on 2007/12/16 17:03:33 UTC

storing meta data in ScoringFilter

Hi,

I'm trying to store a new meta data field in my implementation of 
ScoringFilter, using Nutch 0.9. The meta data is only available after 
parsing, but passScoreAfterParsing doesn't offer write access to the meta 
data (i.e. you can call set methods but the values will not be written).

So I'm trying to use distributeScoreToOutlink(). I thought returning 
an "adjust" value like this should work, but none of the meta fields I add 
ever appear in Selector.map() in Generator.java (I patched Generator.java 
to display the meta data for debugging purposes):

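// Inside distributeScoreToOutlink(): build the datum to return for the source page.
adjust = new CrawlDatum();
adjust.setStatus(CrawlDatum.STATUS_LINKED);
// Attach the custom meta data field.
MapWritable myMetaData = new MapWritable();
myMetaData.put(new Text("foo"), new FloatWritable(9.9f));
adjust.setMetaData(myMetaData);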

What am I doing wrong? Note that setting meta data on the outlink pages in 
distributeScoreToOutlink works fine, but I want to set values for the 
original page.

Regards
 Daniel

-- 
http://www.danielnaber.de

Re: storing meta data in ScoringFilter

Posted by Daniel Naber <lu...@danielnaber.de>.
Quoting Dennis Kubes <ku...@apache.org>:

> Please explain a little more what you are trying to do.

I'm trying to set up a focused crawler, i.e. I want to decide if a  
link should be followed, depending on the contents of the page in  
which the link appears. This already works, but I also want to store  
the page's score (not the Nutch score, my own score) as a meta datum.  
Note this is not the score of the outgoing links, but of the page the  
links are in.
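
The focused-crawl decision described above could be sketched roughly as follows. This is only an illustration and not code from the thread: the class name, the topic terms, the threshold, and the scoring formula (fraction of words that are topic terms) are all assumptions made up for the example, and nothing here uses the Nutch API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a focused-crawl decision; not part of Nutch.
class FocusedCrawlSketch {
    // Assumed topic vocabulary and threshold, chosen only for illustration.
    static final Set<String> TOPIC_TERMS =
            new HashSet<>(Arrays.asList("nutch", "crawler", "lucene"));
    static final float FOLLOW_THRESHOLD = 0.1f;

    // The page's own score: fraction of its words that are topic terms.
    static float pageScore(String text) {
        String[] words = text.toLowerCase().split("\\W+");
        int total = 0;
        int hits = 0;
        for (String w : words) {
            if (w.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            total++;
            if (TOPIC_TERMS.contains(w)) {
                hits++;
            }
        }
        return total == 0 ? 0f : (float) hits / total;
    }

    // Links are followed only if the page they appear in scores high enough.
    static boolean followLinksFrom(String text) {
        return pageScore(text) >= FOLLOW_THRESHOLD;
    }
}
```

The value returned by pageScore() corresponds to the "own score" that is supposed to be stored as a meta datum for the page containing the links.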

> If you are
> trying to set meta-data in crawldatum you can call:
>
> crawldatum.getMetaData().put(key, value)

There are two CrawlDatum objects in distributeScoreToOutlink(). The target
one is not the one I'm interested in. The other one, "adjust", is null by
default, and I'm trying to use it to attach meta data to the fromUrl page
(not the toUrl, i.e. the outgoing page). So I create a new "adjust" with
meta data and return it, but the meta data never shows up in the crawl db.
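
To make the two roles concrete, here is a minimal self-contained sketch of the attempted approach. It does not use Nutch: DatumSketch is a hypothetical stand-in for CrawlDatum, the method signature is simplified (the real method takes more parameters), and the STATUS_LINKED value and the "fromPageScore" key are placeholders. It only shows what is handed back from the method, not what later steps do with the returned datum.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CrawlDatum: only a status and a meta data map.
class DatumSketch {
    static final int STATUS_LINKED = 1; // placeholder value, not Nutch's constant
    int status;
    private Map<String, Float> metaData = new HashMap<>();

    void setStatus(int status) { this.status = status; }
    Map<String, Float> getMetaData() { return metaData; }
    void setMetaData(Map<String, Float> metaData) { this.metaData = metaData; }
}

class AdjustSketch {
    // Simplified, assumed signature. "target" belongs to the outlink (toUrl);
    // the returned datum is meant for the page containing the link (fromUrl).
    static DatumSketch distributeScoreToOutlink(String fromUrl, String toUrl,
                                                float pageScore,
                                                DatumSketch target,
                                                DatumSketch adjust) {
        // Reported to work in the thread: meta data on the outlink's datum.
        target.getMetaData().put("fromPageScore", pageScore);

        // The attempted approach: "adjust" is null by default, so create it
        // and attach the source page's own score before returning it.
        if (adjust == null) {
            adjust = new DatumSketch();
            adjust.setStatus(DatumSketch.STATUS_LINKED);
        }
        adjust.getMetaData().put("foo", pageScore);
        return adjust;
    }

    public static void main(String[] args) {
        DatumSketch target = new DatumSketch();
        DatumSketch adjust = distributeScoreToOutlink(
                "http://example.com/", "http://example.com/a", 9.9f, target, null);
        System.out.println("target meta: " + target.getMetaData());
        System.out.println("adjust meta: " + adjust.getMetaData());
    }
}
```

In this toy model the returned datum does carry the "foo" field, so the open question in the thread is what happens to that datum after the method returns.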

Regards
  Daniel



Re: storing meta data in ScoringFilter

Posted by Dennis Kubes <ku...@apache.org>.
Please explain a little more what you are trying to do.  If you are 
trying to set meta-data in crawldatum you can call:

crawldatum.getMetaData().put(key, value)

Dennis Kubes
