Posted to user@nutch.apache.org by Daniel Naber <lu...@danielnaber.de> on 2007/12/16 17:03:33 UTC
storing meta data in ScoringFilter
Hi,
I'm trying to store a new meta data field in my implementation of
ScoringFilter, using Nutch 0.9. The meta data is only available after
parsing, but passScoreAfterParsing doesn't offer write access to the meta
data (i.e. you can call set methods but the values will not be written).
So I'm trying to use distributeScoreToOutlink(). I thought returning
an "adjust" value like this should work, but none of the meta fields I add
ever appear in Selector.map() in Generator.java (I patched Generator.java
to display the meta data for debugging purposes):
adjust = new CrawlDatum();
adjust.setStatus(CrawlDatum.STATUS_LINKED);
MapWritable myMetaData = new MapWritable();
myMetaData.put(new Text("foo"), new FloatWritable(9.9f));
adjust.setMetaData(myMetaData);
What am I doing wrong? Note that setting meta data on the outlink pages in
distributeScoreToOutlink works fine, but I want to set values for the
original page.
Regards
Daniel
--
http://www.danielnaber.de
Re: storing meta data in ScoringFilter
Posted by Daniel Naber <lu...@danielnaber.de>.
Quoting Dennis Kubes <ku...@apache.org>:
> Please explain a little more what you are trying to do.
I'm trying to set up a focused crawler, i.e. I want to decide if a
link should be followed, depending on the contents of the page in
which the link appears. This already works, but I also want to store
the page's score (not the Nutch score, my own score) as a meta datum.
Note this is not the score of the outgoing links, but of the page the
links are in.
> If you are
> trying to set meta-data in crawldatum you can call:
>
> crawldatum.getMetaData().put(key, value)
There are two CrawlDatum objects in distributeScoreToOutlink. The target
one is not the one I'm interested in. The other one ("adjust") is null by
default, and I'm trying to use it to attach meta data to the fromUrl page
(not the toUrl page, i.e. the outgoing one). So I create a new "adjust"
with meta data and return it, but the meta data never shows up in the
crawl db.
Regards
Daniel
Re: storing meta data in ScoringFilter
Posted by Dennis Kubes <ku...@apache.org>.
Please explain a little more what you are trying to do. If you are
trying to set meta-data in crawldatum you can call:
crawldatum.getMetaData().put(key, value)
Dennis Kubes