Posted to user@nutch.apache.org by imran khan <im...@gmail.com> on 2013/07/09 10:03:31 UTC

<boost> field is always 0.0 in nutch 2.x after custom scoring filter

Greetings,

I have been following this tutorial
(http://sujitpal.blogspot.com/2012/01/nutchgora-scoring-and-indexing-plugins.html)
for passing URL metadata on to the Solr index.

It successfully passes my URL metadata to Solr, but now my <boost>
field is always 0.0. If I remove this plugin from my nutch-site then my
<boost> field has the correct value.

I have gone through the source code of this plugin but couldn't find any
code which could affect the value of the <boost> field.

Any suggestions on how to resolve it?
Or is there any other mechanism to pass the URL metadata (in seed.txt) in
Nutch 2.x?

Regards,
Khan

Re: <boost> field is always 0.0 in nutch 2.x after custom scoring filter

Posted by feng lu <am...@gmail.com>.
Hi,
The tutorial you linked contains two different Nutch plugins: one is an
indexing filter, the other is a scoring filter. In the scoring filter
plugin, I see one method implemented from the ScoringFilter interface
like this:

    @Override
    public float indexerScore(String url, NutchDocument doc, WebPage page,
        float initScore) throws ScoringFilterException {
      return 0;
    }

This method is called during indexing, in the IndexUtil class. The code
looks like this:

    float boost = 1.0f;
    // run scoring filters
    try {
      boost = scoringFilters.indexerScore(url, doc, page, boost);
    } catch (final ScoringFilterException e) {
      LOG.warn("Error calculating score " + key + ": " + e);
      return null;
    }

    doc.setScore(boost);
    // store boost for use by explain and dedup
    doc.add("boost", Float.toString(boost));

So if you load this plugin in Nutch, the boost is set to 0.0. One
solution is to change this method to fit your needs.
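
As a rough illustration of one possible change, the sketch below compares a filter that returns 0 with one that simply returns the score it was given. It is not the real Nutch code: the ScoringFilter interface here is a hypothetical stand-in reduced to (url, initScore), and runFilters assumes that each filter receives the previous filter's result, which is my reading of how the filters are chained.

```java
import java.util.Arrays;
import java.util.List;

public class BoostSketch {
    /** Hypothetical stand-in for Nutch's ScoringFilter; only the indexing hook is modelled. */
    interface ScoringFilter {
        float indexerScore(String url, float initScore);
    }

    /** Mirrors the tutorial's method body: the incoming score is discarded. */
    static final ScoringFilter RETURNS_ZERO = (url, initScore) -> 0;

    /** Possible fix: hand the incoming score back unchanged. */
    static final ScoringFilter PASS_THROUGH = (url, initScore) -> initScore;

    /** Assumed chaining: each filter is fed the result of the one before it. */
    static float runFilters(List<ScoringFilter> filters, String url, float initScore) {
        float score = initScore;
        for (ScoringFilter filter : filters) {
            score = filter.indexerScore(url, score);
        }
        return score;
    }

    public static void main(String[] args) {
        String url = "http://example.com/";
        // Starting boost of 1.0f, as in the IndexUtil snippet above.
        System.out.println(runFilters(Arrays.asList(RETURNS_ZERO), url, 1.0f)); // prints 0.0
        System.out.println(runFilters(Arrays.asList(PASS_THROUGH), url, 1.0f)); // prints 1.0
    }
}
```

In the actual plugin, the equivalent change would be to return initScore instead of 0 from indexerScore, so that the metadata is still passed along while the boost is left alone.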