Posted to user@nutch.apache.org by imran khan <im...@gmail.com> on 2013/07/09 10:03:31 UTC
<boost> field is always 0.0 in nutch 2.x after custom scoring filter
Greetings,
I have been following this tutorial
(http://sujitpal.blogspot.com/2012/01/nutchgora-scoring-and-indexing-plugins.html)
to pass URL metadata on to the Solr index.
It passes my URL metadata to Solr successfully, but now my <boost>
field is always 0.0. If I remove this plugin from my nutch-site, the
<boost> field has the correct value again.
I have gone through the source code of this plugin but couldn't find any
code that could affect the value of the <boost> field.
Any suggestions on how to resolve this?
Or is there any other mechanism for passing URL metadata (in seed.txt) in
Nutch 2.x?
Regards,
Khan
Re: <boost> field is always 0.0 in nutch 2.x after custom scoring filter
Posted by feng lu <am...@gmail.com>.
Hi,
The plugin shown in that tutorial actually consists of two different Nutch
plugins: an indexing filter and a scoring filter. In the scoring filter
plugin, one of the methods implemented from the ScoringFilter interface
looks like this:
@Override
public float indexerScore(String url, NutchDocument doc, WebPage page,
    float initScore) throws ScoringFilterException {
  return 0;
}
This method is called during indexing, in the IndexUtil class. The code looks like this:
float boost = 1.0f;
// run scoring filters
try {
  boost = scoringFilters.indexerScore(url, doc, page, boost);
} catch (final ScoringFilterException e) {
  LOG.warn("Error calculating score " + key + ": " + e);
  return null;
}
doc.setScore(boost);
// store boost for use by explain and dedup
doc.add("boost", Float.toString(boost));
So if you load this plugin in Nutch, the boost is set to 0.0 for every
document, because indexerScore returns 0 regardless of the initScore it
receives. One solution is to change this method to fit your needs.
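
To illustrate why a single filter returning 0 wipes out the boost, here is a minimal standalone sketch. It does not use the real Nutch classes: IndexerScorer, ZEROING, PASS_THROUGH and runFilters are hypothetical stand-ins, and the chaining (each filter receiving the previous filter's result) is an assumption about how the scoring filters are combined. A filter that only wants to pass metadata along can probably just return initScore unchanged.

```java
import java.util.List;

public class BoostChainSketch {

    // Hypothetical stand-in for the indexerScore part of a scoring filter.
    interface IndexerScorer {
        float indexerScore(String url, float initScore);
    }

    // Behaves like the tutorial plugin: ignores initScore and returns 0.
    static final IndexerScorer ZEROING = (url, initScore) -> 0f;

    // Possible fix: hand the incoming score back unchanged.
    static final IndexerScorer PASS_THROUGH = (url, initScore) -> initScore;

    // Assumed chaining: each filter gets the score produced by the previous one.
    static float runFilters(List<IndexerScorer> filters, String url, float initScore) {
        float score = initScore;
        for (IndexerScorer filter : filters) {
            score = filter.indexerScore(url, score);
        }
        return score;
    }

    public static void main(String[] args) {
        String url = "http://example.com/";
        // Starting boost of 1.0f, as in the IndexUtil snippet above.
        System.out.println(runFilters(List.of(ZEROING), url, 1.0f));      // prints 0.0
        System.out.println(runFilters(List.of(PASS_THROUGH), url, 1.0f)); // prints 1.0
    }
}
```

In the real plugin the equivalent change would be to return initScore (or a value derived from it) from indexerScore instead of 0.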