Posted to dev@nutch.apache.org by Marko Bauhardt <mb...@media-style.com> on 2006/05/17 11:18:36 UTC
Query Boosting
Hi,
I am using nutch-0.8, revision 405566.
I have a question about the query-basic plugin. A simple way to test
the parsed query is the main application
org.apache.nutch.searcher.Query.
If I run Query and type the query "foo", I get the following output:
Translated: +(url:foo^0.0 anchor:foo^0.0 content:foo title:foo^0.0
host:foo^0.0)
That output means that the boost is 0.0 for every field except
content. But what about the default boosts from nutch-default.xml?
That configuration is not used.
If I change the BasicQueryFilter, I get the output with the boost
values from nutch-default.xml:
Translated: +(url:foo^4.0 anchor:foo^2.0 content:foo title:foo^1.5
host:foo^2.0)
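I think FIELD_BOOSTS must be re-initialized whenever new boosts are
set, because the array is filled when the object is constructed, before
the boost values are read from the configuration.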
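To show the suspected cause outside of Nutch, here is a minimal standalone sketch. The class and method names (BoostInitOrderDemo, StaleFilter, FixedFilter, configure) are made up for illustration and are not part of Nutch. I assume the boost fields are plain instance floats that are only assigned later from the configuration, as the patch context suggests.

```java
import java.util.Arrays;

public class BoostInitOrderDemo {

    /** Hypothetical stand-in for the current filter: the array is built once, by a field initializer. */
    static class StaleFilter {
        private float urlBoost;    // 0.0f until configure() is called
        private float titleBoost;  // 0.0f until configure() is called

        // Evaluated during construction, so it copies the 0.0f defaults.
        private final float[] fieldBoosts = { urlBoost, 1.0f, titleBoost };

        void configure(float url, float title) {
            this.urlBoost = url;
            this.titleBoost = title;
            // fieldBoosts is not touched here and keeps the old values.
        }

        float[] boosts() {
            return fieldBoosts;
        }
    }

    /** Hypothetical stand-in for the patched filter: the array is rebuilt after the boosts are set. */
    static class FixedFilter {
        private float urlBoost;
        private float titleBoost;
        private float[] fieldBoosts = { urlBoost, 1.0f, titleBoost };

        void configure(float url, float title) {
            this.urlBoost = url;
            this.titleBoost = title;
            // Rebuild the array so it reflects the configured values.
            this.fieldBoosts = new float[] { urlBoost, 1.0f, titleBoost };
        }

        float[] boosts() {
            return fieldBoosts;
        }
    }

    public static void main(String[] args) {
        StaleFilter stale = new StaleFilter();
        stale.configure(4.0f, 1.5f);
        System.out.println(Arrays.toString(stale.boosts())); // [0.0, 1.0, 0.0]

        FixedFilter fixed = new FixedFilter();
        fixed.configure(4.0f, 1.5f);
        System.out.println(Arrays.toString(fixed.boosts())); // [4.0, 1.0, 1.5]
    }
}
```

A Java float array stores copies of the values, not references to the fields, so assigning the fields later does not change an array that was already built.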
Here is a simple patch to show what I mean:
Index: src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java
===================================================================
--- src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java (revision 405566)
+++ src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java (working copy)
@@ -48,7 +48,7 @@
private static final String[] FIELDS =
{ "url", "anchor", "content", "title", "host" };
- private final float[] FIELD_BOOSTS =
+ private float[] FIELD_BOOSTS =
{ URL_BOOST, ANCHOR_BOOST, 1.0f, TITLE_BOOST, HOST_BOOST };
/**
@@ -178,6 +178,7 @@
this.TITLE_BOOST = conf.getFloat("query.title.boost", 1.5f);
this.HOST_BOOST = conf.getFloat("query.host.boost", 2.0f);
this.PHRASE_BOOST = conf.getFloat("query.phrase.boost", 1.0f);
+ FIELD_BOOSTS = new float[]{ URL_BOOST, ANCHOR_BOOST, 1.0f, TITLE_BOOST, HOST_BOOST };
}
public Configuration getConf() {
Or am I overlooking something?
Marko