Posted to user@nutch.apache.org by Hector Toll <ht...@cesca.es> on 2008/06/26 13:47:31 UTC
Scoring Formula
Hi,
I am currently using NutchWAX and I'm researching how to optimize
searches by changing some parameters. Specifically I intend to improve
searches by changing weights of the query fields (host, site, url, date,
arcname, type, ...). However, after studying the formula that
appears in
"http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html"
I do not understand where or how these changes affect the final
score. I would like to know what the variables "q.getBoost" and
"t.getBoost" are for, and whether their values can be chosen freely.
Nor do I understand the expansion of the function "norm(t,d)": I do not
know why it takes a product over the fields with the same name as the
term, nor what "doc.getBoost", "lengthNorm(field)" and "f.getBoost"
represent. Is there any relationship between "f.getBoost" and the
query field boosts (host, site, url, date, arcname, type, ...)?
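
For reference, this is the formula I am asking about, as I read it on the Similarity Javadoc page. It is my own transcription, so the notation may differ slightly from the page, and I am assuming that "field" in lengthNorm means the field of term t in document d:

```latex
% Transcribed from the Lucene Similarity Javadoc; notation is approximate.
\begin{align*}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot
  t.\mathrm{getBoost}()\cdot \mathrm{norm}(t,d) \Big) \\[4pt]
\mathrm{queryNorm}(q) &= \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}} \\[4pt]
\mathrm{sumOfSquaredWeights} &= q.\mathrm{getBoost}()^{2}\cdot
  \sum_{t \in q} \big( \mathrm{idf}(t)\cdot t.\mathrm{getBoost}() \big)^{2} \\[4pt]
\mathrm{norm}(t,d) &= d.\mathrm{getBoost}()\cdot \mathrm{lengthNorm}(\mathrm{field})\cdot
  \prod_{\substack{f \in d \\ f \text{ named as } t}} f.\mathrm{getBoost}()
\end{align*}
```

Written this way, "t.getBoost" appears both inside the sum and inside queryNorm, while "q.getBoost" appears only inside queryNorm, which is part of what confuses me.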
I would really appreciate any answers.
Thanks in advance.
Hector.