Posted to user@nutch.apache.org by Hector Toll <ht...@cesca.es> on 2008/06/26 13:47:31 UTC

Scoring Formula

Hi,
I am currently using NutchWAX and I am looking into how to optimize 
searches by tuning some parameters. Specifically, I intend to improve 
searches by changing the weights of the query fields (host, site, url, 
date, arcname, type, ...). The problem is that, after studying the 
formula that appears in 
"http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html" 
I do not understand where or how these changes affect the final 
score. I would like to know what the variables "q.getBoost" and 
"t.getBoost" are for, and whether their values can be chosen freely. 
I also do not understand the expansion of the function "norm(t,d)": I 
do not know why it takes a product over the fields with the same name 
as the term, nor what "doc.getBoost", "lengthNorm(field)" and 
"f.getBoost" represent. Is there any relationship between "f.getBoost" 
and the query field boosts (host, site, url, date, arcname, type, ...)?
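
For reference, the formula on that page can be transcribed roughly as 
follows. This is a hand transcription of the Lucene 2.x Similarity 
Javadoc, so the notation is approximate and may differ in other 
versions:

```latex
\begin{align*}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot
  t.\mathrm{getBoost}()\cdot\mathrm{norm}(t,d) \\
\mathrm{queryNorm}(q) &= \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}} \\
\mathrm{sumOfSquaredWeights} &= q.\mathrm{getBoost}()^{2}\cdot
  \sum_{t \in q}\bigl(\mathrm{idf}(t)\cdot t.\mathrm{getBoost}()\bigr)^{2} \\
\mathrm{norm}(t,d) &= \mathit{doc}.\mathrm{getBoost}()\cdot
  \mathrm{lengthNorm}(\mathit{field})\cdot
  \prod_{f \in d,\ f \text{ named as } t} f.\mathrm{getBoost}()
\end{align*}
```

Read this way, "q.getBoost" enters only through queryNorm(q), 
"t.getBoost" multiplies each term's contribution to the sum, and 
"f.getBoost" is one of the factors of norm(t,d).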

Please, I need these answers.

Thanks in advance.

Hector.