Posted to dev@lucene.apache.org by Karl Wright <da...@yahoo.com> on 2005/05/22 04:59:30 UTC
Possible bug in scoring function for TermQuery?
The following code in TermWeight, the inner class of TermQuery, seems inconsistent:
    public float sumOfSquaredWeights() throws IOException {
      idf = getSimilarity(searcher).idf(term, searcher); // compute idf
      queryWeight = idf * getBoost();                    // compute query weight
      return queryWeight * queryWeight;                  // square it
    }

    public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      queryWeight *= queryNorm;                          // normalize query weight
      // KDW - extra idf term makes no sense!!!
      value = queryWeight * idf;                         // idf for document
    }
The inconsistency is this: when a query with only one term is normalized, the resulting weight value should be unity (1.0). In that case, the queryNorm passed into normalize() will be sqrt(1/sumOfSquaredWeights()), which is 1/(idf * getBoost()), so queryWeight becomes 1.0 after the multiplication and value ends up equal to idf rather than 1.0. The extra idf factor in normalize() therefore seems superfluous.
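To make the arithmetic concrete, here is a minimal standalone sketch of the single-term case. It is not Lucene code: the class and method names are hypothetical, the idf and boost are passed in as plain numbers, and it assumes queryNorm is exactly 1/sqrt(sumOfSquaredWeights) as described above.

```java
public class SingleTermNormalization {

    /** Assumed query norm for the single-term case: 1 / sqrt(sumOfSquaredWeights). */
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /** Mirrors sumOfSquaredWeights() followed by the first step of normalize(). */
    static float normalizedQueryWeight(float idf, float boost) {
        float queryWeight = idf * boost;                        // compute query weight
        float sumOfSquaredWeights = queryWeight * queryWeight;  // square it
        return queryWeight * queryNorm(sumOfSquaredWeights);    // normalize query weight
    }

    /** value as computed by the existing code: normalized weight times idf. */
    static float currentValue(float idf, float boost) {
        return normalizedQueryWeight(idf, boost) * idf;
    }

    /** value as computed by the proposed change: normalized weight only. */
    static float proposedValue(float idf, float boost) {
        return normalizedQueryWeight(idf, boost);
    }

    public static void main(String[] args) {
        float idf = 2.0f;    // illustrative number, not taken from a real index
        float boost = 1.0f;
        System.out.println("normalized queryWeight = " + normalizedQueryWeight(idf, boost));
        System.out.println("current value          = " + currentValue(idf, boost));
        System.out.println("proposed value         = " + proposedValue(idf, boost));
    }
}
```

With a single term, the normalized query weight comes out at 1.0 regardless of idf, so the existing code leaves value equal to idf while the proposed change leaves it at 1.0.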
I therefore think that the correct code should be:
    public float sumOfSquaredWeights() throws IOException {
      idf = getSimilarity(searcher).idf(term, searcher); // compute idf
      queryWeight = idf * getBoost();                    // compute query weight
      return queryWeight * queryWeight;                  // square it
    }

    public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      queryWeight *= queryNorm;                          // normalize query weight
      // KDW - extra idf term makes no sense; remove it.
      // value = queryWeight * idf;                      // idf for document
      value = queryWeight;
    }
Karl