Posted to dev@lucene.apache.org by Karl Wright <da...@yahoo.com> on 2005/05/22 04:59:30 UTC

Possible bug in scoring function for TermQuery?

The following code in the TermWeight subclass of TermQuery seems inconsistent:
 
    public float sumOfSquaredWeights() throws IOException {
      idf = getSimilarity(searcher).idf(term, searcher); // compute idf
      queryWeight = idf * getBoost();             // compute query weight
      return queryWeight * queryWeight;           // square it
    }
 
    public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      queryWeight *= queryNorm;                   // normalize query weight
      // KDW - extra idf term makes no sense!!!
      value = queryWeight * idf;                  // idf for document 
    }

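The inconsistency shows up when normalizing a query with only one term, where the final weight value should be unity (1.0).  In that case the queryNorm passed into the normalize() method is sqrt(1/sumOfSquaredWeights()), which is 1/(idf * getBoost()), so queryWeight becomes exactly 1.0 after the multiplication.  The extra multiplication by idf then makes value equal to idf rather than 1.0, so the idf factor in normalize() seems superfluous.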
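
To make the arithmetic concrete, here is a small standalone sketch of the single-term case. It does not use Lucene at all. The class and method names are hypothetical, the idf and boost values are made up, and queryNorm is computed as sqrt(1/sumOfSquaredWeights) exactly as described above.

```java
// Standalone sketch of the single-term arithmetic; not Lucene code.
// All names here are hypothetical and the inputs are invented.
public class SingleTermNormSketch {

    // queryNorm as described in the post: sqrt(1 / sumOfSquaredWeights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) Math.sqrt(1.0 / sumOfSquaredWeights);
    }

    // queryWeight after the "queryWeight *= queryNorm" step.
    static float normalizedQueryWeight(float idf, float boost) {
        float queryWeight = idf * boost;                  // as in sumOfSquaredWeights()
        float sumOfSquares = queryWeight * queryWeight;   // only one term in the sum
        return queryWeight * queryNorm(sumOfSquares);
    }

    // value as the quoted normalize() computes it: normalized weight times idf.
    static float currentValue(float idf, float boost) {
        return normalizedQueryWeight(idf, boost) * idf;
    }

    public static void main(String[] args) {
        float idf = 2.0f;    // assumed idf, for illustration only
        float boost = 1.0f;  // assumed boost
        System.out.println("normalized queryWeight = " + normalizedQueryWeight(idf, boost));
        System.out.println("value (current code)   = " + currentValue(idf, boost));
    }
}
```

With idf = 2.0 and boost = 1.0 the normalized queryWeight comes out as 1.0 and value as 2.0, so value simply tracks idf in the single-term case.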
 
I therefore think that the correct code should be:
 
    public float sumOfSquaredWeights() throws IOException {
      idf = getSimilarity(searcher).idf(term, searcher); // compute idf
      queryWeight = idf * getBoost();             // compute query weight
      return queryWeight * queryWeight;           // square it
    }
 
    public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      queryWeight *= queryNorm;                   // normalize query weight
      // KDW - extra idf term makes no sense; remove it.
      // value = queryWeight * idf;                  // idf for document 
      value = queryWeight;
    }
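
As a quick check of what the proposed change would do, the following standalone sketch applies the modified normalize() arithmetic to a single term for a few idf values. Again this is not Lucene code: the names are hypothetical, the idf values are invented, and it only covers the one-term case discussed here, not multi-term queries.

```java
// Standalone sketch of the proposed normalize() arithmetic; not Lucene code.
// Names and input values are hypothetical.
public class ProposedNormalizeSketch {

    // value under the proposed change, for a query with a single term.
    static float proposedValue(float idf, float boost) {
        float queryWeight = idf * boost;                       // sumOfSquaredWeights() step
        float sumOfSquares = queryWeight * queryWeight;        // single term only
        float queryNorm = (float) Math.sqrt(1.0 / sumOfSquares);
        queryWeight *= queryNorm;                              // normalize query weight
        return queryWeight;                                    // no extra idf factor
    }

    public static void main(String[] args) {
        float[] idfs = {1.0f, 2.0f, 4.0f};  // assumed idf values
        for (float idf : idfs) {
            System.out.println("idf=" + idf + " -> value=" + proposedValue(idf, 1.0f));
        }
    }
}
```

For each of these inputs the result is 1.0, which is the unity weight I would expect for a one-term query.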

 
Karl
