
Posted to dev@lucene.apache.org by "Matthew W. Bilotti" <mb...@csail.mit.edu> on 2004/05/01 00:15:59 UTC

Help with scoring, coordination factor?

> In my case it works perfectly. As we generate multilingual and semantic
> expansions of the original words of a query, the coordination factor was
> giving lower score to words with a lot of semantic or morphologic 
> variants.
> 

For me, this has not worked.  I have defined a WordQuery class and used it 
to define my disjunctions, but I am finding that the documents I am 
interested in are still suffering rank penalties.

I wanted to understand how the scoring works internally, so 
for each document in my Hits, I printed the score and an Explanation
when querying on the original forms of each word only (no WordQueries 
used).

The first document returned had a score of 0.592 and an explanation of 
"0.0 = match required".  Can anyone tell me what this means?  The next 39 
documents retrieved have the same explanation, and steadily decreasing 
scores, which makes sense.  The 40th document retrieved, though, has a 
score of 1.0 and the explanation:

0.0 = fieldWeight(contents:invented in 0), product of:
  0.0 = tf(termFreq(contents:invented)=0)
  6.507968 = idf(docFreq=4189)
  0.0390625 = fieldNorm(field=contents, doc=0)

Can anyone help me understand why a document with score 1.0 is retrieved 
directly after a document with score 0.211?  I don't understand the 
explanation.  Why is the term frequency of "invented" 0?  It should be 3; 
I checked the document.  I tried to delve into the code to find out how to 
print all of the components of the score to the screen (especially coord, 
which I am interested in), but I couldn't figure out how to do it.
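
For reference, here is a minimal, self-contained sketch of how the components 
named in the explanation above might combine. It assumes the default 
similarity formulas as commonly documented for Lucene of this era 
(tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1, 
coord = overlap / maxOverlap). The class ScoreSketch and its method names are 
hypothetical and are not part of Lucene. Note also that the explanation above 
says "in 0" and "doc=0", so it may describe document 0 rather than the hit 
being printed; that is only a guess.

```java
// Hypothetical helper, not part of Lucene: reproduces the arithmetic of a
// "fieldWeight" explanation under the assumed default similarity formulas.
final class ScoreSketch {

    // Assumed default: term frequency factor is the square root of the raw count.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed default: inverse document frequency, ln(numDocs / (docFreq + 1)) + 1.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Assumed default: fraction of the query clauses that matched the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // fieldWeight is the product of tf, idf and fieldNorm, as in the explanation.
    static float fieldWeight(int freq, float idf, float fieldNorm) {
        return tf(freq) * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float idf = 6.507968f;         // idf value taken from the explanation
        float fieldNorm = 0.0390625f;  // fieldNorm value taken from the explanation

        // With termFreq = 0 the whole product collapses to 0, as in the output above.
        System.out.println("fieldWeight(freq=0) = " + fieldWeight(0, idf, fieldNorm));

        // With the expected termFreq = 3 the weight would be non-zero.
        System.out.println("fieldWeight(freq=3) = " + fieldWeight(3, idf, fieldNorm));

        // A document matching 1 of 2 optional clauses would be scaled by coord.
        System.out.println("coord(1, 2) = " + coord(1, 2));
    }
}
```

Under these assumptions a zero term frequency alone is enough to make the 
whole fieldWeight zero, whatever the idf and fieldNorm are.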

Any help or hints you can give me would be truly appreciated.

~ Matthew

-- 
matthew w. bilotti
computer science and artificial intelligence laboratory
massachusetts institute of technology


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

PATCH: unused code in BooleanScorer.java

Posted by Paul Elschot <pa...@xs4all.nl>.

Dear developers,

All tests pass here with this change in BooleanScorer.java.
The collectHits method is not used anywhere in the Lucene code
and it's not part of the API defined by Scorer.

Have a nice day,
Paul


Index: BooleanScorer.java
===================================================================
RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/BooleanScorer.java,v
retrieving revision 1.7
diff -u -3 -p -r1.7 BooleanScorer.java
--- BooleanScorer.java	29 Mar 2004 22:48:03 -0000	1.7
+++ BooleanScorer.java	2 May 2004 20:17:13 -0000
@@ -151,21 +151,6 @@ final class BooleanScorer extends Scorer
       this.scorer = scorer;
     }
 
-    public final void collectHits(HitCollector results) {
-      final int required = scorer.requiredMask;
-      final int prohibited = scorer.prohibitedMask;
-      final float[] coord = scorer.coordFactors;
-
-      for (Bucket bucket = first; bucket!=null; bucket = bucket.next) {
-	if ((bucket.bits & prohibited) == 0 &&	  // check prohibited
-	    (bucket.bits & required) == required){// check required
-	  results.collect(bucket.doc,		  // add to results
-			  bucket.score * coord[bucket.coord]);
-	}
-      }
-      first = null;				  // reset for next round
-    }
-
     public final int size() { return SIZE; }
 
     public HitCollector newCollector(int mask) {


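For readers unfamiliar with the removed method, the following is a rough, 
self-contained sketch of the filtering it performed: a bucket is kept only if 
none of its bits are in the prohibited mask and all bits of the required mask 
are set, and its score is then scaled by a coordination factor. The class 
BucketFilterSketch, its Bucket type and the collect method are hypothetical 
stand-ins written for illustration, not the actual BooleanScorer internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the required/prohibited mask check; this is
// not the real BooleanScorer code.
final class BucketFilterSketch {

    // Simplified stand-in for a scorer bucket.
    static final class Bucket {
        final int doc;      // document number
        final float score;  // accumulated raw score
        final int bits;     // one bit per matching clause
        final int coord;    // number of matching clauses
        Bucket next;        // next bucket in the linked list

        Bucket(int doc, float score, int bits, int coord) {
            this.doc = doc;
            this.score = score;
            this.bits = bits;
            this.coord = coord;
        }
    }

    // Walks the bucket list and returns doc -> final score for accepted buckets.
    static Map<Integer, Float> collect(Bucket first, int required, int prohibited,
                                       float[] coordFactors) {
        Map<Integer, Float> results = new LinkedHashMap<>();
        for (Bucket b = first; b != null; b = b.next) {
            boolean noProhibited = (b.bits & prohibited) == 0;       // check prohibited
            boolean allRequired = (b.bits & required) == required;   // check required
            if (noProhibited && allRequired) {
                results.put(b.doc, b.score * coordFactors[b.coord]); // add to results
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Bucket a = new Bucket(1, 2.0f, 0b011, 2); // has the required bit, no prohibited bit
        Bucket b = new Bucket(2, 3.0f, 0b101, 2); // has a prohibited bit
        Bucket c = new Bucket(3, 1.0f, 0b010, 1); // lacks the required bit
        a.next = b;
        b.next = c;
        float[] coordFactors = {0.0f, 0.5f, 1.0f};
        System.out.println(collect(a, 0b001, 0b100, coordFactors));
    }
}
```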