You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/16 14:27:20 UTC

[GitHub] [lucene-solr] jpountz commented on a change in pull request #2205: LUCENE-9668: Deprecate MinShouldMatchSumScorer with WANDScorer

jpountz commented on a change in pull request #2205:
URL: https://github.com/apache/lucene-solr/pull/2205#discussion_r558902897



##########
File path: lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
##########
@@ -231,10 +231,27 @@ private Scorer opt(
         optionalScorers.add(scorer.get(leadCost));
       }
 
-      if (scoreMode == ScoreMode.TOP_SCORES) {
-        return new WANDScorer(weight, optionalScorers, minShouldMatch);
-      } else if (minShouldMatch > 1) {
-        return new MinShouldMatchSumScorer(weight, optionalScorers, minShouldMatch);
+      // nocommit
+      //
+      // The following updated condition follows the previous one that also utilized
+      // MinShouldMatchSumScorer However, technically speaking, WANDScorer is also able to
+      // handle the case where minShouldMatch == 0 (proven with existing possible condition of
+      // scoreMode == SoreMode.TOP_SCORES && minShouldMatch == 0), but removing the
+      // minShouldMatch > 1 condition would also stop WANDScorer to handle the case
+      // where scoreMode.needsScore() == false, which WANDScorer is also capable of
+      // (proven with existing possible condition of scoreMode != ScoreMode.TOP_SCORES &&
+      // minShouldMatch > 1).
+      //
+      // Ultimately, WANDScorer should be able to handle the following conditions now:
+      // 1. Any ScoreMode (with scoring or not), although it might be a bit slower compared to
+      //   DisjunctionSumScorer due to data structures and algorithms usd
+      // 2. Any minCompetitiveScore ( >= 0 )
+      // 3. Any minShouldMatch ( >= 0 )
+      //
+      // So it seems we might need a different condition check to differentiate usage between
+      // WANDScorer and DisjunctionSumScorer ?

Review comment:
       The current condition looks right to me. Maybe the comment could say something like `WANDScorer can handle all cases but we specialize exhaustive pure disjunctions with DisjunctionSumScorer which is a bit faster`?

##########
File path: lucene/core/src/java/org/apache/lucene/search/DisjunctionMaxScorer.java
##########
@@ -50,7 +50,7 @@
     if (tieBreakerMultiplier < 0 || tieBreakerMultiplier > 1) {
       throw new IllegalArgumentException("tieBreakerMultiplier must be in [0, 1]");
     }
-    if (scoreMode == ScoreMode.TOP_SCORES) {
+    if (scoreMode.needsScores()) {

Review comment:
       why do we need to change this?

##########
File path: lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
##########
@@ -319,7 +356,10 @@ private void pushBackLeads(int target) throws IOException {
       }
     }
     lead = null;
-    leadMaxScore = 0;
+
+    if (needsScore) {
+      leadMaxScore = 0;
+    }

Review comment:
       I wonder if this condition could be removed in order to make this class easier to understand by minimizing differences in code paths between the scoring and the non-scoring cases? Are there other branches we could remove?

##########
File path: lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
##########
@@ -149,23 +167,33 @@ private static long scaleMinScore(float minScore, int scalingFactor) {
     this.doc = -1;
     this.upTo = -1; // will be computed on the first call to nextDoc/advance
 
+    this.needsScore = needsScore;
+
     head = new DisiPriorityQueue(scorers.size());
     // there can be at most num_scorers - 1 scorers beyond the current position
     tail = new DisiWrapper[scorers.size()];
 
-    OptionalInt scalingFactor = OptionalInt.empty();
-    for (Scorer scorer : scorers) {
-      scorer.advanceShallow(0);
-      float maxScore = scorer.getMaxScore(DocIdSetIterator.NO_MORE_DOCS);
-      if (maxScore != 0 && Float.isFinite(maxScore)) {
-        // 0 and +Infty should not impact the scale
-        scalingFactor =
-            OptionalInt.of(
-                Math.min(scalingFactor.orElse(Integer.MAX_VALUE), scalingFactor(maxScore)));
+    if (needsScore) {

Review comment:
       likewise here I would have expected to check `scoreMode == TOP_SCORES` rather than `scoreMode.needsScores()` since calling `advanceShallow` `getMaxScore` only really makes sense if you plan to skip documents based on scores?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org