You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/01 21:23:26 UTC

[GitHub] [lucene-solr] jpountz commented on a change in pull request #2141: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer

jpountz commented on a change in pull request #2141:
URL: https://github.com/apache/lucene-solr/pull/2141#discussion_r550805393



##########
File path: lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
##########
@@ -74,11 +74,13 @@ private long computeCost() {
       return minRequiredCost.getAsLong();
     } else {
       final Collection<ScorerSupplier> optionalScorers = subs.get(Occur.SHOULD);
-      final long shouldCost =
-          MinShouldMatchSumScorer.cost(
-              optionalScorers.stream().mapToLong(ScorerSupplier::cost),
-              optionalScorers.size(),
-              minShouldMatch);
+      // nocommit The cost calculation here copies that in WANDScorer's constructor, and may need to be adjusted?
+      final long shouldCost = scoreMode == ScoreMode.TOP_SCORES ?
+                              optionalScorers.stream().mapToLong(ScorerSupplier::cost).sum() :
+                              MinShouldMatchSumScorer.cost(
+                                      optionalScorers.stream().mapToLong(ScorerSupplier::cost),
+                                      optionalScorers.size(),
+                                      minShouldMatch);

Review comment:
       Actually this bit was correct, we should instead fix WANDScorer to take minShouldMatch into account the same way MinShouldMatchSumScorer does.

##########
File path: lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
##########
@@ -230,10 +232,13 @@ private Scorer opt(
       for (ScorerSupplier scorer : optional) {
         optionalScorers.add(scorer.get(leadCost));
       }
-      if (minShouldMatch > 1) {
+
+      if (scoreMode == ScoreMode.TOP_SCORES) {
+        return new WANDScorer(weight, optionalScorers, minShouldMatch);
+      } else if (minShouldMatch > 1) {
+        // nocommit minShouldMath > 1 && scoreMode != ScoreMode.TOP_SCORES still requires MinShouldMatchSumScorer.
+        // Do we want to deprecate this entirely now ?

Review comment:
       Maybe not now in order to keep this PR contained, but it would be nice if we could handle this case with WANDScorer too indeed.

##########
File path: lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
##########
@@ -447,6 +461,23 @@ private int doNextCompetitiveCandidate() throws IOException {
       }
     }
 
+      // nocommit Do we still need the following given TwoPhaseIterator.matches already performs the check
+//     minCompetitiveScore satisfied, now checks if the doc has enough required number of matches
+//    while (freq < minShouldMatch) {
+//      if (freq + tailSize >= minShouldMatch) {
+//        // a match on doc is still possible, try to
+//        // advance scorers from the tail
+//        advanceTail();

Review comment:
       I don't think we should advanceTail here, which may be costly. We should just make sure that `freq + tailSize >= minShouldMatch` and otherwise call your `else` block.

##########
File path: lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
##########
@@ -130,10 +130,21 @@ private static long scaleMinScore(float minScore, int scalingFactor) {
 
   int upTo; // upper bound for which max scores are valid
 
-  WANDScorer(Weight weight, Collection<Scorer> scorers) throws IOException {
+  int minShouldMatch;
+  int freq;
+
+  WANDScorer(Weight weight, Collection<Scorer> scorers, int minShouldMatch) throws IOException {
     super(weight);
 
+    if (minShouldMatch >= scorers.size()) {
+      throw new IllegalArgumentException("minShouldMatch should be < the number of scorers");
+    }
+
     this.minCompetitiveScore = 0;
+
+    assert minShouldMatch >=0 : "minShouldMatch should not be negative, but got " + minShouldMatch;

Review comment:
       ```suggestion
       assert minShouldMatch >= 0 : "minShouldMatch should not be negative, but got " + minShouldMatch;
   ```

##########
File path: lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
##########
@@ -130,10 +130,21 @@ private static long scaleMinScore(float minScore, int scalingFactor) {
 
   int upTo; // upper bound for which max scores are valid
 
-  WANDScorer(Weight weight, Collection<Scorer> scorers) throws IOException {
+  int minShouldMatch;

Review comment:
       Let's make it final?

##########
File path: lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
##########
@@ -447,6 +461,23 @@ private int doNextCompetitiveCandidate() throws IOException {
       }
     }
 
+      // nocommit Do we still need the following given TwoPhaseIterator.matches already performs the check
+//     minCompetitiveScore satisfied, now checks if the doc has enough required number of matches

Review comment:
       Indeed I don't think we need it, but I like doing it so that the approximation doesn't return documents knowing that no match is possible.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org