You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/09/29 18:35:48 UTC

[GitHub] [lucene] dnhatn opened a new pull request #331: LUCENE-10126: Re-introduce chunked bulk scorer in tests

dnhatn opened a new pull request #331:
URL: https://github.com/apache/lucene/pull/331


   This commit re-introduces the chunk scoring logic in AssertingBulkScorer and enables it in TestSortOptimization. 
   I have run 100 iterations over the core tests with this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dnhatn commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
dnhatn commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r719548306



##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingBulkScorer.java
##########
@@ -80,8 +80,23 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, final int ma
     assert min <= max : "max must be greater than min, got min=" + min + ", and max=" + max;
     this.max = max;
     collector = new AssertingLeafCollector(collector, min, max);
-    final int next = in.score(collector, acceptDocs, min, max);
-    assert next >= max;
+    int next = min;
+    do {
+      final int upTo;
+      if (random.nextBoolean()) {
+        upTo = max;
+      } else {
+        final long interval;
+        if (random.nextInt(100) <= 5) {
+          interval = 1 + random.nextInt(10);
+        } else {
+          interval = 1 + random.nextInt(random.nextBoolean() ? 100 : 5000);
+        }
+        upTo = Math.toIntExact(Math.min(next + interval, max));
+      }
+      next = in.score(collector, acceptDocs, next, upTo);

Review comment:
       Sure, we can assert more. I've added it in https://github.com/apache/lucene/pull/331/commits/3fcc4570e03950cda4de321d0687d55343c5567e




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dnhatn commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
dnhatn commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r719709277



##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingBulkScorer.java
##########
@@ -80,8 +80,23 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, final int ma
     assert min <= max : "max must be greater than min, got min=" + min + ", and max=" + max;
     this.max = max;
     collector = new AssertingLeafCollector(collector, min, max);

Review comment:
       I think we should keep it to make sure sub-ranges do not go backward.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mayya-sharipova commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r719451602



##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingScorable.java
##########
@@ -45,13 +45,41 @@ public static Scorable wrap(Scorable in) {
     if (in instanceof AssertingScorable) {
       return in;
     }
-    return new AssertingScorable(in);
+    // If `in` is Scorer, we need to wrap it as a Scorer instead of Scorable because
+    // NumericComparator uses the iterator cost of a Scorer in sort optimization.

Review comment:
       Not to be addressed by this PR, but we also have an [issue](https://issues.apache.org/jira/browse/LUCENE-9352) to add cost to `Scorable`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mayya-sharipova commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r719549902



##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingBulkScorer.java
##########
@@ -80,8 +80,23 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, final int ma
     assert min <= max : "max must be greater than min, got min=" + min + ", and max=" + max;
     this.max = max;
     collector = new AssertingLeafCollector(collector, min, max);

Review comment:
       may be we don't need this line anymore?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dnhatn commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
dnhatn commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r719480931



##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingScorable.java
##########
@@ -45,13 +45,41 @@ public static Scorable wrap(Scorable in) {
     if (in instanceof AssertingScorable) {
       return in;
     }
-    return new AssertingScorable(in);
+    // If `in` is Scorer, we need to wrap it as a Scorer instead of Scorable because
+    // NumericComparator uses the iterator cost of a Scorer in sort optimization.

Review comment:
       That's a nice approach 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mayya-sharipova commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r719487400



##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingBulkScorer.java
##########
@@ -80,8 +80,23 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, final int ma
     assert min <= max : "max must be greater than min, got min=" + min + ", and max=" + max;
     this.max = max;
     collector = new AssertingLeafCollector(collector, min, max);
-    final int next = in.score(collector, acceptDocs, min, max);
-    assert next >= max;
+    int next = min;
+    do {
+      final int upTo;
+      if (random.nextBoolean()) {
+        upTo = max;
+      } else {
+        final long interval;
+        if (random.nextInt(100) <= 5) {
+          interval = 1 + random.nextInt(10);
+        } else {
+          interval = 1 + random.nextInt(random.nextBoolean() ? 100 : 5000);
+        }
+        upTo = Math.toIntExact(Math.min(next + interval, max));
+      }
+      next = in.score(collector, acceptDocs, next, upTo);

Review comment:
       What is the reason that here we use `collector` instead of  proper ranges: `AssertingLeafCollector(collector, next, upTo)`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dnhatn merged pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
dnhatn merged pull request #331:
URL: https://github.com/apache/lucene/pull/331


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dnhatn commented on pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
dnhatn commented on pull request #331:
URL: https://github.com/apache/lucene/pull/331#issuecomment-932256437


   @jpountz @mayya-sharipova Thanks for reviewing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dnhatn commented on pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
dnhatn commented on pull request #331:
URL: https://github.com/apache/lucene/pull/331#issuecomment-930702477


   @jpountz Thanks for review + explain. I've addressed your feedback. Can you please take another look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r719112805



##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingScorable.java
##########
@@ -45,13 +45,41 @@ public static Scorable wrap(Scorable in) {
     if (in instanceof AssertingScorable) {
       return in;
     }
-    return new AssertingScorable(in);
+    // If `in` is Scorer, we need to wrap it as a Scorer instead of Scorable because
+    // NumericComparator uses the iterator cost of a Scorer in sort optimization.
+    if (in instanceof Scorer) {
+      return new WrappedScorer((Scorer) in);

Review comment:
       Is there a reason not to use AssertingScorer?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dnhatn commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
dnhatn commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r719372028



##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingScorable.java
##########
@@ -45,13 +45,41 @@ public static Scorable wrap(Scorable in) {
     if (in instanceof AssertingScorable) {
       return in;
     }
-    return new AssertingScorable(in);
+    // If `in` is Scorer, we need to wrap it as a Scorer instead of Scorable because
+    // NumericComparator uses the iterator cost of a Scorer in sort optimization.
+    if (in instanceof Scorer) {
+      return new WrappedScorer((Scorer) in);

Review comment:
       We don't have access to ScoreMode that is [required](https://github.com/apache/lucene/blob/53fd63dbb275d6139be4f5420f43cbd4cf413aab/lucene/test-framework/src/java/org/apache/lucene/search/AssertingScorer.java#L34) by AssertingScorer.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r718874053



##########
File path: lucene/core/src/java/org/apache/lucene/search/FilterLeafCollector.java
##########
@@ -42,6 +42,11 @@ public void collect(int doc) throws IOException {
     in.collect(doc);
   }
 
+  @Override
+  public DocIdSetIterator competitiveIterator() throws IOException {
+    return in.competitiveIterator();
+  }

Review comment:
       Our general guideline is that FilterXXX classes should only delegate abstract methods. Can we move this to FilterLeafCollector classes where it makes sense?

##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingScorable.java
##########
@@ -52,6 +52,7 @@ public static Scorable unwrap(Scorable in) {
     while (true) {
       if (in instanceof AssertingScorable) in = ((AssertingScorable) in).in;
       else if (in instanceof AssertingScorer) in = ((AssertingScorer) in).in;
+      else if (in instanceof FilterScorer) in = ((FilterScorer) in).in;

Review comment:
       this doesn't look correct to me as some FilterScorer impls might be altering scores?

##########
File path: lucene/core/src/java/org/apache/lucene/search/FilterScorer.java
##########
@@ -74,4 +74,9 @@ public final DocIdSetIterator iterator() {
   public final TwoPhaseIterator twoPhaseIterator() {
     return in.twoPhaseIterator();
   }
+
+  @Override
+  public void setMinCompetitiveScore(float minScore) throws IOException {
+    in.setMinCompetitiveScore(minScore);
+  }

Review comment:
       And likewise here can we move this to FilterScorer impls where it makes sense?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r718874053



##########
File path: lucene/core/src/java/org/apache/lucene/search/FilterLeafCollector.java
##########
@@ -42,6 +42,11 @@ public void collect(int doc) throws IOException {
     in.collect(doc);
   }
 
+  @Override
+  public DocIdSetIterator competitiveIterator() throws IOException {
+    return in.competitiveIterator();
+  }

Review comment:
       Our general guideline is that FilterXXX classes should only delegate abstract methods. Can we move this to FilterLeafCollector classes where it makes sense?

##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingScorable.java
##########
@@ -52,6 +52,7 @@ public static Scorable unwrap(Scorable in) {
     while (true) {
       if (in instanceof AssertingScorable) in = ((AssertingScorable) in).in;
       else if (in instanceof AssertingScorer) in = ((AssertingScorer) in).in;
+      else if (in instanceof FilterScorer) in = ((FilterScorer) in).in;

Review comment:
       this doesn't look correct to me as some FilterScorer impls might be altering scores?

##########
File path: lucene/core/src/java/org/apache/lucene/search/FilterScorer.java
##########
@@ -74,4 +74,9 @@ public final DocIdSetIterator iterator() {
   public final TwoPhaseIterator twoPhaseIterator() {
     return in.twoPhaseIterator();
   }
+
+  @Override
+  public void setMinCompetitiveScore(float minScore) throws IOException {
+    in.setMinCompetitiveScore(minScore);
+  }

Review comment:
       And likewise here can we move this to FilterScorer impls where it makes sense?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mayya-sharipova commented on a change in pull request #331: LUCENE-10126: Re-introduce chunk scoring logic in tests

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #331:
URL: https://github.com/apache/lucene/pull/331#discussion_r719487400



##########
File path: lucene/test-framework/src/java/org/apache/lucene/search/AssertingBulkScorer.java
##########
@@ -80,8 +80,23 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, final int ma
     assert min <= max : "max must be greater than min, got min=" + min + ", and max=" + max;
     this.max = max;
     collector = new AssertingLeafCollector(collector, min, max);
-    final int next = in.score(collector, acceptDocs, min, max);
-    assert next >= max;
+    int next = min;
+    do {
+      final int upTo;
+      if (random.nextBoolean()) {
+        upTo = max;
+      } else {
+        final long interval;
+        if (random.nextInt(100) <= 5) {
+          interval = 1 + random.nextInt(10);
+        } else {
+          interval = 1 + random.nextInt(random.nextBoolean() ? 100 : 5000);
+        }
+        upTo = Math.toIntExact(Math.min(next + interval, max));
+      }
+      next = in.score(collector, acceptDocs, next, upTo);

Review comment:
       What is the reason that here we use original `collector` instead of  each time a new collector with proper ranges: `AssertingLeafCollector(collector, next, upTo)`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org