Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/02/01 15:04:54 UTC

[GitHub] [lucene] javanna opened a new pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

javanna opened a new pull request #635:
URL: https://github.com/apache/lucene/pull/635

IndexSortSortedNumericDocValuesRangeQuery can count matches by computing the first and last matching doc IDs using binary search. I tried to share the code between query execution and the newly implemented count method, since duplicating it in both places did not look great.
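
The counting idea can be sketched outside of Lucene as follows. This is a minimal illustration, not the code from this pull request: it assumes a plain ascending `long[]` where the array index stands in for the doc ID and every document has exactly one value. `SortedRangeCount`, `lowerBound` and `upperBound` are hypothetical names chosen for the example.

```java
/** Hypothetical illustration of counting range matches on sorted values; not Lucene code. */
final class SortedRangeCount {

  /** Returns the first index whose value is >= target, or values.length if there is none. */
  static int lowerBound(long[] values, long target) {
    int lo = 0;
    int hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
      if (values[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Returns the first index whose value is > target, or values.length if there is none. */
  static int upperBound(long[] values, long target) {
    int lo = 0;
    int hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] <= target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Counts the values in [lower, upper], both inclusive, assuming sortedValues is ascending. */
  static int count(long[] sortedValues, long lower, long upper) {
    if (lower > upper) {
      return 0;
    }
    int firstDoc = lowerBound(sortedValues, lower); // first matching position
    int lastDoc = upperBound(sortedValues, upper);  // one past the last matching position
    return lastDoc - firstDoc;
  }
}
```

Because the upper position is exclusive in this sketch, the count is a plain subtraction and no document needs to be visited. Missing values, descending sorts and multi-valued fields are deliberately left out.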

I expanded the existing tests to issue an explicit search as well as an explicit count. The existing tests mostly exercised count, but now that Weight#count is implemented we want to exercise both code paths: executing the query as well as the count shortcut.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] ryo0301 commented on a change in pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.

ryo0301 commented on a change in pull request #635:
URL: https://github.com/apache/lucene/pull/635#discussion_r799300144



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -195,7 +211,7 @@ public boolean isCacheable(LeafReaderContext ctx) {
    * {@link DocIdSetIterator} makes sure to wrap the original docvalues to skip over documents with
    * no value.
    */
-  private DocIdSetIterator getDocIdSetIterator(
+  private BoundedDocSetIdIterator getDocIdSetIterator(

Review comment:
       Isn't this a typo?
   BoundedDocSetIdIterator → BoundedDocIdSetIterator





[GitHub] [lucene] vigyasharma commented on a change in pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.

vigyasharma commented on a change in pull request #635:
URL: https://github.com/apache/lucene/pull/635#discussion_r796904432



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -180,9 +169,36 @@ public boolean isCacheable(LeafReaderContext ctx) {
         // if the fallback query is cacheable.
         return fallbackWeight.isCacheable(ctx);
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        BoundedDocSetIdIterator disi = getDocIdSetIteratorOrNull(context);
+        if (disi != null) {
+          return disi.lastDoc - disi.firstDoc;
+        }
+        return super.count(context);

Review comment:
       Thanks for the refactor and code reuse in this change @javanna. I was curious about the fallback return option here: why didn't we return `fallbackWeight.count()` instead of `super.count()`?





[GitHub] [lucene] javanna commented on a change in pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.

javanna commented on a change in pull request #635:
URL: https://github.com/apache/lucene/pull/635#discussion_r796962078



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -180,9 +169,36 @@ public boolean isCacheable(LeafReaderContext ctx) {
         // if the fallback query is cacheable.
         return fallbackWeight.isCacheable(ctx);
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        BoundedDocSetIdIterator disi = getDocIdSetIteratorOrNull(context);
+        if (disi != null) {
+          return disi.lastDoc - disi.firstDoc;
+        }
+        return super.count(context);

Review comment:
       This is an oversight, thanks for looking and for catching this. I will address this. 
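
       For readers following the thread, the difference can be sketched with stand-in types. This is only an illustration under stated assumptions, not the fix that was committed: `FallbackCountSketch`, `LeafCounter` and `countOrDelegate` are hypothetical names, and `-1` mirrors the value that the default `Weight#count` returns when a count is not cheaply available.

```java
/** Hypothetical stand-ins for illustration only; these are not Lucene classes. */
final class FallbackCountSketch {

  /** Stand-in for a fallback weight; a result of -1 means "count not cheaply available". */
  interface LeafCounter {
    int count();
  }

  /**
   * bounds is {firstDoc, lastDocExclusive} when the index-sort shortcut applies,
   * or null when it does not. In the null case the fallback gets a chance to
   * answer instead of giving up with -1 straight away.
   */
  static int countOrDelegate(int[] bounds, LeafCounter fallback) {
    if (bounds != null) {
      return bounds[1] - bounds[0];
    }
    return fallback.count();
  }
}
```

       With this shape, a fallback that knows its own count (for example one that matches no documents and reports 0) still produces a usable answer when the shortcut is unavailable, whereas always returning -1 would force the caller to iterate over the matches.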





[GitHub] [lucene] jpountz commented on a change in pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #635:
URL: https://github.com/apache/lucene/pull/635#discussion_r797860208



##########
File path: lucene/CHANGES.txt
##########
@@ -128,6 +128,9 @@ New Features
   based on TotalHitCountCollector that allows users to parallelize counting the
   number of hits. (Luca Cavanna, Adrien Grand)
 
+* LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery
+  to speed up computing the number of hits when possible. (Luca Cavanna, Adrien Grand)

Review comment:
       not sure I deserve having my name on this one :)





[GitHub] [lucene] jpountz merged pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.

jpountz merged pull request #635:
URL: https://github.com/apache/lucene/pull/635


   



[GitHub] [lucene] jpountz commented on a change in pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #635:
URL: https://github.com/apache/lucene/pull/635#discussion_r799310175



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -195,7 +211,7 @@ public boolean isCacheable(LeafReaderContext ctx) {
    * {@link DocIdSetIterator} makes sure to wrap the original docvalues to skip over documents with
    * no value.
    */
-  private DocIdSetIterator getDocIdSetIterator(
+  private BoundedDocSetIdIterator getDocIdSetIterator(

Review comment:
       Yes, I'm pretty sure it is. If you open a PR to rename, I'll merge it.





[GitHub] [lucene] javanna commented on a change in pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.

javanna commented on a change in pull request #635:
URL: https://github.com/apache/lucene/pull/635#discussion_r798385420



##########
File path: lucene/CHANGES.txt
##########
@@ -128,6 +128,9 @@ New Features
   based on TotalHitCountCollector that allows users to parallelize counting the
   number of hits. (Luca Cavanna, Adrien Grand)
 
+* LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery
+  to speed up computing the number of hits when possible. (Luca Cavanna, Adrien Grand)

Review comment:
       eheh, I thought you get the credit because you merge it :)





[GitHub] [lucene] vigyasharma commented on a change in pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.

vigyasharma commented on a change in pull request #635:
URL: https://github.com/apache/lucene/pull/635#discussion_r797005585



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestIndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -450,6 +458,56 @@ public void testIndexSortOptimizationDeactivated(RandomIndexWriter writer) throw
     reader.close();
   }
 
+  public void testCount() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
+    Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.LONG));
+    iwc.setIndexSort(indexSort);
+    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
+    Document doc = new Document();
+    doc.add(new SortedNumericDocValuesField("field", 10));
+    writer.addDocument(doc);
+    IndexReader reader = writer.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    Query fallbackQuery = LongPoint.newRangeQuery("field", 1, 42);
+    Query query = new IndexSortSortedNumericDocValuesRangeQuery("field", 1, 42, fallbackQuery);
+    Weight weight = query.createWeight(searcher, ScoreMode.COMPLETE, 1.0f);
+    for (LeafReaderContext context : searcher.getLeafContexts()) {
+      assertNotEquals(-1, weight.count(context));
+    }
+
+    writer.close();
+    reader.close();
+    dir.close();
+  }
+
+  public void testFallbackCount() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
+    Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.LONG));
+    iwc.setIndexSort(indexSort);
+    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
+    Document doc = new Document();
+    doc.add(new SortedNumericDocValuesField("field", 10));
+    writer.addDocument(doc);
+    IndexReader reader = writer.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    // we use an unrealistic query that exposes its own Weight#count
+    Query fallbackQuery = new MatchNoDocsQuery();
+    // the index is not sorted on this field, the fallback query is used
+    Query query = new IndexSortSortedNumericDocValuesRangeQuery("another", 1, 42, fallbackQuery);
+    Weight weight = query.createWeight(searcher, ScoreMode.COMPLETE, 1.0f);
+    for (LeafReaderContext context : searcher.getLeafContexts()) {
+      assertNotEquals(-1, weight.count(context));

Review comment:
       Nice, thanks for adding a test for this!
   Minor: it would be good to actually check the fallback weight count here and, in general, to have a different assertion here than the one in `testCount()`.
   Maybe `assertEquals(0, weight.count(context));` here, and `assertEquals(1, weight.count(context));` in `testCount()`?





[GitHub] [lucene] javanna commented on a change in pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.

javanna commented on a change in pull request #635:
URL: https://github.com/apache/lucene/pull/635#discussion_r797016249



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestIndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -450,6 +458,56 @@ public void testIndexSortOptimizationDeactivated(RandomIndexWriter writer) throw
     reader.close();
   }
 
+  public void testCount() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
+    Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.LONG));
+    iwc.setIndexSort(indexSort);
+    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
+    Document doc = new Document();
+    doc.add(new SortedNumericDocValuesField("field", 10));
+    writer.addDocument(doc);
+    IndexReader reader = writer.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    Query fallbackQuery = LongPoint.newRangeQuery("field", 1, 42);
+    Query query = new IndexSortSortedNumericDocValuesRangeQuery("field", 1, 42, fallbackQuery);
+    Weight weight = query.createWeight(searcher, ScoreMode.COMPLETE, 1.0f);
+    for (LeafReaderContext context : searcher.getLeafContexts()) {
+      assertNotEquals(-1, weight.count(context));
+    }
+
+    writer.close();
+    reader.close();
+    dir.close();
+  }
+
+  public void testFallbackCount() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
+    Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.LONG));
+    iwc.setIndexSort(indexSort);
+    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
+    Document doc = new Document();
+    doc.add(new SortedNumericDocValuesField("field", 10));
+    writer.addDocument(doc);
+    IndexReader reader = writer.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    // we use an unrealistic query that exposes its own Weight#count
+    Query fallbackQuery = new MatchNoDocsQuery();
+    // the index is not sorted on this field, the fallback query is used
+    Query query = new IndexSortSortedNumericDocValuesRangeQuery("another", 1, 42, fallbackQuery);
+    Weight weight = query.createWeight(searcher, ScoreMode.COMPLETE, 1.0f);
+    for (LeafReaderContext context : searcher.getLeafContexts()) {
+      assertNotEquals(-1, weight.count(context));

Review comment:
       You're right. I was initially hesitant about this because I was thinking of indexing more documents, in which case I could end up with more segments and asserting on an exact count would get more complicated. But if we keep a single doc we should be good, and it's good to have a more precise check that also differs between the two scenarios. Thanks for your suggestion!



