You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/02/08 10:02:00 UTC

[GitHub] [lucene] gautamworah96 opened a new pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

gautamworah96 opened a new pull request #658:
URL: https://github.com/apache/lucene/pull/658

<!--
_(If you are a project committer then you may remove some/all of the following template.)_

Before creating a pull request, please file an issue in the ASF Jira system for Lucene:

* https://issues.apache.org/jira/projects/LUCENE

You will need to create an account in Jira in order to create an issue.

The title of the PR should reference the Jira issue number in the form:

* LUCENE-####: <short description of problem or changes>

LUCENE must be fully capitalized. A short description helps people scanning pull requests for items they can work on.

Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->

# Description

Implement Weight#count on PointRangeQuery. Also fixed a small style inconsistency that I noticed

Issue: https://issues.apache.org/jira/browse/LUCENE-10378

# Solution

Use a similar approach to what we've done for `TermQuery` or `NormsFieldExistsQuery`. Added initial checks for validating the input and then checking if all documents have at-least one point, the field is single dimensional and the number of points equals the number of documents.

Note: @jpountz I think I've misinterpreted your `by only counting matches on the two leaves that cross with the query.` comment and have implemented a brute force approach.

# Tests

Tests have not been added. I'm in the process of writing them

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802572495



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
       By the way @jpountz, the method can be call because we call relate using the exact bounds of a leaf where visiting so we might end up calling this method :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802572495



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
       By the way @jpountz, the method can be call because we call relate using the exact bounds of a leaf when visiting it so we might end up calling this method :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 edited a comment on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 edited a comment on pull request #658:
URL: https://github.com/apache/lucene/pull/658#issuecomment-1041175108


   @iverase I see you already opened a backport PR. Thanks! I'll approve that as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802571592



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
       I would really prefer all the count logic is in the method PointValues#pointCount. Let's consider the signature I propose in the issue. Then you just need to do the following:
   ```
   return Math.toIntExact(values.pointCount(this::relate, this::matches));
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802625846



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
       I'm good if we give this method a good signature.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802659729



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
       One benefit of having it as a method of PointValues is that it can be better tested.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802141959



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == reader.maxDoc()

Review comment:
       It may be possible that some docs don't have point indexed at all and some have more than one. This will still satisfy the numDocs=numPoints check but will not be good for our use case?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r805112108



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +378,100 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          return (int) pointCount(values.getPointTree(), this::relate, this::matches);
+        }
+        return super.count(context);
+      }
+
+      /**
+       * Finds the number of points matching the provided range conditions. Using this method is
+       * faster than calling {@link PointValues#intersect(IntersectVisitor)} to get the count of
+       * intersecting points. This method does not enforce live documents, therefore it should only
+       * be used when there are no deleted documents.
+       *
+       * @param pointTree start node of the count operation
+       * @param nodeComparator comparator to be used for checking whether the internal node is
+       *     inside the range
+       * @param leafComparator comparator to be used for checking whether the leaf node is inside
+       *     the range
+       * @return count of points that match the range
+       */
+      private long pointCount(
+          PointValues.PointTree pointTree,
+          BiFunction<byte[], byte[], Relation> nodeComparator,
+          Predicate<byte[]> leafComparator)
+          throws IOException {
+        final int[] matchingLeafNodeCount = {0};
+        // create a custom IntersectVisitor that records the number of leafNodes that matched
+        final IntersectVisitor visitor =
+            new IntersectVisitor() {
+              @Override
+              public void visit(int docID) {
+                // this branch should be unreachable
+                throw new UnsupportedOperationException(
+                    "This IntersectVisitor does not perform any actions on a "
+                        + "docID="
+                        + docID
+                        + " node being visited");
+              }
+
+              @Override
+              public void visit(int docID, byte[] packedValue) {
+                if (leafComparator.test(packedValue)) {
+                  matchingLeafNodeCount[0]++;
+                }
+              }
+
+              @Override
+              public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+                return nodeComparator.apply(minPackedValue, maxPackedValue);
+              }
+            };
+        Relation r =

Review comment:
       Silly mistake. Thanks for catching that. Fixed it 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on pull request #658:
URL: https://github.com/apache/lucene/pull/658#issuecomment-1040625361


   I will push this tomorrow if it is ok with you @gautamworah96  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802144345



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == reader.maxDoc()

Review comment:
       IIUC Let's say out of 10 docs, only 8 have indexed points, (each has one). The second check `values.getDocCount() == values.size()` will still pass, but we need to check whether each doc has a single point at least. The `reader.maxDoc()` check will fail in this case (and is hence required).  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r801605478



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == reader.maxDoc()

Review comment:
       I don't think we need this check, do we?

##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }

Review comment:
       Maybe extract this logic to a function to share it across the `scorerSupplier` and `count` methods?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802539169



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
       Let's throw an UnsupportedOperationException here and move the increment to `visit(int,byte[])`? Tt would be a bug if this method would ever get called since the point is to skip nodes that are contained by the query.

##########
File path: lucene/core/src/java/org/apache/lucene/index/PointValues.java
##########
@@ -369,6 +369,52 @@ private void intersect(IntersectVisitor visitor, PointTree pointTree) throws IOE
     }
   }
 
+  /**
+   * Finds the number of points matching the provided range conditions. Using this method is faster
+   * than calling {@link #intersect(IntersectVisitor)} to get the count of intersecting points. This
+   * method does not enforce live documents, therefore it should only be used when there are no
+   * deleted documents.
+   */
+  public final long countPoints(IntersectVisitor visitor) throws IOException {
+    final PointTree pointTree = getPointTree();
+    long countPoints = countPoints(visitor, pointTree);
+    assert pointTree.moveToParent()
+        == false; // just checking to make sure we ended the tree search at the root node
+    return countPoints;
+  }
+
+  private long countPoints(IntersectVisitor visitor, PointTree pointTree) throws IOException {
+    Relation r = visitor.compare(pointTree.getMinPackedValue(), pointTree.getMaxPackedValue());
+    switch (r) {
+      case CELL_OUTSIDE_QUERY:
+        // This cell is fully outside the query shape: return 0 as the count of its nodes
+        return 0;
+      case CELL_INSIDE_QUERY:
+        // This cell is fully inside the query shape: return the size of the entire node as the
+        // count
+        return pointTree.size();
+      case CELL_CROSSES_QUERY:
+        /*
+        The cell crosses the shape boundary, or the cell fully contains the query, so we fall
+        through and do full counting.
+        */
+        if (pointTree.moveToChild()) {
+          int cellCount = 0;
+          do {
+            cellCount += countPoints(visitor, pointTree);
+          } while (pointTree.moveToSibling());
+          pointTree.moveToParent();
+          return cellCount;
+        } else {
+          // we have reached a leaf node here.
+          pointTree.visitDocValues(visitor);
+          return 0; // the visitor has safely recorded the number of leaf nodes that matched
+        }
+      default:
+        throw new IllegalArgumentException("Unreachable code");
+    }
+  }
+

Review comment:
       I think I'd keep these two methods as implementation details of PointRangeQuery? The contract is a bit weird as the `IntersectVisitor` only collects documents that are on leaves that cross the query.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r803536317



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +378,100 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          return (int) pointCount(values.getPointTree(), this::relate, this::matches);
+        }
+        return super.count(context);
+      }
+
+      /**
+       * Finds the number of points matching the provided range conditions. Using this method is
+       * faster than calling {@link PointValues#intersect(IntersectVisitor)} to get the count of
+       * intersecting points. This method does not enforce live documents, therefore it should only
+       * be used when there are no deleted documents.
+       *
+       * @param pointTree start node of the count operation
+       * @param nodeComparator comparator to be used for checking whether the internal node is
+       *     inside the range
+       * @param leafComparator comparator to be used for checking whether the leaf node is inside
+       *     the range
+       * @return count of points that match the range
+       */
+      private long pointCount(
+          PointValues.PointTree pointTree,
+          BiFunction<byte[], byte[], Relation> nodeComparator,
+          Predicate<byte[]> leafComparator)
+          throws IOException {
+        final int[] matchingLeafNodeCount = {0};
+        // create a custom IntersectVisitor that records the number of leafNodes that matched
+        final IntersectVisitor visitor =
+            new IntersectVisitor() {
+              @Override
+              public void visit(int docID) {
+                // this branch should be unreachable
+                throw new UnsupportedOperationException(
+                    "This IntersectVisitor does not perform any actions on a "
+                        + "docID="
+                        + docID
+                        + " node being visited");
+              }
+
+              @Override
+              public void visit(int docID, byte[] packedValue) {
+                if (leafComparator.test(packedValue)) {
+                  matchingLeafNodeCount[0]++;
+                }
+              }
+
+              @Override
+              public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+                return nodeComparator.apply(minPackedValue, maxPackedValue);
+              }
+            };
+        Relation r =

Review comment:
       I ythink we should move the recursive part into its own method and reuse the IntersectVisitor? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802364095



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }

Review comment:
       Added in [8f2edd4](https://github.com/apache/lucene/pull/658/commits/8f2edd473c9edbafc26fa3852cde8af30108b55c)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802502992



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == reader.maxDoc()

Review comment:
       Hmm. True. Edited it to say `if all docs have atmost one point`. The BKD tree deals in point space and if we have the guarantee that one doc has at most one point, we can safely return the number of matched points as the number of docs




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r803536317



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +378,100 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          return (int) pointCount(values.getPointTree(), this::relate, this::matches);
+        }
+        return super.count(context);
+      }
+
+      /**
+       * Finds the number of points matching the provided range conditions. Using this method is
+       * faster than calling {@link PointValues#intersect(IntersectVisitor)} to get the count of
+       * intersecting points. This method does not enforce live documents, therefore it should only
+       * be used when there are no deleted documents.
+       *
+       * @param pointTree start node of the count operation
+       * @param nodeComparator comparator to be used for checking whether the internal node is
+       *     inside the range
+       * @param leafComparator comparator to be used for checking whether the leaf node is inside
+       *     the range
+       * @return count of points that match the range
+       */
+      private long pointCount(
+          PointValues.PointTree pointTree,
+          BiFunction<byte[], byte[], Relation> nodeComparator,
+          Predicate<byte[]> leafComparator)
+          throws IOException {
+        final int[] matchingLeafNodeCount = {0};
+        // create a custom IntersectVisitor that records the number of leafNodes that matched
+        final IntersectVisitor visitor =
+            new IntersectVisitor() {
+              @Override
+              public void visit(int docID) {
+                // this branch should be unreachable
+                throw new UnsupportedOperationException(
+                    "This IntersectVisitor does not perform any actions on a "
+                        + "docID="
+                        + docID
+                        + " node being visited");
+              }
+
+              @Override
+              public void visit(int docID, byte[] packedValue) {
+                if (leafComparator.test(packedValue)) {
+                  matchingLeafNodeCount[0]++;
+                }
+              }
+
+              @Override
+              public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+                return nodeComparator.apply(minPackedValue, maxPackedValue);
+              }
+            };
+        Relation r =

Review comment:
       I think we should move the recursive part into its own method and reuse the IntersectVisitor? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802363166



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == reader.maxDoc()

Review comment:
       We do not need all documents to have points. In your example where there are 10 docs and 8 of them have points. If the range query contains the range of value (ie. all points match), then the BKD tree would give us 8 matches. So we're good?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802141959



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == reader.maxDoc()

Review comment:
       It may be possible that some docs don't have point indexed at all and some have more than one. This will still satisfy the numDocs=numPoints check but will not be good for our use case




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on pull request #658:
URL: https://github.com/apache/lucene/pull/658#issuecomment-1040653243


   Yes. I was wondering why you had kept it open :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r804116906



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +378,100 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          return (int) pointCount(values.getPointTree(), this::relate, this::matches);
+        }
+        return super.count(context);
+      }
+
+      /**
+       * Finds the number of points matching the provided range conditions. Using this method is
+       * faster than calling {@link PointValues#intersect(IntersectVisitor)} to get the count of
+       * intersecting points. This method does not enforce live documents, therefore it should only
+       * be used when there are no deleted documents.
+       *
+       * @param pointTree start node of the count operation
+       * @param nodeComparator comparator to be used for checking whether the internal node is
+       *     inside the range
+       * @param leafComparator comparator to be used for checking whether the leaf node is inside
+       *     the range
+       * @return count of points that match the range
+       */
+      private long pointCount(
+          PointValues.PointTree pointTree,
+          BiFunction<byte[], byte[], Relation> nodeComparator,
+          Predicate<byte[]> leafComparator)
+          throws IOException {
+        final int[] matchingLeafNodeCount = {0};
+        // create a custom IntersectVisitor that records the number of leafNodes that matched
+        final IntersectVisitor visitor =
+            new IntersectVisitor() {
+              @Override
+              public void visit(int docID) {
+                // this branch should be unreachable
+                throw new UnsupportedOperationException(
+                    "This IntersectVisitor does not perform any actions on a "
+                        + "docID="
+                        + docID
+                        + " node being visited");
+              }
+
+              @Override
+              public void visit(int docID, byte[] packedValue) {
+                if (leafComparator.test(packedValue)) {
+                  matchingLeafNodeCount[0]++;
+                }
+              }
+
+              @Override
+              public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+                return nodeComparator.apply(minPackedValue, maxPackedValue);
+              }
+            };
+        Relation r =

Review comment:
       I've implemented a method signature that I thought would be simpler to understand. It restricts all increment/counting operations to the `matchingNodeCount` array. The second `pointCount` function just returns `void`.
   
   IMO, The other slightly complex approach to do this resulted in a method signature like 
   ```
         private long pointCount(
             IntersectVisitor visitor,
             PointValues.PointTree pointTree,
             BiFunction<byte[], byte[], Relation> nodeComparator,
             Predicate<byte[]> leafComparator,
             int[] matchingLeafNodeCount)
   ```
   A [branch](https://github.com/gautamworah96/lucene/commit/fe937df49def4dc3cd512fef6c7d39ef53023fb1) that implements this method signature and adds matchingLeafNodeCount[0] to the final count.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802363166



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == reader.maxDoc()

Review comment:
       We do not need all documents to have points. In your example where there are 10 docs and 8 of them have points. If the range query contains the range of values (ie. all points match), then the BKD tree would give us 8 matches. So we're good?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802144345



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +369,54 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (values == null) {
+          // No docs in this segment indexed any points or this field did not contain any points
+          return 0;
+        }
+
+        if (values.getNumIndexDimensions() != numDims) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with numIndexDimensions="
+                  + values.getNumIndexDimensions()
+                  + " but this query has numDims="
+                  + numDims);
+        }
+        if (bytesPerDim != values.getBytesPerDimension()) {
+          throw new IllegalArgumentException(
+              "field=\""
+                  + field
+                  + "\" was indexed with bytesPerDim="
+                  + values.getBytesPerDimension()
+                  + " but this query has bytesPerDim="
+                  + bytesPerDim);
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == reader.maxDoc()

Review comment:
       IIUC Let's say out of 10 docs, only 8 have indexed points, (each has one). The second check `values.getDocCount() == values.size()` will still pass, but we need to check whether each doc has a single point at least. The reader.maxDoc() check will fail in this case (and is hence required).  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802571592



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
       I would really prefer all the count logic is in the method PointValues#pointCount. Let's consider the signature I propose in the issue. Then you just need to do the following:
   ```
   return Math.toIntExact(values.pointCount(this::relate, this::matches));
   ```
   The method PointValues#pointCount can then reconstruct IntersectVisitor as you are doing here.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802367218



##########
File path: lucene/core/src/java/org/apache/lucene/index/PointValues.java
##########
@@ -369,6 +369,51 @@ private void intersect(IntersectVisitor visitor, PointTree pointTree) throws IOE
     }
   }
 
+  /**
+   * Finds the number of points matching the provided range conditions. Using this method is faster
+   * than calling {@link #intersect(IntersectVisitor)} to get the count of intersecting points. This
+   * method does not enforce live documents, therefore it should only be used when there are no
+   * deleted documents.
+   */
+  public final long countPoints(IntersectVisitor visitor) throws IOException {
+    final PointTree pointTree = getPointTree();
+    long countPoints = countPoints(visitor, pointTree);
+    assert pointTree.moveToParent()
+        == false; // just checking to make sure we ended the tree search at the root node
+    return countPoints;
+  }
+
+  private long countPoints(IntersectVisitor visitor, PointTree pointTree) throws IOException {
+    Relation r = visitor.compare(pointTree.getMinPackedValue(), pointTree.getMaxPackedValue());
+    switch (r) {
+      case CELL_OUTSIDE_QUERY:
+        // This cell is fully outside the query shape: return 0 as the count of its nodes
+        return 0;
+      case CELL_INSIDE_QUERY:
+        // This cell is fully inside the query shape: return the size of the entire node as the
+        // count
+        return pointTree.size();
+      case CELL_CROSSES_QUERY:
+        /*
+        The cell crosses the shape boundary, or the cell fully contains the query, so we fall
+        through and do full counting.
+        */
+        if (pointTree.moveToChild()) {
+          int cellCount = 0;
+          do {
+            cellCount += countPoints(visitor, pointTree);
+          } while (pointTree.moveToSibling());
+          pointTree.moveToParent();
+          return cellCount;
+        } else {
+          // we have reached a leaf node here.
+          // add logic here/figure out what to do

Review comment:
       right, here you need to call `PointTree#visitDocValues` to check whether values in this leaf match the query or not




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase merged pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase merged pull request #658:
URL: https://github.com/apache/lucene/pull/658


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on pull request #658:
URL: https://github.com/apache/lucene/pull/658#issuecomment-1041175108


   I see you already opened a backport PR. Thanks! I'll approve that as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r803485066



##########
File path: lucene/core/src/java/org/apache/lucene/index/PointValues.java
##########
@@ -369,6 +369,52 @@ private void intersect(IntersectVisitor visitor, PointTree pointTree) throws IOE
     }
   }
 
+  /**
+   * Finds the number of points matching the provided range conditions. Using this method is faster
+   * than calling {@link #intersect(IntersectVisitor)} to get the count of intersecting points. This
+   * method does not enforce live documents, therefore it should only be used when there are no
+   * deleted documents.
+   */
+  public final long countPoints(IntersectVisitor visitor) throws IOException {
+    final PointTree pointTree = getPointTree();
+    long countPoints = countPoints(visitor, pointTree);
+    assert pointTree.moveToParent()
+        == false; // just checking to make sure we ended the tree search at the root node
+    return countPoints;
+  }
+
+  private long countPoints(IntersectVisitor visitor, PointTree pointTree) throws IOException {
+    Relation r = visitor.compare(pointTree.getMinPackedValue(), pointTree.getMaxPackedValue());
+    switch (r) {
+      case CELL_OUTSIDE_QUERY:
+        // This cell is fully outside the query shape: return 0 as the count of its nodes
+        return 0;
+      case CELL_INSIDE_QUERY:
+        // This cell is fully inside the query shape: return the size of the entire node as the
+        // count
+        return pointTree.size();
+      case CELL_CROSSES_QUERY:
+        /*
+        The cell crosses the shape boundary, or the cell fully contains the query, so we fall
+        through and do full counting.
+        */
+        if (pointTree.moveToChild()) {
+          int cellCount = 0;
+          do {
+            cellCount += countPoints(visitor, pointTree);
+          } while (pointTree.moveToSibling());
+          pointTree.moveToParent();
+          return cellCount;
+        } else {
+          // we have reached a leaf node here.
+          pointTree.visitDocValues(visitor);
+          return 0; // the visitor has safely recorded the number of leaf nodes that matched
+        }
+      default:
+        throw new IllegalArgumentException("Unreachable code");
+    }
+  }
+

Review comment:
       Got it. Makes sense. This implementation is only dealing with query specific loopholes. `PointValues` has nothing to do with these query level optimizations. Fixed in the next commit




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r803485794



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
       Done. Thanks for the idea @iverase. Looks much cleaner now (+ removes the inconsistency of adding the leaf node count separately).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r804415672



##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +378,100 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          return (int) pointCount(values.getPointTree(), this::relate, this::matches);
+        }
+        return super.count(context);
+      }
+
+      /**
+       * Finds the number of points matching the provided range conditions. Using this method is
+       * faster than calling {@link PointValues#intersect(IntersectVisitor)} to get the count of
+       * intersecting points. This method does not enforce live documents, therefore it should only
+       * be used when there are no deleted documents.
+       *
+       * @param pointTree start node of the count operation
+       * @param nodeComparator comparator to be used for checking whether the internal node is
+       *     inside the range
+       * @param leafComparator comparator to be used for checking whether the leaf node is inside
+       *     the range
+       * @return count of points that match the range
+       */
+      private long pointCount(
+          PointValues.PointTree pointTree,
+          BiFunction<byte[], byte[], Relation> nodeComparator,
+          Predicate<byte[]> leafComparator)
+          throws IOException {
+        final int[] matchingLeafNodeCount = {0};
+        // create a custom IntersectVisitor that records the number of leafNodes that matched
+        final IntersectVisitor visitor =
+            new IntersectVisitor() {
+              @Override
+              public void visit(int docID) {
+                // this branch should be unreachable
+                throw new UnsupportedOperationException(
+                    "This IntersectVisitor does not perform any actions on a "
+                        + "docID="
+                        + docID
+                        + " node being visited");
+              }
+
+              @Override
+              public void visit(int docID, byte[] packedValue) {
+                if (leafComparator.test(packedValue)) {
+                  matchingLeafNodeCount[0]++;
+                }
+              }
+
+              @Override
+              public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+                return nodeComparator.apply(minPackedValue, maxPackedValue);
+              }
+            };
+        Relation r =

Review comment:
       That is correct but why are you passing the `nodeComparator` and the `leafComparator` here? there  not needed anymore as they are part of the IntersectVisitor,




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] iverase commented on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

Posted by GitBox <gi...@apache.org>.

iverase commented on pull request #658:
URL: https://github.com/apache/lucene/pull/658#issuecomment-1040625361


   I will push this tomorrow if it is ok with you @gautamworah96  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org