You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/12/08 21:44:55 UTC

[GitHub] [lucene] gsmiller opened a new pull request, #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery

gsmiller opened a new pull request, #12003:
URL: https://github.com/apache/lucene/pull/12003

   * Leverage DISI static factory methods more over custom DISI impl where possible
   * Assert points field is a single-dim in a couple places
   * Bound cost estimate by the cost of the doc values field (for sparse fields)
   
   ### Description
   This PR contains some minor cleanup I thought might be useful after recently spending a little time looking at this code.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller merged pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.
gsmiller merged PR #12003:
URL: https://github.com/apache/lucene/pull/12003


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #12003:
URL: https://github.com/apache/lucene/pull/12003#discussion_r1044931897


##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java:
##########
@@ -692,7 +697,7 @@ public int advance(int target) throws IOException {
 
     @Override
     public long cost() {
-      return lastDoc - firstDoc;
+      return Math.min(delegate.cost(), lastDoc - firstDoc);

Review Comment:
   Good question. So `delegate.cost()` will approximate the number of documents that could be provided in total by the delegate (in this case, it's `NumericDocValues`). `lastDoc - firstDoc` of course will provide the correct number of docs _if_ every doc has a value. In this case, `delegate` tells us what docs actually have values, and because we're using this, we know not all docs have values, so `lastDoc - firstDoc` _may_ overestimate. If the number of docs containing a value are very sparse though, it's possible we'll over-estimate by a lot, so providing a ceiling using `delegate.cost()` could be useful in certain cases. Hope that makes sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.
gsmiller commented on PR #12003:
URL: https://github.com/apache/lucene/pull/12003#issuecomment-1345428478

   Thanks @jpountz. Merged/backported.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #12003:
URL: https://github.com/apache/lucene/pull/12003#discussion_r1044937729


##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java:
##########
@@ -406,117 +406,121 @@ public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
 
   private boolean matchNone(PointValues points, byte[] queryLowerPoint, byte[] queryUpperPoint)
       throws IOException {
+    assert points.getNumDimensions() == 1;
     final ByteArrayComparator comparator =
         ArrayUtil.getUnsignedComparator(points.getBytesPerDimension());
-    for (int dim = 0; dim < points.getNumDimensions(); dim++) {
-      int offset = dim * points.getBytesPerDimension();
-      if (comparator.compare(points.getMinPackedValue(), offset, queryUpperPoint, offset) > 0
-          || comparator.compare(points.getMaxPackedValue(), offset, queryLowerPoint, offset) < 0) {
-        return true;
-      }
-    }
-    return false;
+    return comparator.compare(points.getMinPackedValue(), 0, queryUpperPoint, 0) > 0
+        || comparator.compare(points.getMaxPackedValue(), 0, queryLowerPoint, 0) < 0;
   }
 
   private boolean matchAll(PointValues points, byte[] queryLowerPoint, byte[] queryUpperPoint)
       throws IOException {
+    assert points.getNumDimensions() == 1;
     final ByteArrayComparator comparator =
         ArrayUtil.getUnsignedComparator(points.getBytesPerDimension());
-    for (int dim = 0; dim < points.getNumDimensions(); dim++) {
-      int offset = dim * points.getBytesPerDimension();
-      if (comparator.compare(points.getMinPackedValue(), offset, queryLowerPoint, offset) >= 0
-          && comparator.compare(points.getMaxPackedValue(), offset, queryUpperPoint, offset) <= 0) {
-        return true;
-      }
-    }
-    return false;
+    return comparator.compare(points.getMinPackedValue(), 0, queryLowerPoint, 0) >= 0
+        && comparator.compare(points.getMaxPackedValue(), 0, queryUpperPoint, 0) <= 0;
   }
 
-  private BoundedDocIdSetIterator getDocIdSetIteratorOrNullFromBkd(
+  private DocIdSetIterator getDocIdSetIteratorOrNullFromBkd(
       LeafReaderContext context, DocIdSetIterator delegate) throws IOException {
     Sort indexSort = context.reader().getMetaData().getSort();
-    if (indexSort != null
-        && indexSort.getSort().length > 0
-        && indexSort.getSort()[0].getField().equals(field)) {
-      final boolean reverse = indexSort.getSort()[0].getReverse();
-      PointValues points = context.reader().getPointValues(field);
-      if (points == null) {
-        return null;
-      }
+    if (indexSort == null
+        || indexSort.getSort().length == 0
+        || indexSort.getSort()[0].getField().equals(field) == false) {
+      return null;
+    }
 
-      if (points.getNumDimensions() != 1) {
-        return null;
-      }
+    final boolean reverse = indexSort.getSort()[0].getReverse();
 
-      if (points.getBytesPerDimension() != Long.BYTES
-          && points.getBytesPerDimension() != Integer.BYTES) {
-        return null;
-      }
+    PointValues points = context.reader().getPointValues(field);
+    if (points == null) {
+      return null;
+    }
 
-      if (points.size() != points.getDocCount()) {
-        return null;
-      }
+    if (points.getNumDimensions() != 1) {
+      return null;
+    }
+
+    if (points.getBytesPerDimension() != Long.BYTES
+        && points.getBytesPerDimension() != Integer.BYTES) {
+      return null;
+    }
 
-      byte[] queryLowerPoint;
-      byte[] queryUpperPoint;
-      if (points.getBytesPerDimension() == Integer.BYTES) {
-        queryLowerPoint = IntPoint.pack((int) lowerValue).bytes;
-        queryUpperPoint = IntPoint.pack((int) upperValue).bytes;
+    if (points.size() != points.getDocCount()) {
+      return null;
+    }
+
+    assert lowerValue <= upperValue;
+    byte[] queryLowerPoint;
+    byte[] queryUpperPoint;
+    if (points.getBytesPerDimension() == Integer.BYTES) {
+      queryLowerPoint = IntPoint.pack((int) lowerValue).bytes;
+      queryUpperPoint = IntPoint.pack((int) upperValue).bytes;
+    } else {
+      queryLowerPoint = LongPoint.pack(lowerValue).bytes;
+      queryUpperPoint = LongPoint.pack(upperValue).bytes;
+    }
+    if (matchNone(points, queryLowerPoint, queryUpperPoint)) {
+      return DocIdSetIterator.empty();
+    }
+    if (matchAll(points, queryLowerPoint, queryUpperPoint)) {
+      int maxDoc = context.reader().maxDoc();
+      if (points.getDocCount() == maxDoc) {
+        return delegate;

Review Comment:
   Good suggestion! That should be functionally correct (and more efficient than relying on delegate here). Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] vigyasharma commented on a diff in pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.
vigyasharma commented on code in PR #12003:
URL: https://github.com/apache/lucene/pull/12003#discussion_r1044845293


##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java:
##########
@@ -692,7 +697,7 @@ public int advance(int target) throws IOException {
 
     @Override
     public long cost() {
-      return lastDoc - firstDoc;
+      return Math.min(delegate.cost(), lastDoc - firstDoc);

Review Comment:
   For my understanding, why can't we just return `delegate.cost()` here? It seems to me that we're returning `DocIdSetIterator.range()` wherever  cost would be equal to `lastDoc - firstDoc`. Is that not the case?



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java:
##########
@@ -406,117 +406,121 @@ public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
 
   private boolean matchNone(PointValues points, byte[] queryLowerPoint, byte[] queryUpperPoint)
       throws IOException {
+    assert points.getNumDimensions() == 1;
     final ByteArrayComparator comparator =
         ArrayUtil.getUnsignedComparator(points.getBytesPerDimension());
-    for (int dim = 0; dim < points.getNumDimensions(); dim++) {
-      int offset = dim * points.getBytesPerDimension();
-      if (comparator.compare(points.getMinPackedValue(), offset, queryUpperPoint, offset) > 0
-          || comparator.compare(points.getMaxPackedValue(), offset, queryLowerPoint, offset) < 0) {
-        return true;
-      }
-    }
-    return false;
+    return comparator.compare(points.getMinPackedValue(), 0, queryUpperPoint, 0) > 0
+        || comparator.compare(points.getMaxPackedValue(), 0, queryLowerPoint, 0) < 0;
   }
 
   private boolean matchAll(PointValues points, byte[] queryLowerPoint, byte[] queryUpperPoint)
       throws IOException {
+    assert points.getNumDimensions() == 1;
     final ByteArrayComparator comparator =
         ArrayUtil.getUnsignedComparator(points.getBytesPerDimension());
-    for (int dim = 0; dim < points.getNumDimensions(); dim++) {
-      int offset = dim * points.getBytesPerDimension();
-      if (comparator.compare(points.getMinPackedValue(), offset, queryLowerPoint, offset) >= 0
-          && comparator.compare(points.getMaxPackedValue(), offset, queryUpperPoint, offset) <= 0) {
-        return true;
-      }
-    }
-    return false;
+    return comparator.compare(points.getMinPackedValue(), 0, queryLowerPoint, 0) >= 0
+        && comparator.compare(points.getMaxPackedValue(), 0, queryUpperPoint, 0) <= 0;
   }
 
-  private BoundedDocIdSetIterator getDocIdSetIteratorOrNullFromBkd(
+  private DocIdSetIterator getDocIdSetIteratorOrNullFromBkd(
       LeafReaderContext context, DocIdSetIterator delegate) throws IOException {
     Sort indexSort = context.reader().getMetaData().getSort();
-    if (indexSort != null
-        && indexSort.getSort().length > 0
-        && indexSort.getSort()[0].getField().equals(field)) {
-      final boolean reverse = indexSort.getSort()[0].getReverse();
-      PointValues points = context.reader().getPointValues(field);
-      if (points == null) {
-        return null;
-      }
+    if (indexSort == null
+        || indexSort.getSort().length == 0
+        || indexSort.getSort()[0].getField().equals(field) == false) {
+      return null;
+    }
 
-      if (points.getNumDimensions() != 1) {
-        return null;
-      }
+    final boolean reverse = indexSort.getSort()[0].getReverse();
 
-      if (points.getBytesPerDimension() != Long.BYTES
-          && points.getBytesPerDimension() != Integer.BYTES) {
-        return null;
-      }
+    PointValues points = context.reader().getPointValues(field);
+    if (points == null) {
+      return null;
+    }
 
-      if (points.size() != points.getDocCount()) {
-        return null;
-      }
+    if (points.getNumDimensions() != 1) {
+      return null;
+    }
+
+    if (points.getBytesPerDimension() != Long.BYTES
+        && points.getBytesPerDimension() != Integer.BYTES) {
+      return null;
+    }
 
-      byte[] queryLowerPoint;
-      byte[] queryUpperPoint;
-      if (points.getBytesPerDimension() == Integer.BYTES) {
-        queryLowerPoint = IntPoint.pack((int) lowerValue).bytes;
-        queryUpperPoint = IntPoint.pack((int) upperValue).bytes;
+    if (points.size() != points.getDocCount()) {
+      return null;
+    }
+
+    assert lowerValue <= upperValue;
+    byte[] queryLowerPoint;
+    byte[] queryUpperPoint;
+    if (points.getBytesPerDimension() == Integer.BYTES) {
+      queryLowerPoint = IntPoint.pack((int) lowerValue).bytes;
+      queryUpperPoint = IntPoint.pack((int) upperValue).bytes;
+    } else {
+      queryLowerPoint = LongPoint.pack(lowerValue).bytes;
+      queryUpperPoint = LongPoint.pack(upperValue).bytes;
+    }
+    if (matchNone(points, queryLowerPoint, queryUpperPoint)) {
+      return DocIdSetIterator.empty();
+    }
+    if (matchAll(points, queryLowerPoint, queryUpperPoint)) {
+      int maxDoc = context.reader().maxDoc();
+      if (points.getDocCount() == maxDoc) {
+        return delegate;

Review Comment:
   Curious if we can return `DocIdSetIterator.all(maxDoc)` here. Does it break correctness is some way?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] LuXugang commented on pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.
LuXugang commented on PR #12003:
URL: https://github.com/apache/lucene/pull/12003#issuecomment-1345785775

   Thanks @gsmiller , a new syntactic sugar `record` to me and first time appeard in lucene code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #12003:
URL: https://github.com/apache/lucene/pull/12003#discussion_r1045084049


##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java:
##########
@@ -212,9 +212,9 @@ public boolean isCacheable(LeafReaderContext ctx) {
       @Override
       public int count(LeafReaderContext context) throws IOException {
         if (context.reader().hasDeletions() == false) {
-          BoundedDocIdSetIterator disi = getDocIdSetIteratorOrNull(context);
-          if (disi != null && disi.delegate == null) {
-            return disi.lastDoc - disi.firstDoc;
+          DocIdSetIterator disi = getDocIdSetIteratorOrNull(context);
+          if (disi != null && disi instanceof BoundedDocIdSetIterator == false) {
+            return Math.toIntExact(disi.cost());

Review Comment:
   I worry that this might be a bit fragile since cost() has no guarantee to be accurate. I wonder if we could make `getDocIdSetIteratorOrNull()` return both a `DocIdSetIterator` and a number of matches (possibly -1 when unknown) to make this less trappy.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #12003:
URL: https://github.com/apache/lucene/pull/12003#discussion_r1045104269


##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java:
##########
@@ -212,9 +212,9 @@ public boolean isCacheable(LeafReaderContext ctx) {
       @Override
       public int count(LeafReaderContext context) throws IOException {
         if (context.reader().hasDeletions() == false) {
-          BoundedDocIdSetIterator disi = getDocIdSetIteratorOrNull(context);
-          if (disi != null && disi.delegate == null) {
-            return disi.lastDoc - disi.firstDoc;
+          DocIdSetIterator disi = getDocIdSetIteratorOrNull(context);
+          if (disi != null && disi instanceof BoundedDocIdSetIterator == false) {
+            return Math.toIntExact(disi.cost());

Review Comment:
   That's a great point. I'll work on making this less fragile. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org