You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/03/29 12:19:47 UTC

[GitHub] [lucene] jpountz commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

jpountz commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r837397382



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,76 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * Merges the overlapping ranges and returns unconnected ranges by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };

Review comment:
       I don't like that the rewritten query would have a reference to the un-rewritten query. I like this new rewrite rule but it raises a couple challenges, maybe tackle it in a separate JIRA? My current thinking is that the easier way to make it work would be to make the query implement `Clonable` and make `rewrite` change the ranges wrapped by a clone?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {
+    if (numDims != 1 || rangeClauses.size() <= 1) {
+      return rangeClauses;
+    }
+    List<RangeClause> originRangeClause = new ArrayList<>(rangeClauses);
+    final ArrayUtil.ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(bytesPerDim);
+    originRangeClause.sort(
+        new Comparator<RangeClause>() {
+          @Override
+          public int compare(RangeClause o1, RangeClause o2) {
+            int result = comparator.compare(o1.lowerValue, 0, o2.lowerValue, 0);
+            if (result == 0) {
+              return comparator.compare(o1.upperValue, 0, o2.upperValue, 0);
+            } else {
+              return result;
+            }
+          }
+        });
+    List<RangeClause> finalRangeClause = new ArrayList<>();
+    RangeClause current = originRangeClause.get(0);
+    for (int i = 1; i < originRangeClause.size(); i++) {
+      RangeClause nextClause = originRangeClause.get(i);
+      if (comparator.compare(nextClause.lowerValue, 0, current.upperValue, 0) > 0) {
+        finalRangeClause.add(current);
+        current = nextClause;
+      } else {
+        if (comparator.compare(nextClause.upperValue, 0, current.upperValue, 0) > 0) {
+          current = new RangeClause(current.lowerValue, nextClause.upperValue);
+        }
+      }
+    }
+    finalRangeClause.add(current);
+    if (finalRangeClause.size() != rangeClauses.size()) {

Review comment:
       Actually this is not about saving an extra object allocation, this is because the contract is that `Query#rewrite` should be called until the query returns its own instance. So if we returned a different query instance all the time, we'd get an infinite loop.

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,69 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * merge its overlapping ranges and return a simpler but slightly different form by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  /** merge overlapping ranges to some unconnected ranges */
+  public static List<RangeClause> mergeOverlappingRanges(

Review comment:
       Does it need to be public?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +386,32 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1 || context.reader().hasDeletions() == true) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRangeClause = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+        int total = 0;
+        for (RangeClause rangeClause : mergeRangeClause) {
+          PointRangeQuery pointRangeQuery =
+              new PointRangeQuery(field, rangeClause.lowerValue, rangeClause.upperValue, numDims) {
+                @Override
+                protected String toString(int dimension, byte[] value) {
+                  return MultiRangeQuery.this.toString(dimension, value);
+                }
+              };
+          int count = pointRangeQuery.createWeight(searcher, scoreMode, boost).count(context);

Review comment:
       Summing up counts across ranges is only correct on single-valued fields. We need to check that `pointValues.size() == pointValues.docCount()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org