You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/03/24 02:05:49 UTC
[GitHub] [lucene] gautamworah96 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

gautamworah96 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r833834846



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +370,32 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRanceClause = mergeOverlappingRanges();

Review comment:
       `mergeRanceClause` -> `mergeRangeClause`

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);
+    System.out.println(query.toString());

Review comment:
       Extra debug statement?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {

Review comment:
       We are introducing some new (useful) functionality here (which is unrelated to the one discussed in the JIRA), let's try to make sure that we communicate this to our users. 
   
   Can you add some javadoc here stating that calling rewrite on a `MultiRangeQuery` will merge its overlapping ranges and return a simpler but slightly different form. 
   
   
   

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);

Review comment:
       Plain `assertNotEquals` should work here (just like L764).

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {

Review comment:
       I think we can write some better tests here. Let's reuse the entire logic of adding new docs, creating new queries from `testOneDimensionCount` and refactor it to a small function. Reuse that function here and then just plainly assert that the rewritten query matches same num of docs as the original query.

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {
+    if (numDims != 1 || rangeClauses.size() <= 1) {
+      return rangeClauses;
+    }
+    List<RangeClause> originRangeClause = new ArrayList<>(rangeClauses);
+    final ArrayUtil.ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(bytesPerDim);
+    originRangeClause.sort(
+        new Comparator<RangeClause>() {
+          @Override
+          public int compare(RangeClause o1, RangeClause o2) {
+            int result = comparator.compare(o1.lowerValue, 0, o2.lowerValue, 0);
+            if (result == 0) {
+              return comparator.compare(o1.upperValue, 0, o2.upperValue, 0);
+            } else {
+              return result;
+            }
+          }
+        });
+    List<RangeClause> finalRangeClause = new ArrayList<>();
+    RangeClause current = originRangeClause.get(0);
+    for (int i = 1; i < originRangeClause.size(); i++) {
+      RangeClause nextClause = originRangeClause.get(i);
+      if (comparator.compare(nextClause.lowerValue, 0, current.upperValue, 0) > 0) {
+        finalRangeClause.add(current);
+        current = nextClause;
+      } else {
+        if (comparator.compare(nextClause.upperValue, 0, current.upperValue, 0) > 0) {
+          current = new RangeClause(current.lowerValue, nextClause.upperValue);
+        }
+      }
+    }
+    finalRangeClause.add(current);
+    if (finalRangeClause.size() != rangeClauses.size()) {

Review comment:
       Let's directly return `finalRangeClause` here?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {

Review comment:
       This function feels like somewhat of an interview question. Which means this should probably not be something we should be implementing ourselves/we may have already done it somewhere. I could not find any other place where we have solved this problem though (I searched in places where we create `Comparators` in the codebase)!
   
    I think in the future, some other features could also use this utility? How about turning this method to `static`, adding some javadoc and taking `rangeClauses` as input?

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);
+    System.out.println(query.toString());
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testOneDimensionCount() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    Random random = random();
+    int dims = 1;
+
+    for (int i = 0; i < random.nextInt(100, 500); i++) {
+      int numPoints = RandomNumbers.randomIntBetween(random(), 1, 200);
+      long value = RandomNumbers.randomLongBetween(random(), 0, 2000);
+      for (int j = 0; j < numPoints; j++) {
+        Document doc = new Document();
+        doc.add(new LongPoint("point", value));
+        w.addDocument(doc);
+      }
+    }
+    w.flush();
+    w.forceMerge(1);
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    for (int n = 0; n < 100; n++) {
+      int numRanges = RandomNumbers.randomIntBetween(random(), 1, 20);
+      LongPointMultiRangeBuilder builder1 = new LongPointMultiRangeBuilder("point", dims);
+      BooleanQuery.Builder builder2 = new BooleanQuery.Builder();
+      for (int i = 0; i < numRanges; i++) {
+        long[] lower = new long[dims];
+        long[] upper = new long[dims];
+        for (int j = 0; j < dims; j++) {
+          lower[j] = RandomNumbers.randomLongBetween(random(), 0, 2000);
+          upper[j] = lower[j] + RandomNumbers.randomLongBetween(random(), 0, 2000);
+        }
+        builder1.add(lower, upper);
+        builder2.add(LongPoint.newRangeQuery("point", lower, upper), BooleanClause.Occur.SHOULD);
+      }
+
+      MultiRangeQuery multiRangeQuery = builder1.build();
+      BooleanQuery booleanQuery = builder2.build();
+      int count =
+          multiRangeQuery
+              .createWeight(searcher, ScoreMode.COMPLETE, 1.0f)
+              .count(searcher.getLeafContexts().get(0));
+      int booleanCount = searcher.count(booleanQuery);
+      assertEquals(booleanCount, count);

Review comment:
       The flow of this function is smooth!

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +370,32 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1) {

Review comment:
       We also have to check that the `context.reader()` has no deletions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org