You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/03/05 09:51:10 UTC

[GitHub] [lucene] wjp719 opened a new pull request #731: LUCENE-10456: support MultiRangeQuery#count()

wjp719 opened a new pull request #731:
URL: https://github.com/apache/lucene/pull/731


    
   
   
   # Description
   
   for one dimension MultiRangeQuery, we can firstly   merge overlapping ranges to have some unconnected ranges, then the doc count of this multiRangeQuery is the sum of each doc count of these unconnected range. For each unconnected range, we can take advantage of PointRangeQuery count() method.
   
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r839468461



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +388,36 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1 || context.reader().hasDeletions() == true) {
+          return super.count(context);
+        }
+        PointValues pointValues = context.reader().getPointValues(field);
+        if (pointValues == null || pointValues.size() != pointValues.getDocCount()) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRangeClause = mergeOverlappingRanges(rangeClauses, bytesPerDim);

Review comment:
       done

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +388,36 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1 || context.reader().hasDeletions() == true) {
+          return super.count(context);
+        }
+        PointValues pointValues = context.reader().getPointValues(field);
+        if (pointValues == null || pointValues.size() != pointValues.getDocCount()) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRangeClause = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+        int total = 0;
+        for (RangeClause rangeClause : mergeRangeClause) {
+          PointRangeQuery pointRangeQuery =
+              new PointRangeQuery(field, rangeClause.lowerValue, rangeClause.upperValue, numDims) {
+                @Override
+                protected String toString(int dimension, byte[] value) {
+                  return MultiRangeQuery.this.toString(dimension, value);
+                }
+              };
+          int count = pointRangeQuery.createWeight(searcher, scoreMode, boost).count(context);

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on pull request #731:
URL: https://github.com/apache/lucene/pull/731#issuecomment-1081609232


   @gautamworah96 thank you for your patient reviews. I have run the two new added tests 2000 iterations and all passed. 
   
   @jpountz @iverase please help to review this pr, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #731:
URL: https://github.com/apache/lucene/pull/731#issuecomment-1082881124


   I just looked at the cloning logic, it looks good to me, thanks for making this change!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on pull request #731:
URL: https://github.com/apache/lucene/pull/731#issuecomment-1080600767


   @gautamworah96  Hi, I have refactor the code as you suggest, please help to review  it again, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r837000418



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,69 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * merge its overlapping ranges and return a simpler but slightly different form by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  /** merge overlapping ranges to some unconnected ranges */
+  public static List<RangeClause> mergeOverlappingRanges(

Review comment:
       done

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDoc(w);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    for (int n = 0; n < 100; n++) {

Review comment:
       done

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDoc(w);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    for (int n = 0; n < 100; n++) {
+      int numRanges = RandomNumbers.randomIntBetween(random(), 1, 20);
+      LongPointMultiRangeBuilder builder1 = new LongPointMultiRangeBuilder("point", dims);
+      BooleanQuery.Builder builder2 = new BooleanQuery.Builder();
+      for (int i = 0; i < numRanges; i++) {
+        long[] lower = new long[dims];
+        long[] upper = new long[dims];
+        for (int j = 0; j < dims; j++) {
+          lower[j] = RandomNumbers.randomLongBetween(random(), 0, 2000);
+          upper[j] = lower[j] + RandomNumbers.randomLongBetween(random(), 0, 2000);
+        }
+        builder1.add(lower, upper);
+        builder2.add(LongPoint.newRangeQuery("point", lower, upper), BooleanClause.Occur.SHOULD);

Review comment:
       done

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testRandomRewrite() throws IOException {

Review comment:
       done

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDoc(w);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    for (int n = 0; n < 100; n++) {
+      int numRanges = RandomNumbers.randomIntBetween(random(), 1, 20);
+      LongPointMultiRangeBuilder builder1 = new LongPointMultiRangeBuilder("point", dims);
+      BooleanQuery.Builder builder2 = new BooleanQuery.Builder();
+      for (int i = 0; i < numRanges; i++) {
+        long[] lower = new long[dims];
+        long[] upper = new long[dims];
+        for (int j = 0; j < dims; j++) {
+          lower[j] = RandomNumbers.randomLongBetween(random(), 0, 2000);
+          upper[j] = lower[j] + RandomNumbers.randomLongBetween(random(), 0, 2000);
+        }
+        builder1.add(lower, upper);
+        builder2.add(LongPoint.newRangeQuery("point", lower, upper), BooleanClause.Occur.SHOULD);
+      }
+
+      MultiRangeQuery multiRangeQuery = builder1.build();
+      MultiRangeQuery rewriteMultiRangeQuery = (MultiRangeQuery) multiRangeQuery.rewrite(reader);
+      int count = searcher.count(multiRangeQuery);
+      int rewriteCount = searcher.count(rewriteMultiRangeQuery);
+      assertEquals(rewriteCount, count);
+    }
+    IOUtils.close(reader, w, dir);
+  }
+
+  private void addRandomDoc(RandomIndexWriter w) throws IOException {

Review comment:
       done

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r838428950



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +763,90 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  private void addRandomDocs(RandomIndexWriter w) throws IOException {
+    Random random = random();
+    for (int i = 0; i < random.nextInt(100, 500); i++) {
+      int numPoints = RandomNumbers.randomIntBetween(random(), 1, 200);
+      long value = RandomNumbers.randomLongBetween(random(), 0, 2000);
+      for (int j = 0; j < numPoints; j++) {
+        Document doc = new Document();
+        doc.add(new LongPoint("point", value));
+        w.addDocument(doc);
+      }
+    }
+    w.flush();
+    w.forceMerge(1);
+  }
+
+  /** The hit doc count of the rewritten query should be the same as origin query's */
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDocs(w);
+    Random random = random();
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+    int numIters = random.nextInt(200);
+
+    for (int n = 0; n < numIters; n++) {
+      int numRanges = RandomNumbers.randomIntBetween(random(), 1, 20);
+      LongPointMultiRangeBuilder builder1 = new LongPointMultiRangeBuilder("point", dims);
+      for (int i = 0; i < numRanges; i++) {
+        long[] lower = new long[dims];
+        long[] upper = new long[dims];
+        for (int j = 0; j < dims; j++) {
+          lower[j] = RandomNumbers.randomLongBetween(random(), 0, 2000);
+          upper[j] = lower[j] + RandomNumbers.randomLongBetween(random(), 0, 2000);
+        }
+        builder1.add(lower, upper);
+      }
+
+      MultiRangeQuery multiRangeQuery = builder1.build();
+      MultiRangeQuery rewriteMultiRangeQuery = (MultiRangeQuery) multiRangeQuery.rewrite(reader);
+      int count = searcher.count(multiRangeQuery);
+      int rewriteCount = searcher.count(rewriteMultiRangeQuery);
+      assertEquals(rewriteCount, count);

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gautamworah96 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
gautamworah96 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r833853974



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);

Review comment:
       Plain `assertNotEquals` should work here (just like L764). Same for the `Assert.assertEquals` call




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gautamworah96 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
gautamworah96 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r836963073



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testRandomRewrite() throws IOException {

Review comment:
       Lets write a small comment here explaining what behavior this function is testing? Same for the `testOneDimensionCount` function

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDoc(w);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    for (int n = 0; n < 100; n++) {

Review comment:
       Maybe change it to `n < numIters` and randomly generate `numIters`. This makes it a bit clearer that the outermost loop is for the num of iterations. Same for the `testOneDimensionCount` function

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDoc(w);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    for (int n = 0; n < 100; n++) {
+      int numRanges = RandomNumbers.randomIntBetween(random(), 1, 20);
+      LongPointMultiRangeBuilder builder1 = new LongPointMultiRangeBuilder("point", dims);
+      BooleanQuery.Builder builder2 = new BooleanQuery.Builder();
+      for (int i = 0; i < numRanges; i++) {
+        long[] lower = new long[dims];
+        long[] upper = new long[dims];
+        for (int j = 0; j < dims; j++) {
+          lower[j] = RandomNumbers.randomLongBetween(random(), 0, 2000);
+          upper[j] = lower[j] + RandomNumbers.randomLongBetween(random(), 0, 2000);
+        }
+        builder1.add(lower, upper);
+        builder2.add(LongPoint.newRangeQuery("point", lower, upper), BooleanClause.Occur.SHOULD);
+      }
+
+      MultiRangeQuery multiRangeQuery = builder1.build();
+      MultiRangeQuery rewriteMultiRangeQuery = (MultiRangeQuery) multiRangeQuery.rewrite(reader);
+      int count = searcher.count(multiRangeQuery);
+      int rewriteCount = searcher.count(rewriteMultiRangeQuery);
+      assertEquals(rewriteCount, count);
+    }
+    IOUtils.close(reader, w, dir);
+  }
+
+  private void addRandomDoc(RandomIndexWriter w) throws IOException {

Review comment:
       minor: addRandomDoc -> addRandomDocs.
   Also, I would put the function before both the `testRandomRewrite` and the `testOneDimensionCount` functions.

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {
+    if (numDims != 1 || rangeClauses.size() <= 1) {
+      return rangeClauses;
+    }
+    List<RangeClause> originRangeClause = new ArrayList<>(rangeClauses);
+    final ArrayUtil.ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(bytesPerDim);
+    originRangeClause.sort(
+        new Comparator<RangeClause>() {
+          @Override
+          public int compare(RangeClause o1, RangeClause o2) {
+            int result = comparator.compare(o1.lowerValue, 0, o2.lowerValue, 0);
+            if (result == 0) {
+              return comparator.compare(o1.upperValue, 0, o2.upperValue, 0);
+            } else {
+              return result;
+            }
+          }
+        });
+    List<RangeClause> finalRangeClause = new ArrayList<>();
+    RangeClause current = originRangeClause.get(0);
+    for (int i = 1; i < originRangeClause.size(); i++) {
+      RangeClause nextClause = originRangeClause.get(i);
+      if (comparator.compare(nextClause.lowerValue, 0, current.upperValue, 0) > 0) {
+        finalRangeClause.add(current);
+        current = nextClause;
+      } else {
+        if (comparator.compare(nextClause.upperValue, 0, current.upperValue, 0) > 0) {
+          current = new RangeClause(current.lowerValue, nextClause.upperValue);
+        }
+      }
+    }
+    finalRangeClause.add(current);
+    if (finalRangeClause.size() != rangeClauses.size()) {

Review comment:
       Ah. Makes sense. Maybe add a comment here explaining that this saves us from creating an extra `MultiRangeQuery` object in `rewrite`?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,69 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * merge its overlapping ranges and return a simpler but slightly different form by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  /** merge overlapping ranges to some unconnected ranges */
+  public static List<RangeClause> mergeOverlappingRanges(

Review comment:
       Lets write better javadoc here with some parameter descriptions for the input parameters since this is a static function that other classes could also use. 

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDoc(w);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    for (int n = 0; n < 100; n++) {
+      int numRanges = RandomNumbers.randomIntBetween(random(), 1, 20);
+      LongPointMultiRangeBuilder builder1 = new LongPointMultiRangeBuilder("point", dims);
+      BooleanQuery.Builder builder2 = new BooleanQuery.Builder();
+      for (int i = 0; i < numRanges; i++) {
+        long[] lower = new long[dims];
+        long[] upper = new long[dims];
+        for (int j = 0; j < dims; j++) {
+          lower[j] = RandomNumbers.randomLongBetween(random(), 0, 2000);
+          upper[j] = lower[j] + RandomNumbers.randomLongBetween(random(), 0, 2000);
+        }
+        builder1.add(lower, upper);
+        builder2.add(LongPoint.newRangeQuery("point", lower, upper), BooleanClause.Occur.SHOULD);

Review comment:
       We don't need `builder2` in this function right? We are rewriting the `builder1` query...

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }

Review comment:
       Lets remove this `testRewrite` function now that we are adding a randomized test for it.

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {

Review comment:
       Hmm. I should've been clearer. I meant something like
   ```
   Merges the overlapping ranges and returns unconnected ranges by calling {@link #mergeOverlappingRanges}
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r837001367



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,69 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * merge its overlapping ranges and return a simpler but slightly different form by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  /** merge overlapping ranges to some unconnected ranges */
+  public static List<RangeClause> mergeOverlappingRanges(

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r838348390



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +763,90 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  private void addRandomDocs(RandomIndexWriter w) throws IOException {
+    Random random = random();
+    for (int i = 0; i < random.nextInt(100, 500); i++) {

Review comment:
       Let's not re-compute `nextInt` on every iteration.
   
   ```suggestion
       for (int i = 0, end = random.nextInt(100, 500); i < end; i++) {
   ```

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +763,90 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  private void addRandomDocs(RandomIndexWriter w) throws IOException {
+    Random random = random();
+    for (int i = 0; i < random.nextInt(100, 500); i++) {
+      int numPoints = RandomNumbers.randomIntBetween(random(), 1, 200);
+      long value = RandomNumbers.randomLongBetween(random(), 0, 2000);
+      for (int j = 0; j < numPoints; j++) {
+        Document doc = new Document();
+        doc.add(new LongPoint("point", value));
+        w.addDocument(doc);
+      }
+    }
+    w.flush();
+    w.forceMerge(1);
+  }
+
+  /** The hit doc count of the rewritten query should be the same as origin query's */
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDocs(w);
+    Random random = random();
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+    int numIters = random.nextInt(200);

Review comment:
       we'd usually rather use `atLeast` to compute numbers of iterations, e.g.
   ```suggestion
       int numIters = atLeast(100);
   ```

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +763,90 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  private void addRandomDocs(RandomIndexWriter w) throws IOException {
+    Random random = random();
+    for (int i = 0; i < random.nextInt(100, 500); i++) {
+      int numPoints = RandomNumbers.randomIntBetween(random(), 1, 200);
+      long value = RandomNumbers.randomLongBetween(random(), 0, 2000);
+      for (int j = 0; j < numPoints; j++) {
+        Document doc = new Document();
+        doc.add(new LongPoint("point", value));
+        w.addDocument(doc);
+      }
+    }
+    w.flush();
+    w.forceMerge(1);
+  }
+
+  /** The hit doc count of the rewritten query should be the same as origin query's */
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDocs(w);
+    Random random = random();
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+    int numIters = random.nextInt(200);
+
+    for (int n = 0; n < numIters; n++) {
+      int numRanges = RandomNumbers.randomIntBetween(random(), 1, 20);
+      LongPointMultiRangeBuilder builder1 = new LongPointMultiRangeBuilder("point", dims);
+      for (int i = 0; i < numRanges; i++) {
+        long[] lower = new long[dims];
+        long[] upper = new long[dims];
+        for (int j = 0; j < dims; j++) {
+          lower[j] = RandomNumbers.randomLongBetween(random(), 0, 2000);
+          upper[j] = lower[j] + RandomNumbers.randomLongBetween(random(), 0, 2000);
+        }
+        builder1.add(lower, upper);
+      }
+
+      MultiRangeQuery multiRangeQuery = builder1.build();
+      MultiRangeQuery rewriteMultiRangeQuery = (MultiRangeQuery) multiRangeQuery.rewrite(reader);
+      int count = searcher.count(multiRangeQuery);
+      int rewriteCount = searcher.count(rewriteMultiRangeQuery);
+      assertEquals(rewriteCount, count);

Review comment:
       Actually this test doesn't verify anything since `IndexSearcher#count` will call `rewrite` under the hood. I think you would need to compare the `MultiRangeQuery` against a disjunction of `PointRangeQuery` like you did in `testOneDimensionCount`.
   
   Also, `IndexSearcher#count` uses `Weight#count` under the hood, so maybe use a `TotalHitCountCollectorManager` to force `IndexSearcher` to iterate over all matches for the counting?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r837397382



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,76 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * Merges the overlapping ranges and returns unconnected ranges by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };

Review comment:
       I don't like that the rewritten query would have a reference to the un-rewritten query. I like this new rewrite rule but it raises a couple challenges, maybe tackle it in a separate JIRA? My current thinking is that the easier way to make it work would be to make the query implement `Clonable` and make `rewrite` change the ranges wrapped by a clone?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {
+    if (numDims != 1 || rangeClauses.size() <= 1) {
+      return rangeClauses;
+    }
+    List<RangeClause> originRangeClause = new ArrayList<>(rangeClauses);
+    final ArrayUtil.ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(bytesPerDim);
+    originRangeClause.sort(
+        new Comparator<RangeClause>() {
+          @Override
+          public int compare(RangeClause o1, RangeClause o2) {
+            int result = comparator.compare(o1.lowerValue, 0, o2.lowerValue, 0);
+            if (result == 0) {
+              return comparator.compare(o1.upperValue, 0, o2.upperValue, 0);
+            } else {
+              return result;
+            }
+          }
+        });
+    List<RangeClause> finalRangeClause = new ArrayList<>();
+    RangeClause current = originRangeClause.get(0);
+    for (int i = 1; i < originRangeClause.size(); i++) {
+      RangeClause nextClause = originRangeClause.get(i);
+      if (comparator.compare(nextClause.lowerValue, 0, current.upperValue, 0) > 0) {
+        finalRangeClause.add(current);
+        current = nextClause;
+      } else {
+        if (comparator.compare(nextClause.upperValue, 0, current.upperValue, 0) > 0) {
+          current = new RangeClause(current.lowerValue, nextClause.upperValue);
+        }
+      }
+    }
+    finalRangeClause.add(current);
+    if (finalRangeClause.size() != rangeClauses.size()) {

Review comment:
       Actually this is not about saving an extra object allocation, this is because the contract is that `Query#rewrite` should be called until the query returns its own instance. So if we returned a different query instance all the time, we'd get an infinite loop.

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,69 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * merge its overlapping ranges and return a simpler but slightly different form by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  /** merge overlapping ranges to some unconnected ranges */
+  public static List<RangeClause> mergeOverlappingRanges(

Review comment:
       Does it need to be public?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +386,32 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1 || context.reader().hasDeletions() == true) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRangeClause = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+        int total = 0;
+        for (RangeClause rangeClause : mergeRangeClause) {
+          PointRangeQuery pointRangeQuery =
+              new PointRangeQuery(field, rangeClause.lowerValue, rangeClause.upperValue, numDims) {
+                @Override
+                protected String toString(int dimension, byte[] value) {
+                  return MultiRangeQuery.this.toString(dimension, value);
+                }
+              };
+          int count = pointRangeQuery.createWeight(searcher, scoreMode, boost).count(context);

Review comment:
       Summing up counts across ranges is only correct on single-valued fields. We need to check that `pointValues.size() == pointValues.docCount()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on pull request #731:
URL: https://github.com/apache/lucene/pull/731#issuecomment-1084419272


   @jpountz I have added the CHANGES log, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gautamworah96 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
gautamworah96 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r833853974



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);

Review comment:
       Plain `assertNotEquals` should work here (just like L764). Same for the `Assert.asserEquals` call




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gautamworah96 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
gautamworah96 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r833834846



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +370,32 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRanceClause = mergeOverlappingRanges();

Review comment:
       `mergeRanceClause` -> `mergeRangeClause`

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);
+    System.out.println(query.toString());

Review comment:
       Extra debug statement?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {

Review comment:
       We are introducing some new (useful) functionality here (which is unrelated to the one discussed in the JIRA), let's try to make sure that we communicate this to our users. 
   
   Can you add some javadoc here stating that calling rewrite on a `MultiRangeQuery` will merge its overlapping ranges and return a simpler but slightly different form. 
   
   
   

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);

Review comment:
       Plain `assertNotEquals` should work here (just like L764).

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {

Review comment:
       I think we can write some better tests here. Let's reuse the entire logic of adding new docs, creating new queries from `testOneDimensionCount` and refactor it to a small function. Reuse that function here and then just plainly assert that the rewritten query matches same num of docs as the original query.

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {
+    if (numDims != 1 || rangeClauses.size() <= 1) {
+      return rangeClauses;
+    }
+    List<RangeClause> originRangeClause = new ArrayList<>(rangeClauses);
+    final ArrayUtil.ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(bytesPerDim);
+    originRangeClause.sort(
+        new Comparator<RangeClause>() {
+          @Override
+          public int compare(RangeClause o1, RangeClause o2) {
+            int result = comparator.compare(o1.lowerValue, 0, o2.lowerValue, 0);
+            if (result == 0) {
+              return comparator.compare(o1.upperValue, 0, o2.upperValue, 0);
+            } else {
+              return result;
+            }
+          }
+        });
+    List<RangeClause> finalRangeClause = new ArrayList<>();
+    RangeClause current = originRangeClause.get(0);
+    for (int i = 1; i < originRangeClause.size(); i++) {
+      RangeClause nextClause = originRangeClause.get(i);
+      if (comparator.compare(nextClause.lowerValue, 0, current.upperValue, 0) > 0) {
+        finalRangeClause.add(current);
+        current = nextClause;
+      } else {
+        if (comparator.compare(nextClause.upperValue, 0, current.upperValue, 0) > 0) {
+          current = new RangeClause(current.lowerValue, nextClause.upperValue);
+        }
+      }
+    }
+    finalRangeClause.add(current);
+    if (finalRangeClause.size() != rangeClauses.size()) {

Review comment:
       Let's directly return `finalRangeClause` here?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {

Review comment:
       This function feels like somewhat of an interview question. Which means this should probably not be something we should be implementing ourselves/we may have already done it somewhere. I could not find any other place where we have solved this problem though (I searched in places where we create `Comparators` in the codebase)!
   
    I think in the future, some other features could also use this utility? How about turning this method to `static`, adding some javadoc and taking `rangeClauses` as input?

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);
+    System.out.println(query.toString());
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testOneDimensionCount() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    Random random = random();
+    int dims = 1;
+
+    for (int i = 0; i < random.nextInt(100, 500); i++) {
+      int numPoints = RandomNumbers.randomIntBetween(random(), 1, 200);
+      long value = RandomNumbers.randomLongBetween(random(), 0, 2000);
+      for (int j = 0; j < numPoints; j++) {
+        Document doc = new Document();
+        doc.add(new LongPoint("point", value));
+        w.addDocument(doc);
+      }
+    }
+    w.flush();
+    w.forceMerge(1);
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+
+    for (int n = 0; n < 100; n++) {
+      int numRanges = RandomNumbers.randomIntBetween(random(), 1, 20);
+      LongPointMultiRangeBuilder builder1 = new LongPointMultiRangeBuilder("point", dims);
+      BooleanQuery.Builder builder2 = new BooleanQuery.Builder();
+      for (int i = 0; i < numRanges; i++) {
+        long[] lower = new long[dims];
+        long[] upper = new long[dims];
+        for (int j = 0; j < dims; j++) {
+          lower[j] = RandomNumbers.randomLongBetween(random(), 0, 2000);
+          upper[j] = lower[j] + RandomNumbers.randomLongBetween(random(), 0, 2000);
+        }
+        builder1.add(lower, upper);
+        builder2.add(LongPoint.newRangeQuery("point", lower, upper), BooleanClause.Occur.SHOULD);
+      }
+
+      MultiRangeQuery multiRangeQuery = builder1.build();
+      BooleanQuery booleanQuery = builder2.build();
+      int count =
+          multiRangeQuery
+              .createWeight(searcher, ScoreMode.COMPLETE, 1.0f)
+              .count(searcher.getLeafContexts().get(0));
+      int booleanCount = searcher.count(booleanQuery);
+      assertEquals(booleanCount, count);

Review comment:
       The flow of this function is smooth!

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +370,32 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1) {

Review comment:
       We also have to check that the `context.reader()` has no deletions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r838083031



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,76 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * Merges the overlapping ranges and returns unconnected ranges by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };

Review comment:
       I have submit a [pr](https://github.com/apache/lucene/pull/774) make Query implement Clonable. If that pr is OK, I will refactor this rewrite method based on clone, is it OK?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on pull request #731:
URL: https://github.com/apache/lucene/pull/731#issuecomment-1082583137


   @jpountz Hi,  I have refactor the code according to your suggestions, please review it again, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r836366764



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r838048975



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,69 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * merge its overlapping ranges and return a simpler but slightly different form by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  /** merge overlapping ranges to some unconnected ranges */
+  public static List<RangeClause> mergeOverlappingRanges(

Review comment:
       the mergeOverlappingRanges function maybe a common function and other places can use it. I remove the **public**, so that the same package can reuse it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r838428748



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +763,90 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  private void addRandomDocs(RandomIndexWriter w) throws IOException {
+    Random random = random();
+    for (int i = 0; i < random.nextInt(100, 500); i++) {

Review comment:
       done

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +763,90 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  private void addRandomDocs(RandomIndexWriter w) throws IOException {
+    Random random = random();
+    for (int i = 0; i < random.nextInt(100, 500); i++) {
+      int numPoints = RandomNumbers.randomIntBetween(random(), 1, 200);
+      long value = RandomNumbers.randomLongBetween(random(), 0, 2000);
+      for (int j = 0; j < numPoints; j++) {
+        Document doc = new Document();
+        doc.add(new LongPoint("point", value));
+        w.addDocument(doc);
+      }
+    }
+    w.flush();
+    w.forceMerge(1);
+  }
+
+  /** The hit doc count of the rewritten query should be the same as origin query's */
+  public void testRandomRewrite() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+    int dims = 1;
+    addRandomDocs(w);
+    Random random = random();
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher = newSearcher(reader);
+    int numIters = random.nextInt(200);

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] madrob commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
madrob commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r838617372



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -46,7 +48,7 @@
  *
  * @lucene.experimental
  */
-public abstract class MultiRangeQuery extends Query {
+public abstract class MultiRangeQuery extends Query implements Cloneable {

Review comment:
       Please update the javadoc on this class to remove the TODO since you are already implementing that functionality.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gautamworah96 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
gautamworah96 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r837090466



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,103 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    assertNotEquals(query, multiRangeQuery);
+    Query query1 = query.rewrite(null);
+    Assert.assertEquals(query1, query);
+  }
+
+  public void testRandomRewrite() throws IOException {

Review comment:
       `test rewrite query. the hit doc count of the rewritten query should be the same as origin query's.` -> `The hit doc count of the rewritten query should be the same as origin query's.`

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,69 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * merge its overlapping ranges and return a simpler but slightly different form by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  /** merge overlapping ranges to some unconnected ranges */
+  public static List<RangeClause> mergeOverlappingRanges(

Review comment:
       minor: `merge overlapping ranges to some unconnected ranges` -> `Merges overlapping ranges and returns unconnected ranges`. The `some` word in both the javadoc and the param description is redundant. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r838052200



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {
+    if (numDims != 1 || rangeClauses.size() <= 1) {
+      return rangeClauses;
+    }
+    List<RangeClause> originRangeClause = new ArrayList<>(rangeClauses);
+    final ArrayUtil.ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(bytesPerDim);
+    originRangeClause.sort(
+        new Comparator<RangeClause>() {
+          @Override
+          public int compare(RangeClause o1, RangeClause o2) {
+            int result = comparator.compare(o1.lowerValue, 0, o2.lowerValue, 0);
+            if (result == 0) {
+              return comparator.compare(o1.upperValue, 0, o2.upperValue, 0);
+            } else {
+              return result;
+            }
+          }
+        });
+    List<RangeClause> finalRangeClause = new ArrayList<>();
+    RangeClause current = originRangeClause.get(0);
+    for (int i = 1; i < originRangeClause.size(); i++) {
+      RangeClause nextClause = originRangeClause.get(i);
+      if (comparator.compare(nextClause.lowerValue, 0, current.upperValue, 0) > 0) {
+        finalRangeClause.add(current);
+        current = nextClause;
+      } else {
+        if (comparator.compare(nextClause.upperValue, 0, current.upperValue, 0) > 0) {
+          current = new RangeClause(current.lowerValue, nextClause.upperValue);
+        }
+      }
+    }
+    finalRangeClause.add(current);
+    if (finalRangeClause.size() != rangeClauses.size()) {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on pull request #731:
URL: https://github.com/apache/lucene/pull/731#issuecomment-1081337873


   @gautamworah96 I refactor the code again, please review it again, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #731:
URL: https://github.com/apache/lucene/pull/731#issuecomment-1084332206


   Can you also add a CHANGES entry under version 9.2?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r838661197



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -46,7 +48,7 @@
  *
  * @lucene.experimental
  */
-public abstract class MultiRangeQuery extends Query {
+public abstract class MultiRangeQuery extends Query implements Cloneable {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r839388337



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +388,36 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1 || context.reader().hasDeletions() == true) {
+          return super.count(context);
+        }
+        PointValues pointValues = context.reader().getPointValues(field);
+        if (pointValues == null || pointValues.size() != pointValues.getDocCount()) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRangeClause = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+        int total = 0;
+        for (RangeClause rangeClause : mergeRangeClause) {
+          PointRangeQuery pointRangeQuery =
+              new PointRangeQuery(field, rangeClause.lowerValue, rangeClause.upperValue, numDims) {
+                @Override
+                protected String toString(int dimension, byte[] value) {
+                  return MultiRangeQuery.this.toString(dimension, value);
+                }
+              };
+          int count = pointRangeQuery.createWeight(searcher, scoreMode, boost).count(context);

Review comment:
       nit: I would not propagate the score mode since we know we do not need it here
   
   ```suggestion
             int count = pointRangeQuery.createWeight(searcher, ScoreMode.COMPLETE_NO_SCORES, 1f).count(context);
   ```

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +388,36 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1 || context.reader().hasDeletions() == true) {
+          return super.count(context);
+        }
+        PointValues pointValues = context.reader().getPointValues(field);
+        if (pointValues == null || pointValues.size() != pointValues.getDocCount()) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRangeClause = mergeOverlappingRanges(rangeClauses, bytesPerDim);

Review comment:
       We don't need this, it is required to call `rewrite` before creating a Weight, the ranges will already be merged when we reach this method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on pull request #731:
URL: https://github.com/apache/lucene/pull/731#issuecomment-1068122549


   @iverase Hi, can you help to review this pr, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r836331353



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {
+    if (numDims != 1 || rangeClauses.size() <= 1) {
+      return rangeClauses;
+    }
+    List<RangeClause> originRangeClause = new ArrayList<>(rangeClauses);
+    final ArrayUtil.ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(bytesPerDim);
+    originRangeClause.sort(
+        new Comparator<RangeClause>() {
+          @Override
+          public int compare(RangeClause o1, RangeClause o2) {
+            int result = comparator.compare(o1.lowerValue, 0, o2.lowerValue, 0);
+            if (result == 0) {
+              return comparator.compare(o1.upperValue, 0, o2.upperValue, 0);
+            } else {
+              return result;
+            }
+          }
+        });
+    List<RangeClause> finalRangeClause = new ArrayList<>();
+    RangeClause current = originRangeClause.get(0);
+    for (int i = 1; i < originRangeClause.size(); i++) {
+      RangeClause nextClause = originRangeClause.get(i);
+      if (comparator.compare(nextClause.lowerValue, 0, current.upperValue, 0) > 0) {
+        finalRangeClause.add(current);
+        current = nextClause;
+      } else {
+        if (comparator.compare(nextClause.upperValue, 0, current.upperValue, 0) > 0) {
+          current = new RangeClause(current.lowerValue, nextClause.upperValue);
+        }
+      }
+    }
+    finalRangeClause.add(current);
+    if (finalRangeClause.size() != rangeClauses.size()) {

Review comment:
       in rewrite() method, I compare the origin rangeClauses with the output finalRangeClause, if they are the same, the rewrite() method returns the origin query




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r836366205



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +370,32 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1) {

Review comment:
       done

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {

Review comment:
       done

##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);

Review comment:
       done

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r836367709



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestMultiRangeQueries.java
##########
@@ -761,4 +764,67 @@ public void testEqualsAndHashCode() {
       assertNotEquals(query1.hashCode(), query3.hashCode());
     }
   }
+
+  public void testRewrite() throws IOException {
+    LongPointMultiRangeBuilder builder = new LongPointMultiRangeBuilder("ff", 1);
+    builder.add(new long[] {2}, new long[] {3});
+    builder.add(new long[] {3}, new long[] {4});
+    builder.add(new long[] {1}, new long[] {5});
+    builder.add(new long[] {8}, new long[] {10});
+    builder.add(new long[] {3}, new long[] {3});
+    builder.add(new long[] {1}, new long[] {6});
+    MultiRangeQuery multiRangeQuery = builder.build();
+    Query query = multiRangeQuery.rewrite(null);
+    Assert.assertNotEquals(query, multiRangeQuery);
+    System.out.println(query.toString());

Review comment:
       I have removed the println() function here

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +370,32 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRanceClause = mergeOverlappingRanges();

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on a change in pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r838052074



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +386,32 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1 || context.reader().hasDeletions() == true) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRangeClause = mergeOverlappingRanges(rangeClauses, bytesPerDim);
+        int total = 0;
+        for (RangeClause rangeClause : mergeRangeClause) {
+          PointRangeQuery pointRangeQuery =
+              new PointRangeQuery(field, rangeClause.lowerValue, rangeClause.upperValue, numDims) {
+                @Override
+                protected String toString(int dimension, byte[] value) {
+                  return MultiRangeQuery.this.toString(dimension, value);
+                }
+              };
+          int count = pointRangeQuery.createWeight(searcher, scoreMode, boost).count(context);

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wjp719 commented on pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

Posted by GitBox <gi...@apache.org>.
wjp719 commented on pull request #731:
URL: https://github.com/apache/lucene/pull/731#issuecomment-1083012369


   @jpountz I refactor the test cases, please help to review again, thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org