You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/10/26 14:56:02 UTC

[GitHub] [lucene] nik9000 opened a new pull request #415: LUCENE-10206 Implement O(1) count on query cache

nik9000 opened a new pull request #415:
URL: https://github.com/apache/lucene/pull/415


   # Description
   
   When we load a query into the query cache we always calculate the count
   of matching documents. This uses that count to power the new `O(1)`
   `Weight#count` method.
   
   # Solution
   
   I've tried a bunch of approaches but settled on opening this PR with the simplest one - add a new class that keeps the BitSet and the count. I'm not particularly tied to it other than that it is fairly simple.
   
   I am assuming it's right to try and implement `count` here rather than do something else. It feels like that method was made for situations like this though.
   
   # Tests
   
   I've added some to LRUCache's unit test. I haven't done any performance testing here. A little because "it's obvious that returning a number is faster than counting stuff". Nanoseconds vs microseconds. But I'd love to do more with this if folks want me to.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #415: LUCENE-10206 Implement O(1) count on query cache

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #415:
URL: https://github.com/apache/lucene/pull/415#discussion_r736682046



##########
File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java
##########
@@ -1976,4 +1989,60 @@ public void testSkipCachingForTermQuery() throws IOException {
     reader.close();
     dir.close();
   }
+
+  public void testCacheHasFastCount() throws IOException {
+    Query query = new PhraseQuery("words", new BytesRef("alice"), new BytesRef("ran"));
+
+    Directory dir = newDirectory();
+    RandomIndexWriter w =
+        new RandomIndexWriter(
+            random(), dir, newIndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE));
+    Document doc1 = new Document();
+    doc1.add(new TextField("words", "tom ran", Store.NO));
+    Document doc2 = new Document();
+    doc2.add(new TextField("words", "alice ran", Store.NO));
+    doc2.add(new StringField("f", "a", Store.NO));
+    Document doc3 = new Document();
+    doc3.add(new TextField("words", "alice ran", Store.NO));
+    doc3.add(new StringField("f", "b", Store.NO));
+    w.addDocuments(Arrays.asList(doc1, doc2, doc3));
+
+    try (IndexReader reader = w.getReader()) {
+      IndexSearcher searcher = newSearcher(reader);
+      searcher.setQueryCachingPolicy(ALWAYS_CACHE);
+      LRUQueryCache allCache =
+          new LRUQueryCache(1000000, 10000000, context -> true, Float.POSITIVE_INFINITY);
+      searcher.setQueryCache(allCache);
+      Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1);
+      LeafReaderContext context = getOnlyLeafReader(reader).getContext();
+      // We don't have a fast count before the cache is filled
+      assertEquals(weight.count(context), -1);
+      // Fetch the scorer to populate the cache
+      weight.scorer(context);
+      assertEquals(List.of(query), allCache.cachedQueries());
+      // Now we *do* have a fast count
+      assertEquals(weight.count(context), 2);
+    }
+
+    w.deleteDocuments(new TermQuery(new Term("f", new BytesRef("b"))));
+    try (IndexReader reader = w.getReader()) {
+      IndexSearcher searcher = newSearcher(reader);
+      searcher.setQueryCachingPolicy(ALWAYS_CACHE);
+      LRUQueryCache allCache =
+          new LRUQueryCache(1000000, 10000000, context -> true, Float.POSITIVE_INFINITY);
+      searcher.setQueryCache(allCache);
+      Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1);
+      LeafReaderContext context = getOnlyLeafReader(reader).getContext();
+      // We don't have a fast count before the cache is filled
+      assertEquals(weight.count(context), -1);
+      // Fetch the scorer to populate the cache
+      weight.scorer(context);
+      assertEquals(List.of(query), allCache.cachedQueries());
+      // We still don't have a fast count because we have deleted documents
+      assertEquals(weight.count(context), -1);

Review comment:
       nit: assertEquals expects the expected value first, so that we get better error messages in case of failure

##########
File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java
##########
@@ -1088,7 +1089,19 @@ public void testRandom() throws IOException {
         cachedSearcher.setQueryCachingPolicy(ALWAYS_CACHE);
       }
       final Query q = buildRandomQuery(0);
+      /*
+       * Counts are the same. If the query has already been cached
+       * this'll use the O(1) Weight#count method.
+       */
       assertEquals(uncachedSearcher.count(q), cachedSearcher.count(q));
+      /*
+       * Just to make sure we can iterate every time also check that the
+       * same docs are returned in the same order.
+       */
+      int size = 1 + random().nextInt(1000);
+      assertArrayEquals(
+          Arrays.stream(uncachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray(),
+          Arrays.stream(cachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray());

Review comment:
       you might want to use `CheckHits#checkEqual`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] nik9000 commented on pull request #415: LUCENE-10206 Implement O(1) count on query cache

Posted by GitBox <gi...@apache.org>.
nik9000 commented on pull request #415:
URL: https://github.com/apache/lucene/pull/415#issuecomment-952923025


   > jpountz merged commit 941df98 into apache:main 5 hours ago
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] nik9000 commented on pull request #415: LUCENE-10206 Implement O(1) count on query cache

Posted by GitBox <gi...@apache.org>.
nik9000 commented on pull request #415:
URL: https://github.com/apache/lucene/pull/415#issuecomment-952040157


   > I had an out of date `main`. I'll update.
   
   Done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] nik9000 commented on a change in pull request #415: LUCENE-10206 Implement O(1) count on query cache

Posted by GitBox <gi...@apache.org>.
nik9000 commented on a change in pull request #415:
URL: https://github.com/apache/lucene/pull/415#discussion_r736693202



##########
File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java
##########
@@ -1976,4 +1989,60 @@ public void testSkipCachingForTermQuery() throws IOException {
     reader.close();
     dir.close();
   }
+
+  public void testCacheHasFastCount() throws IOException {
+    Query query = new PhraseQuery("words", new BytesRef("alice"), new BytesRef("ran"));
+
+    Directory dir = newDirectory();
+    RandomIndexWriter w =
+        new RandomIndexWriter(
+            random(), dir, newIndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE));
+    Document doc1 = new Document();
+    doc1.add(new TextField("words", "tom ran", Store.NO));
+    Document doc2 = new Document();
+    doc2.add(new TextField("words", "alice ran", Store.NO));
+    doc2.add(new StringField("f", "a", Store.NO));
+    Document doc3 = new Document();
+    doc3.add(new TextField("words", "alice ran", Store.NO));
+    doc3.add(new StringField("f", "b", Store.NO));
+    w.addDocuments(Arrays.asList(doc1, doc2, doc3));
+
+    try (IndexReader reader = w.getReader()) {
+      IndexSearcher searcher = newSearcher(reader);
+      searcher.setQueryCachingPolicy(ALWAYS_CACHE);
+      LRUQueryCache allCache =
+          new LRUQueryCache(1000000, 10000000, context -> true, Float.POSITIVE_INFINITY);
+      searcher.setQueryCache(allCache);
+      Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1);
+      LeafReaderContext context = getOnlyLeafReader(reader).getContext();
+      // We don't have a fast count before the cache is filled
+      assertEquals(weight.count(context), -1);
+      // Fetch the scorer to populate the cache
+      weight.scorer(context);
+      assertEquals(List.of(query), allCache.cachedQueries());
+      // Now we *do* have a fast count
+      assertEquals(weight.count(context), 2);
+    }
+
+    w.deleteDocuments(new TermQuery(new Term("f", new BytesRef("b"))));
+    try (IndexReader reader = w.getReader()) {
+      IndexSearcher searcher = newSearcher(reader);
+      searcher.setQueryCachingPolicy(ALWAYS_CACHE);
+      LRUQueryCache allCache =
+          new LRUQueryCache(1000000, 10000000, context -> true, Float.POSITIVE_INFINITY);
+      searcher.setQueryCache(allCache);
+      Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1);
+      LeafReaderContext context = getOnlyLeafReader(reader).getContext();
+      // We don't have a fast count before the cache is filled
+      assertEquals(weight.count(context), -1);
+      // Fetch the scorer to populate the cache
+      weight.scorer(context);
+      assertEquals(List.of(query), allCache.cachedQueries());
+      // We still don't have a fast count because we have deleted documents
+      assertEquals(weight.count(context), -1);

Review comment:
       Bah! I leave this same comment on other people's code. And yet I make the same mistake. Will fix.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] nik9000 commented on pull request #415: LUCENE-10206 Implement O(1) count on query cache

Posted by GitBox <gi...@apache.org>.
nik9000 commented on pull request #415:
URL: https://github.com/apache/lucene/pull/415#issuecomment-952025788


   > 
   
   I had an out of date `main`. I'll update.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] nik9000 commented on a change in pull request #415: LUCENE-10206 Implement O(1) count on query cache

Posted by GitBox <gi...@apache.org>.
nik9000 commented on a change in pull request #415:
URL: https://github.com/apache/lucene/pull/415#discussion_r736786185



##########
File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java
##########
@@ -1976,4 +1989,60 @@ public void testSkipCachingForTermQuery() throws IOException {
     reader.close();
     dir.close();
   }
+
+  public void testCacheHasFastCount() throws IOException {
+    Query query = new PhraseQuery("words", new BytesRef("alice"), new BytesRef("ran"));
+
+    Directory dir = newDirectory();
+    RandomIndexWriter w =
+        new RandomIndexWriter(
+            random(), dir, newIndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE));
+    Document doc1 = new Document();
+    doc1.add(new TextField("words", "tom ran", Store.NO));
+    Document doc2 = new Document();
+    doc2.add(new TextField("words", "alice ran", Store.NO));
+    doc2.add(new StringField("f", "a", Store.NO));
+    Document doc3 = new Document();
+    doc3.add(new TextField("words", "alice ran", Store.NO));
+    doc3.add(new StringField("f", "b", Store.NO));
+    w.addDocuments(Arrays.asList(doc1, doc2, doc3));
+
+    try (IndexReader reader = w.getReader()) {
+      IndexSearcher searcher = newSearcher(reader);
+      searcher.setQueryCachingPolicy(ALWAYS_CACHE);
+      LRUQueryCache allCache =
+          new LRUQueryCache(1000000, 10000000, context -> true, Float.POSITIVE_INFINITY);
+      searcher.setQueryCache(allCache);
+      Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1);
+      LeafReaderContext context = getOnlyLeafReader(reader).getContext();
+      // We don't have a fast count before the cache is filled
+      assertEquals(weight.count(context), -1);
+      // Fetch the scorer to populate the cache
+      weight.scorer(context);
+      assertEquals(List.of(query), allCache.cachedQueries());
+      // Now we *do* have a fast count
+      assertEquals(weight.count(context), 2);
+    }
+
+    w.deleteDocuments(new TermQuery(new Term("f", new BytesRef("b"))));
+    try (IndexReader reader = w.getReader()) {
+      IndexSearcher searcher = newSearcher(reader);
+      searcher.setQueryCachingPolicy(ALWAYS_CACHE);
+      LRUQueryCache allCache =
+          new LRUQueryCache(1000000, 10000000, context -> true, Float.POSITIVE_INFINITY);
+      searcher.setQueryCache(allCache);
+      Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1);
+      LeafReaderContext context = getOnlyLeafReader(reader).getContext();
+      // We don't have a fast count before the cache is filled
+      assertEquals(weight.count(context), -1);
+      // Fetch the scorer to populate the cache
+      weight.scorer(context);
+      assertEquals(List.of(query), allCache.cachedQueries());
+      // We still don't have a fast count because we have deleted documents
+      assertEquals(weight.count(context), -1);

Review comment:
       Done.

##########
File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java
##########
@@ -1088,7 +1089,19 @@ public void testRandom() throws IOException {
         cachedSearcher.setQueryCachingPolicy(ALWAYS_CACHE);
       }
       final Query q = buildRandomQuery(0);
+      /*
+       * Counts are the same. If the query has already been cached
+       * this'll use the O(1) Weight#count method.
+       */
       assertEquals(uncachedSearcher.count(q), cachedSearcher.count(q));
+      /*
+       * Just to make sure we can iterate every time also check that the
+       * same docs are returned in the same order.
+       */
+      int size = 1 + random().nextInt(1000);
+      assertArrayEquals(
+          Arrays.stream(uncachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray(),
+          Arrays.stream(cachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray());

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz merged pull request #415: LUCENE-10206 Implement O(1) count on query cache

Posted by GitBox <gi...@apache.org>.
jpountz merged pull request #415:
URL: https://github.com/apache/lucene/pull/415


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org