Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/02/07 21:08:53 UTC

[GitHub] [lucene-solr] dnhatn opened a new pull request #2317: LUCENE-9741: Add sequential optimization for stored fields

dnhatn opened a new pull request #2317:
URL: https://github.com/apache/lucene-solr/pull/2317


   Suppose we are reading the stored fields of document IDs (25, 27, 28, 26, 99), and doc 25 causes the stored-fields reader to decompress a block containing document IDs [10-50]. We can then tell the reader to read not only doc 25 but also docs 26, 27, and 28 from that same block, so that the block is not decompressed multiple times.
   
   
   This PR proposes adding a new optimized instance of the stored-fields reader that lets users select the preferred range of documents to fetch.
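
   The grouping idea can be illustrated with a minimal, self-contained sketch. It is not the code from this PR and uses no Lucene API: the class `BlockGroupingSketch`, the method names, and the fixed block size of 50 documents are all hypothetical, chosen only so that the example IDs fall into two blocks. It also assumes a reader that keeps no decompressed state between calls, so every single-document read costs one block decompression.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Lucene. */
final class BlockGroupingSketch {

  // Assumption: fixed-size blocks. Real stored-fields blocks vary in size.
  static final int DOCS_PER_BLOCK = 50;

  /** Maps a document ID to the index of the (assumed) block that holds it. */
  static int blockOf(int docId) {
    return docId / DOCS_PER_BLOCK;
  }

  /**
   * Decompressions needed when each document is read on its own and the
   * reader keeps no state between calls: one per requested document.
   */
  static int decompressionsWithoutGrouping(int[] docIds) {
    return docIds.length;
  }

  /**
   * Groups the requested document IDs by block, preserving request order
   * inside each group, so that each block needs to be decompressed once.
   */
  static Map<Integer, List<Integer>> groupByBlock(int[] docIds) {
    Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
    for (int docId : docIds) {
      groups.computeIfAbsent(blockOf(docId), b -> new ArrayList<>()).add(docId);
    }
    return groups;
  }

  public static void main(String[] args) {
    int[] requested = {25, 27, 28, 26, 99};
    Map<Integer, List<Integer>> groups = groupByBlock(requested);
    System.out.println("without grouping: " + decompressionsWithoutGrouping(requested));
    System.out.println("with grouping:    " + groups.size() + " " + groups);
  }
}
```

   Under these assumptions, the five requested documents fall into two blocks, so grouping reduces five decompressions to two. How the actual reader would expose such a range is exactly the API question this PR raises.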


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2317: LUCENE-9741: Add sequential optimization for stored fields

Posted by GitBox <gi...@apache.org>.
muse-dev[bot] commented on a change in pull request #2317:
URL: https://github.com/apache/lucene-solr/pull/2317#discussion_r572201047



##########
File path: lucene/test-framework/src/java/org/apache/lucene/index/BaseStoredFieldsFormatTestCase.java
##########
@@ -839,4 +839,41 @@ public void testMismatchedFields() throws Exception {
     IOUtils.close(iw, ir, everything);
     IOUtils.close(dirs);
   }
+
+  public void testPrefetch() throws Exception {
+    Directory dir = newDirectory();
+    IndexWriterConfig config = newIndexWriterConfig().setCodec(getCodec());
+    IndexWriter writer = new IndexWriter(dir, config);
+    int numDocs = atLeast(100);
+    Map<Integer, Document> docs = new HashMap<>();
+    for (int i = 0; i < numDocs; i++) {
+      Document doc = new Document();
+      int numFields = random().nextInt(rarely() ? 1000 : 100);
+      doc.add(new StringField("num_fields", Integer.toString(numFields), Store.NO));
+      for (int f = 0; f < numFields; f++) {
+        String str = "doc=" + i + "f=" + f;
+        doc.add(new StringField("field-" + f, str, Store.YES));
+      }
+      writer.addDocument(doc);
+      docs.put(i, doc);
+    }
+    final DirectoryReader reader =
+        new RandomPrefetchStoredFieldsCodecDirectoryReader(DirectoryReader.open(writer));
+    int iters = atLeast(100);
+    for (int i = 0; i < iters; i++) {
+      int docId = random().nextInt(numDocs);
+      final IndexableField numFieldStr = docs.get(docId).getField("num_fields");
+      assertNotNull(numFieldStr);
+      int numFields = Integer.parseInt(numFieldStr.stringValue());

Review comment:
       *NULL_DEREFERENCE:*  object `numFieldStr` last assigned on line 865 could be null and is dereferenced at line 867.






[GitHub] [lucene-solr] dnhatn closed pull request #2317: LUCENE-9741: Add sequential optimization for stored fields

Posted by GitBox <gi...@apache.org>.
dnhatn closed pull request #2317:
URL: https://github.com/apache/lucene-solr/pull/2317


   




[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2317: LUCENE-9741: Add sequential optimization for stored fields

Posted by GitBox <gi...@apache.org>.
muse-dev[bot] commented on a change in pull request #2317:
URL: https://github.com/apache/lucene-solr/pull/2317#discussion_r571784231



##########
File path: lucene/test-framework/src/java/org/apache/lucene/index/BaseStoredFieldsFormatTestCase.java
##########
@@ -839,4 +839,38 @@ public void testMismatchedFields() throws Exception {
     IOUtils.close(iw, ir, everything);
     IOUtils.close(dirs);
   }
+
+  public void testPrefetch() throws Exception {
+    Directory dir = newDirectory();
+    IndexWriterConfig config = newIndexWriterConfig().setCodec(getCodec());
+    IndexWriter writer = new IndexWriter(dir, config);
+    int numDocs = atLeast(100);
+    Map<Integer, Document> docs = new HashMap<>();
+    for (int i = 0; i < numDocs; i++) {
+      Document doc = new Document();
+      int numFields = random().nextInt(rarely() ? 1000 : 100);
+      doc.add(new StringField("num_fields", Integer.toString(numFields), Store.NO));
+      for (int f = 0; f < numFields; f++) {
+        String str = "doc=" + i + "f=" + f;
+        doc.add(new StringField("field-" + f, str, Store.YES));
+      }
+      writer.addDocument(doc);
+      docs.put(i, doc);
+    }
+    final DirectoryReader reader = new RandomPrefetchStoredFieldsCodecDirectoryReader(DirectoryReader.open(writer));
+    int iters = atLeast(100);
+    for (int i = 0; i < iters; i++) {
+      int docId = random().nextInt(numDocs);
+      int numFields = Integer.parseInt(docs.get(docId).getField("num_fields").stringValue());

Review comment:
       *NULL_DEREFERENCE:*  object returned by `docs.get(valueOf(docId)).getField("num_fields")` could be null and is dereferenced at line 864.






[GitHub] [lucene-solr] dnhatn commented on pull request #2317: LUCENE-9741: Add sequential optimization for stored fields

Posted by GitBox <gi...@apache.org>.
dnhatn commented on pull request #2317:
URL: https://github.com/apache/lucene-solr/pull/2317#issuecomment-780259815


   @jimczi Thank you for reviewing. I am closing this PR because the proposed API is too complex.

