You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/03/23 14:09:04 UTC

[GitHub] [lucene] mikemccand commented on a change in pull request #32: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

mikemccand commented on a change in pull request #32:
URL: https://github.com/apache/lucene/pull/32#discussion_r599588929



##########
File path: lucene/core/src/java/org/apache/lucene/index/BaseCompositeReader.java
##########
@@ -231,4 +239,13 @@ protected final int readerBase(int readerIndex) {
   protected final List<? extends R> getSequentialSubReaders() {
     return subReadersList;
   }
+
+  /**
+   * Returns a comparator for sorting sub readers.
+   *
+   * @return a comparator for sorting sub readers, or {@code null} if a comparator was not set.
+   */
+  protected Comparator<R> getSubReadersSorter() {
+    return subReadersSorter;

Review comment:
       Hmm, why does the reader need to record its sorter?  A reader is "write once" right?  Once it is created it won't alter is sub-reader order.
   
   Edit: OK I see it's so if we wrap this reader in a `FilterDirectoryReader`, that wrapped reader will also sort its leaves the same way?
   
   Hmm, but wouldn't it already have that sort order since its wrapped reader sorted on construction?
   
   I guess maybe callers might want to "chain" the sorter -- have this reader remember what sorter was used, and on refresh, pull that sorter out and use it again for the next reader, ok.

##########
File path: lucene/core/src/java/org/apache/lucene/index/LiveIndexWriterConfig.java
##########
@@ -91,6 +92,9 @@
   /** The sort order to use to write merged segments. */
   protected Sort indexSort = null;
 
+  /** The comparator for sorting leaf readers. */
+  protected Comparator<LeafReader> leafSorter = null;

Review comment:
       No need for `= null` -- it is default already for java.

##########
File path: lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java
##########
@@ -56,7 +57,24 @@
    * @throws IOException if there is a low-level IO error
    */
   public static DirectoryReader open(final Directory directory) throws IOException {
-    return StandardDirectoryReader.open(directory, null);
+    return StandardDirectoryReader.open(directory, null, null);
+  }
+
+  /**
+   * Returns a IndexReader for the the index in the given Directory
+   *
+   * @param directory the index directory
+   * @param leafSorter a comparator for sorting leaf readers. Providing leafSorter is useful for
+   *     indices on which it is expected to run many queries with particular sort criteria (e.g. for
+   *     time-based indices this is usually a descending sort on timestamp). In this case {@code
+   *     leafSorter} should sort leaves according to this sort criteria. Providing leafSorter allows
+   *     to speed up this particular type of sort queries by early terminating while iterating
+   *     through segments and segments' documents.

Review comment:
       I'm curious how this works in practice.  Since each leaf holds many documents which span a range of times, the leaves might have overlapping time ranges, right?  Especially with multiple threads during indexing sending documents with similar timestamps to different initial segments, and then merging randomizing things (hmm, unless you also use a merge policy that tries to "coalesce" time ranges).

##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##########
@@ -53,7 +54,7 @@
 public final class IndexWriterConfig extends LiveIndexWriterConfig {
 
   /** Specifies the open mode for {@link IndexWriter}. */
-  public static enum OpenMode {
+  public enum OpenMode {

Review comment:
       Silly java, why doesn't `javac` warn/tell us that `static` is pointless for `enum`?  Hmm, it is pointless, right?
   
   Wow, we have tons of `static enum` in our codebase!  I'll open a separate PR, this is crazy.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org