You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/27 21:18:15 UTC

[GitHub] [lucene-solr] mayya-sharipova opened a new pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

mayya-sharipova opened a new pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256


   1. Add an option to supply a custom leaf sorter for IndexWriter.
   A DirectoryReader opened from this IndexWriter will have its leaf
   readers sorted with the provided leaf sorter. This is useful for
   indices on which it is expected to run many queries with particular
   sort criteria (e.g. for time-based indices this is usually a
   descending sort on timestamp). Providing leafSorter allows
   to speed up early termination for this particular type of
   sort queries.
   
   2. Add an option to supply a custom leaf sorter for
   StandardDirectoryReader. The leaf readers of this
   StandardDirectoryReader will be sorted according the the provided leaf
   sorter.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r568520864



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
     // obtain the write.lock. If the user configured a timeout,
     // we wrap with a sleeper and this might take some time.
     writeLock = d.obtainLock(WRITE_LOCK_NAME);
+    if (config.getIndexSort() != null && leafSorter != null) {
+      throw new IllegalArgumentException(
+          "[IndexWriter] can't use index sort and leaf sorter at the same time!");

Review comment:
       @msokolov Thank you for  the clarification.  Indeed, it is much clear with an example you provided. Looks like we need to think  and discuss more about merging scenario, may be in the next PR or Jira ticket. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r568095174



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
     // obtain the write.lock. If the user configured a timeout,
     // we wrap with a sleeper and this might take some time.
     writeLock = d.obtainLock(WRITE_LOCK_NAME);
+    if (config.getIndexSort() != null && leafSorter != null) {
+      throw new IllegalArgumentException(
+          "[IndexWriter] can't use index sort and leaf sorter at the same time!");

Review comment:
       @msokolov Thank you for your feedback and explanation. Sorry, I am still not super clear about this point. It seems to me as  `indexSorter` maps each leaf's documents into the merged segment according to its sort,  the same way `leafSorter` will map each leaf's documents into the merged segment according to its sort (given several merging segments, in the merged segment the first docs should be docs from a segment with highest sort values..)
   
   But I am ok to remove this check in this PR, as this PR is not concerned with merging, and follow up on this point in the following PR. 
   
   Addressed in 7ddff6775c8




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r598857854



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##########
@@ -478,6 +479,18 @@ public IndexWriterConfig setIndexSort(Sort sort) {
     return this;
   }
 
+  /**
+   * Set the comparator for sorting leaf readers. A DirectoryReader opened from a IndexWriter with
+   * this configuration will have its leaf readers sorted with the provided leaf sorter.
+   *
+   * @param leafSorter – a comparator for sorting leaf readers
+   * @return IndexWriterConfig with leafSorter set.
+   */
+  public IndexWriterConfig setLeafSorter(Comparator<LeafReader> leafSorter) {

Review comment:
       How will leafSorter sort leaves in this case? By what criteria?  If a sorting criteria is anything different than the existing order of leaves,  this will break some of our tests that rely on specific docIds. For example tests in the class `TestBasics` that check for specific docIDs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#issuecomment-804182320


   I've moved this PR to a [new PR](https://github.com/apache/lucene/pull/32) in  the new Lucene github repo.  Closing this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msokolov commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
msokolov commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r568113101



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
     // obtain the write.lock. If the user configured a timeout,
     // we wrap with a sleeper and this might take some time.
     writeLock = d.obtainLock(WRITE_LOCK_NAME);
+    if (config.getIndexSort() != null && leafSorter != null) {
+      throw new IllegalArgumentException(
+          "[IndexWriter] can't use index sort and leaf sorter at the same time!");

Review comment:
       Hmm I do see where you said we want to use `leafSorter` to sort documents. I guess I might challenge that design in favor of building on the `indexSort` we already have? But perhaps this is better sorted out in the context of a later PR, as you suggested.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r567994353



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
     // obtain the write.lock. If the user configured a timeout,
     // we wrap with a sleeper and this might take some time.
     writeLock = d.obtainLock(WRITE_LOCK_NAME);
+    if (config.getIndexSort() != null && leafSorter != null) {
+      throw new IllegalArgumentException(
+          "[IndexWriter] can't use index sort and leaf sorter at the same time!");

Review comment:
       @mikemccand Thank you for your review! From the discussion on the Jira ticket, we also wanted to use writer's `leafSorter` during merging for arranging docs in  a merged segment (by putting first docs from a segment with highest sort values according to `leafSorter`).
   
   This will be in conflict with `indexSorter`, as if provided [it will arrange merged docs according to its sort](https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/MergeState.java#L211).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova closed pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova closed pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r568095366



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -933,6 +936,31 @@ protected final void ensureOpen() throws AlreadyClosedException {
    *     low-level IO error
    */
   public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
+    this(d, conf, null);
+  }
+
+  /**
+   * Constructs a new IndexWriter per the settings given in <code>conf</code>. If you want to make
+   * "live" changes to this writer instance, use {@link #getConfig()}.
+   *
+   * <p><b>NOTE:</b> after ths writer is created, the given configuration instance cannot be passed
+   * to another writer.
+   *
+   * @param d the index directory. The index is either created or appended according <code>
+   *     conf.getOpenMode()</code>.
+   * @param conf the configuration settings according to which IndexWriter should be initialized.
+   * @param leafSorter a comparator for sorting leaf readers. Providing leafSorter is useful for
+   *     indices on which it is expected to run many queries with particular sort criteria (e.g. for
+   *     time-based indices this is usually a descending sort on timestamp). In this case {@code
+   *     leafSorter} should sort leaves according to this sort criteria. Providing leafSorter allows
+   *     to speed up this particular type of sort queries by early terminating while iterating
+   *     though segments and segments' documents.
+   * @throws IOException if the directory cannot be read/written to, or if it does not exist and
+   *     <code>conf.getOpenMode()</code> is <code>OpenMode.APPEND</code> or if there is any other
+   *     low-level IO error
+   */
+  public IndexWriter(Directory d, IndexWriterConfig conf, Comparator<LeafReader> leafSorter)

Review comment:
       Addressed in 7ddff67




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r570458862



##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -143,12 +158,17 @@ static StandardDirectoryReader open(
 
       writer.incRefDeleter(segmentInfos);
 
+      if (writer.getConfig().getLeafSorter() != null) {
+        readers.sort(writer.getConfig().getLeafSorter());

Review comment:
       @mikemccand  Thank you for your feedback.
   The intent was to keep `leafSorter` constant and non-updatable, as  `indexWriter.getConfig` returns `LiveIndexWriterConfig` that doesn't allow setting of a new `leafSorter`. I understand that `leafSorter` in config can still be changed, and your comment is very valid. 
   
   But In 8155f2e2abd4ed3a6280aae940d018dfe3b3dad1  I've moved sorting of leaves to the constructor, so hopefully this should resolve this concern. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r570461735



##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -169,7 +189,10 @@ static StandardDirectoryReader open(
    * @lucene.internal
    */
   public static DirectoryReader open(
-      Directory directory, SegmentInfos infos, List<? extends LeafReader> oldReaders)
+      Directory directory,

Review comment:
       It looks like  `IndexWriter.getReader(...)` path is covered by a previous `open` method on line 123.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msokolov commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
msokolov commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r568058172



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
     // obtain the write.lock. If the user configured a timeout,
     // we wrap with a sleeper and this might take some time.
     writeLock = d.obtainLock(WRITE_LOCK_NAME);
+    if (config.getIndexSort() != null && leafSorter != null) {
+      throw new IllegalArgumentException(
+          "[IndexWriter] can't use index sort and leaf sorter at the same time!");

Review comment:
       I think it's OK - that MultiSorter is sorting (by index sort) documents in several segments being merged; it will control the order of the documents within the new segment, but shouldn't have any influence on or conflict with the order of the segments being merged




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mikemccand commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r566993804



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
     // obtain the write.lock. If the user configured a timeout,
     // we wrap with a sleeper and this might take some time.
     writeLock = d.obtainLock(WRITE_LOCK_NAME);
+    if (config.getIndexSort() != null && leafSorter != null) {
+      throw new IllegalArgumentException(
+          "[IndexWriter] can't use index sort and leaf sorter at the same time!");

Review comment:
       Hmm, why is that?  I thought this change sorts whole segments, and index sorting sorts documents within one segment?  Why do they conflict?

##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -933,6 +936,31 @@ protected final void ensureOpen() throws AlreadyClosedException {
    *     low-level IO error
    */
   public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
+    this(d, conf, null);
+  }
+
+  /**
+   * Constructs a new IndexWriter per the settings given in <code>conf</code>. If you want to make
+   * "live" changes to this writer instance, use {@link #getConfig()}.
+   *
+   * <p><b>NOTE:</b> after ths writer is created, the given configuration instance cannot be passed
+   * to another writer.
+   *
+   * @param d the index directory. The index is either created or appended according <code>
+   *     conf.getOpenMode()</code>.
+   * @param conf the configuration settings according to which IndexWriter should be initialized.
+   * @param leafSorter a comparator for sorting leaf readers. Providing leafSorter is useful for
+   *     indices on which it is expected to run many queries with particular sort criteria (e.g. for
+   *     time-based indices this is usually a descending sort on timestamp). In this case {@code
+   *     leafSorter} should sort leaves according to this sort criteria. Providing leafSorter allows
+   *     to speed up this particular type of sort queries by early terminating while iterating
+   *     though segments and segments' documents.
+   * @throws IOException if the directory cannot be read/written to, or if it does not exist and
+   *     <code>conf.getOpenMode()</code> is <code>OpenMode.APPEND</code> or if there is any other
+   *     low-level IO error
+   */
+  public IndexWriter(Directory d, IndexWriterConfig conf, Comparator<LeafReader> leafSorter)

Review comment:
       Should `leafSorter` instead be added to `IndexWriterConfig`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
muse-dev[bot] commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r565714215



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -933,6 +936,31 @@ protected final void ensureOpen() throws AlreadyClosedException {
    *     low-level IO error
    */
   public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
+    this(d, conf, null);

Review comment:
       *THREAD_SAFETY_VIOLATION:*  Unprotected write. Non-private method `IndexWriter(...)` indirectly writes to field `this.mergeScheduler.infoStream` outside of synchronization.
    Reporting because another access to the same memory occurs on a background thread, although this access may not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r568520864



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
     // obtain the write.lock. If the user configured a timeout,
     // we wrap with a sleeper and this might take some time.
     writeLock = d.obtainLock(WRITE_LOCK_NAME);
+    if (config.getIndexSort() != null && leafSorter != null) {
+      throw new IllegalArgumentException(
+          "[IndexWriter] can't use index sort and leaf sorter at the same time!");

Review comment:
       @msokolov Thank you for  the clarification.  Indeed, it is much clear with an example you provided. Looks like we need to think  and discuss more about merging scenario, may be in the next PR or Jira ticket. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r598855055



##########
File path: lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java
##########
@@ -56,7 +57,24 @@
    * @throws IOException if there is a low-level IO error
    */
   public static DirectoryReader open(final Directory directory) throws IOException {
-    return StandardDirectoryReader.open(directory, null);
+    return StandardDirectoryReader.open(directory, null, null);
+  }
+
+  /**
+   * Returns a IndexReader for the the index in the given Directory
+   *
+   * @param directory the index directory
+   * @param leafSorter a comparator for sorting leaf readers. Providing leafSorter is useful for
+   *     indices on which it is expected to run many queries with particular sort criteria (e.g. for
+   *     time-based indices this is usually a descending sort on timestamp). In this case {@code
+   *     leafSorter} should sort leaves according to this sort criteria. Providing leafSorter allows
+   *     to speed up this particular type of sort queries by early terminating while iterating
+   *     though segments and segments' documents.

Review comment:
       Addressed in https://github.com/apache/lucene/pull/32/commits/f3fbba10d5ff391bc66bbd04032a873c91680493




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r570454676



##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -39,33 +40,44 @@
 
   final IndexWriter writer;
   final SegmentInfos segmentInfos;
+  private final Comparator<LeafReader> leafSorter;
   private final boolean applyAllDeletes;
   private final boolean writeAllDeletes;
 
-  /** called only from static open() methods */
+  /**
+   * package private constructor, called only from static open() methods. If parameter {@code
+   * leafSorter} is not {@code null}, the provided {@code readers} are expected to be already sorted

Review comment:
       Great suggestion! I moved sorting of leaves to the constructor. 
   Addressed in 8155f2e2abd4ed3a6280aae940d018dfe3b3dad1.

##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -143,12 +158,17 @@ static StandardDirectoryReader open(
 
       writer.incRefDeleter(segmentInfos);
 
+      if (writer.getConfig().getLeafSorter() != null) {
+        readers.sort(writer.getConfig().getLeafSorter());

Review comment:
       @mikemccand  Thank you for your feedback.
   The intent was to keep `leafSorter` constant and non-updatable, as  `indexWriter.getConfig` returns `LiveIndexWriterConfig` that doesn't allow setting of a new `leafSorter`. I understand that `leafSorter` in config can still be changed, and your comment is very valid. 
   
   But In 8155f2e2abd4ed3a6280aae940d018dfe3b3dad1  I've moved sorting of leaves to the constructor, so hopefully this should resolve this concern. 

##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -169,7 +189,10 @@ static StandardDirectoryReader open(
    * @lucene.internal
    */
   public static DirectoryReader open(
-      Directory directory, SegmentInfos infos, List<? extends LeafReader> oldReaders)
+      Directory directory,

Review comment:
       It looks like  `IndexWriter.getReader(...)` path is covered by a previous `open` method on line 123.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r598854515



##########
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriterReader.java
##########
@@ -169,7 +176,7 @@ public void testUpdateDocument() throws Exception {
     // writer.close wrote a new commit
     assertFalse(r2.isCurrent());
 
-    DirectoryReader r3 = DirectoryReader.open(dir1);
+    DirectoryReader r3 = open(dir1);

Review comment:
       Addressed in https://github.com/apache/lucene/pull/32/commits/f3fbba10d5ff391bc66bbd04032a873c91680493




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mikemccand commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r569643083



##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -143,12 +158,17 @@ static StandardDirectoryReader open(
 
       writer.incRefDeleter(segmentInfos);
 
+      if (writer.getConfig().getLeafSorter() != null) {
+        readers.sort(writer.getConfig().getLeafSorter());

Review comment:
       Is IWC.leafSorter allowed to be live-updated?  If so, maybe we should pull out local variable here to hold onto the `leafSorter` and use that variable here and below, to ensure it is the same?

##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -169,7 +189,10 @@ static StandardDirectoryReader open(
    * @lucene.internal
    */
   public static DirectoryReader open(
-      Directory directory, SegmentInfos infos, List<? extends LeafReader> oldReaders)
+      Directory directory,

Review comment:
       Are we handling the near-real-time reader (`IndexWriter.getReader(...)`) path here too?

##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -39,33 +40,44 @@
 
   final IndexWriter writer;
   final SegmentInfos segmentInfos;
+  private final Comparator<LeafReader> leafSorter;
   private final boolean applyAllDeletes;
   private final boolean writeAllDeletes;
 
-  /** called only from static open() methods */
+  /**
+   * package private constructor, called only from static open() methods. If parameter {@code
+   * leafSorter} is not {@code null}, the provided {@code readers} are expected to be already sorted

Review comment:
       Could we add an `assertLeavesSorted` method to confirm they really were properly sorted by the caller?  There are so many paths for creating `StandardDirectoryReader` through are code base that it would not surprise me if one of the misses to sort properly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r570454676



##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -39,33 +40,44 @@
 
   final IndexWriter writer;
   final SegmentInfos segmentInfos;
+  private final Comparator<LeafReader> leafSorter;
   private final boolean applyAllDeletes;
   private final boolean writeAllDeletes;
 
-  /** called only from static open() methods */
+  /**
+   * package private constructor, called only from static open() methods. If parameter {@code
+   * leafSorter} is not {@code null}, the provided {@code readers} are expected to be already sorted

Review comment:
       Great suggestion! I moved sorting of leaves to the constructor. 
   Addressed in 8155f2e2abd4ed3a6280aae940d018dfe3b3dad1.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] jimczi commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
jimczi commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r572984156



##########
File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
##########
@@ -39,33 +40,47 @@
 
   final IndexWriter writer;
   final SegmentInfos segmentInfos;
+  private final Comparator<LeafReader> leafSorter;
   private final boolean applyAllDeletes;
   private final boolean writeAllDeletes;
 
-  /** called only from static open() methods */
+  /** package private constructor, called only from static open() methods. */
   StandardDirectoryReader(
       Directory directory,
       LeafReader[] readers,
       IndexWriter writer,
       SegmentInfos sis,
+      Comparator<LeafReader> leafSorter,
       boolean applyAllDeletes,
       boolean writeAllDeletes)
       throws IOException {
-    super(directory, readers);
+    super(directory, sortLeaves(readers, leafSorter));

Review comment:
       I wonder if the leafSorter should be declared and executed in the base class `BaseCompositeReader` ?
   That would expose the feature explicitly to `MultiReader` and friends.

##########
File path: lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java
##########
@@ -56,7 +57,24 @@
    * @throws IOException if there is a low-level IO error
    */
   public static DirectoryReader open(final Directory directory) throws IOException {
-    return StandardDirectoryReader.open(directory, null);
+    return StandardDirectoryReader.open(directory, null, null);
+  }
+
+  /**
+   * Returns a IndexReader for the the index in the given Directory
+   *
+   * @param directory the index directory
+   * @param leafSorter a comparator for sorting leaf readers. Providing leafSorter is useful for
+   *     indices on which it is expected to run many queries with particular sort criteria (e.g. for
+   *     time-based indices this is usually a descending sort on timestamp). In this case {@code
+   *     leafSorter} should sort leaves according to this sort criteria. Providing leafSorter allows
+   *     to speed up this particular type of sort queries by early terminating while iterating
+   *     though segments and segments' documents.

Review comment:
       nit: s/though/through/

##########
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriterReader.java
##########
@@ -169,7 +176,7 @@ public void testUpdateDocument() throws Exception {
     // writer.close wrote a new commit
     assertFalse(r2.isCurrent());
 
-    DirectoryReader r3 = DirectoryReader.open(dir1);
+    DirectoryReader r3 = open(dir1);

Review comment:
       Can you keep the explicit version ?

##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##########
@@ -478,6 +479,18 @@ public IndexWriterConfig setIndexSort(Sort sort) {
     return this;
   }
 
+  /**
+   * Set the comparator for sorting leaf readers. A DirectoryReader opened from a IndexWriter with
+   * this configuration will have its leaf readers sorted with the provided leaf sorter.
+   *
+   * @param leafSorter – a comparator for sorting leaf readers
+   * @return IndexWriterConfig with leafSorter set.
+   */
+  public IndexWriterConfig setLeafSorter(Comparator<LeafReader> leafSorter) {

Review comment:
       You added a specific unit test for this feature but we could also set a random value in `LuceneTestCase#newIndexWriterConfig` to improve the coverage.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msokolov commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
msokolov commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r568110673



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
     // obtain the write.lock. If the user configured a timeout,
     // we wrap with a sleeper and this might take some time.
     writeLock = d.obtainLock(WRITE_LOCK_NAME);
+    if (config.getIndexSort() != null && leafSorter != null) {
+      throw new IllegalArgumentException(
+          "[IndexWriter] can't use index sort and leaf sorter at the same time!");

Review comment:
       OK, maybe I misunderstood the intent. Perhaps an example would clarify. Say that we have three segments, A, B , C containing documents `A={0, 3, 6}; B={1, 4, 7}; C={2, 5, 8}`, where the documents are understood to have a single field with the value shown, and the index sort is ordered in the natural way.  Without this change, if we merged A and B, we'd get a new segment `A+B={0, 1, 3, 4, 6, 7}`. Now suppose there is no index sort (and the documents just "happen" to be in the index in the order given above, for the sake of the example), and we apply a `leafSorter` that sorts by the minimum value of any document in the segment (I guess it could be any sort of aggregate over the segment?), then we would get `A+B={0, 3, 6, 1, 4, 7}`. Now if we apply both sorts, we would get the same result as in the first case, right? I'm still unclear how the conflict arises.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r567994551



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -933,6 +936,31 @@ protected final void ensureOpen() throws AlreadyClosedException {
    *     low-level IO error
    */
   public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
+    this(d, conf, null);
+  }
+
+  /**
+   * Constructs a new IndexWriter per the settings given in <code>conf</code>. If you want to make
+   * "live" changes to this writer instance, use {@link #getConfig()}.
+   *
+   * <p><b>NOTE:</b> after ths writer is created, the given configuration instance cannot be passed
+   * to another writer.
+   *
+   * @param d the index directory. The index is either created or appended according <code>
+   *     conf.getOpenMode()</code>.
+   * @param conf the configuration settings according to which IndexWriter should be initialized.
+   * @param leafSorter a comparator for sorting leaf readers. Providing leafSorter is useful for
+   *     indices on which it is expected to run many queries with particular sort criteria (e.g. for
+   *     time-based indices this is usually a descending sort on timestamp). In this case {@code
+   *     leafSorter} should sort leaves according to this sort criteria. Providing leafSorter allows
+   *     to speed up this particular type of sort queries by early terminating while iterating
+   *     though segments and segments' documents.
+   * @throws IOException if the directory cannot be read/written to, or if it does not exist and
+   *     <code>conf.getOpenMode()</code> is <code>OpenMode.APPEND</code> or if there is any other
+   *     low-level IO error
+   */
+  public IndexWriter(Directory d, IndexWriterConfig conf, Comparator<LeafReader> leafSorter)

Review comment:
       Thank you for your suggestion, I will explore this.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org