You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/06/26 14:05:02 UTC

[GitHub] [lucene-solr] s1monw opened a new pull request #1617: LUCENE-8962: Merge small segments on commit

s1monw opened a new pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617


   This PR revisits the merge-on-commit patch submitted by @msfroh a little while ago. The only change from that earlier PR is a fix for failures uncovered by TestIndexWriter.testRandomOperations, some whitespace cleanups, and a rebase on the current master branch. The problem was that updateSegmentInfosOnMergeFinish would incorrectly decRef a merged segments' files if that segment was modified by deletions (or updates) while it was being merged.
   
   With this fix, I ran the failing test case several thousands of times with no failures, whereas before it would routinely fail after a few hundred test runs.
   
   Co-authored-by: Michael Froh <ms...@apache.org>
   Co-authored-by: Michael Sokolov <so...@falutin.net>
   
   This is the third try after the patch itself and #1552


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msfroh commented on pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
msfroh commented on pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617#issuecomment-650292411


   Wow! I didn't realize that capturing/cloning live docs at a point in time would be so complicated. I learned a lot from this PR about how deletes are handled. Thanks, Simon!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] s1monw commented on pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
s1monw commented on pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617#issuecomment-650619003


   merged into master. Thanks everybody involved!!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] s1monw commented on a change in pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
s1monw commented on a change in pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617#discussion_r446381908



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##########
@@ -459,6 +463,21 @@ public IndexWriterConfig setCommitOnClose(boolean commitOnClose) {
     return this;
   }
 
+  /**
+   * Expert: sets the amount of time to wait for merges (during {@link IndexWriter#commit}) returned by
+   * MergePolicy.findFullFlushMerges(...).
+   * If this time is reached, we proceed with the commit based on segments merged up to that point.
+   * The merges are not cancelled, and will still run to completion independent of the commit,
+   * like natural segment merges. The default is <code>{@value IndexWriterConfig#DEFAULT_MAX_COMMIT_MERGE_WAIT_MILLIS}</code>.
+   *
+   * Note: This settings has no effect unless {@link MergePolicy#findFullFlushMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext)}
+   * has an implementation that actually returns merges which by default doesn't return any merges.
+   */
+  public IndexWriterConfig setMaxCommitMergeWaitMillis(long maxCommitMergeWaitMillis) {

Review comment:
       I am torn if we'd use a double or long here. I think we used double in other places but I do like a discrete value better...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] s1monw commented on a change in pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
s1monw commented on a change in pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617#discussion_r446379945



##########
File path: lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
##########
@@ -399,6 +416,40 @@ boolean hasFinished() {
     Optional<Boolean> hasCompletedSuccessfully() {
       return Optional.ofNullable(mergeCompleted.getNow(null));
     }
+
+
+    /**
+     * Called before the merge is committed
+     */
+    void onMergeCommit() {
+    }
+
+    /**
+     * Sets the merge readers for this merge.
+     */
+    void initMergeReaders(IOUtils.IOFunction<SegmentCommitInfo, MergeReader> readerFactory) throws IOException {
+      assert mergeReaders.isEmpty() : "merge readers must be empty";
+      assert mergeCompleted.isDone() == false : "merge is already done";
+      ArrayList<MergeReader> readers = new ArrayList<>(segments.size());
+      try {
+        for (final SegmentCommitInfo info : segments) {
+          // Hold onto the "live" reader; we will use this to
+          // commit merged deletes
+          readers.add(readerFactory.apply(info));
+        }
+      } finally {
+        // ensure we assign this to close them in the case of an exception
+        this.mergeReaders = List.copyOf(readers);

Review comment:
       I left a comment. I use immutable lists everywhere here it would be fatal if we modify this list outside of OneMerge. I think that justifies the copy.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] s1monw commented on pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
s1monw commented on pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617#issuecomment-650380004


   > Wow! I didn't realize that capturing/cloning live docs at a point in time would be so complicated. I learned a lot from this PR about how deletes are handled. Thanks, Simon!
   
   @msfroh I didn't either :) it's been a while that I digged into merging code and I think the same is true for @mikemccand - I am impressed how far you got in this pretty complex class. I'd love to see you digging more and coming up with more stuff like this. Thank you for contributing this. I am always happy to help. This entire process from the first patch to the feature being integrated is a great example of how amazing OSS is. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] s1monw commented on pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
s1monw commented on pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617#issuecomment-650198076


   @mikemccand can you take a look


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] s1monw commented on a change in pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
s1monw commented on a change in pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617#discussion_r446380003



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##########
@@ -459,6 +463,27 @@ public IndexWriterConfig setCommitOnClose(boolean commitOnClose) {
     return this;
   }
 
+  /**
+   * Expert: sets the amount of time to wait for merges returned by MergePolicy.findFullFlushMerges(...).
+   * If this time is reached, we proceed with the commit based on segments merged up to that point.
+   * The merges are not cancelled, and will still run to completion independent of the commit
+   * like normal segment merges. The default is <code>{@value IndexWriterConfig#DEFAULT_MAX_COMMIT_MERGE_WAIT_MILLIS}</code>.
+   *
+   * Note: This settings has no effect unless {@link MergePolicy#findFullFlushMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext)}
+   * has an implementation that actually returns merges which by default doesn't return any merges.
+   */
+  public IndexWriterConfig setMaxCommitMergeWaitMillis(long maxCommitMergeWaitMillis) {
+    this.maxCommitMergeWaitMillis = maxCommitMergeWaitMillis;
+    return this;
+  }
+
+  /** We only allow sorting on these types */
+  private static final EnumSet<SortField.Type> ALLOWED_INDEX_SORT_TYPES = EnumSet.of(SortField.Type.STRING,

Review comment:
       👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] s1monw merged pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
s1monw merged pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
mikemccand commented on a change in pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617#discussion_r446349990



##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##########
@@ -459,6 +463,27 @@ public IndexWriterConfig setCommitOnClose(boolean commitOnClose) {
     return this;
   }
 
+  /**
+   * Expert: sets the amount of time to wait for merges returned by MergePolicy.findFullFlushMerges(...).
+   * If this time is reached, we proceed with the commit based on segments merged up to that point.
+   * The merges are not cancelled, and will still run to completion independent of the commit
+   * like normal segment merges. The default is <code>{@value IndexWriterConfig#DEFAULT_MAX_COMMIT_MERGE_WAIT_MILLIS}</code>.
+   *
+   * Note: This settings has no effect unless {@link MergePolicy#findFullFlushMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext)}
+   * has an implementation that actually returns merges which by default doesn't return any merges.
+   */
+  public IndexWriterConfig setMaxCommitMergeWaitMillis(long maxCommitMergeWaitMillis) {
+    this.maxCommitMergeWaitMillis = maxCommitMergeWaitMillis;
+    return this;
+  }
+
+  /** We only allow sorting on these types */
+  private static final EnumSet<SortField.Type> ALLOWED_INDEX_SORT_TYPES = EnumSet.of(SortField.Type.STRING,

Review comment:
       OK I see this was removed earlier in mainline for LUCENE-9330 -- it must've snuck back in as a merge conflict.
   
   I'll re-remove now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1617: LUCENE-8962: Merge small segments on commit

Posted by GitBox <gi...@apache.org>.
mikemccand commented on a change in pull request #1617:
URL: https://github.com/apache/lucene-solr/pull/1617#discussion_r446269728



##########
File path: lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
##########
@@ -399,6 +416,40 @@ boolean hasFinished() {
     Optional<Boolean> hasCompletedSuccessfully() {
       return Optional.ofNullable(mergeCompleted.getNow(null));
     }
+
+
+    /**
+     * Called before the merge is committed
+     */
+    void onMergeCommit() {
+    }
+
+    /**
+     * Sets the merge readers for this merge.
+     */
+    void initMergeReaders(IOUtils.IOFunction<SegmentCommitInfo, MergeReader> readerFactory) throws IOException {
+      assert mergeReaders.isEmpty() : "merge readers must be empty";
+      assert mergeCompleted.isDone() == false : "merge is already done";
+      ArrayList<MergeReader> readers = new ArrayList<>(segments.size());
+      try {
+        for (final SegmentCommitInfo info : segments) {
+          // Hold onto the "live" reader; we will use this to
+          // commit merged deletes
+          readers.add(readerFactory.apply(info));
+        }
+      } finally {
+        // ensure we assign this to close them in the case of an exception
+        this.mergeReaders = List.copyOf(readers);

Review comment:
       Hmm is there some (sneaky) reason why we couldn't just do `this.mergeReaders = new ArrayList<>(segments.size());` above and then append to that list?  Instead of appending to private `ArrayList` and then making a copy in the end?

##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##########
@@ -459,6 +463,27 @@ public IndexWriterConfig setCommitOnClose(boolean commitOnClose) {
     return this;
   }
 
+  /**
+   * Expert: sets the amount of time to wait for merges returned by MergePolicy.findFullFlushMerges(...).
+   * If this time is reached, we proceed with the commit based on segments merged up to that point.
+   * The merges are not cancelled, and will still run to completion independent of the commit
+   * like normal segment merges. The default is <code>{@value IndexWriterConfig#DEFAULT_MAX_COMMIT_MERGE_WAIT_MILLIS}</code>.
+   *
+   * Note: This settings has no effect unless {@link MergePolicy#findFullFlushMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext)}
+   * has an implementation that actually returns merges which by default doesn't return any merges.
+   */
+  public IndexWriterConfig setMaxCommitMergeWaitMillis(long maxCommitMergeWaitMillis) {
+    this.maxCommitMergeWaitMillis = maxCommitMergeWaitMillis;
+    return this;
+  }
+
+  /** We only allow sorting on these types */
+  private static final EnumSet<SortField.Type> ALLOWED_INDEX_SORT_TYPES = EnumSet.of(SortField.Type.STRING,

Review comment:
       Hmm how did this sneak in?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org