Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/31 16:28:26 UTC

[GitHub] [lucene] jpountz opened a new pull request, #936: LUCENE-10574: Avoid O(n^2) merging with LogMergePolicy

jpountz opened a new pull request, #936:
URL: https://github.com/apache/lucene/pull/936

   Originally I had tried to remove O(n^2) merging from LogMergePolicy using the
   same approach as for TieredMergePolicy, but this did not work well because it
   fought against invariants that LogMergePolicy tries to maintain. So this PR
   switches to a completely different approach that is more in line with the
   existing logic of LogMergePolicy: instead of being rounded up to the floor
   segment size, segments below the floor segment size are assigned a greater
   level span, which allows more unbalanced merges on smaller segments but still
   requires them to be somewhat balanced.
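
   To make the idea of a "level span" concrete, here is a minimal sketch. It
   is not the code from this PR: the class name, the method names and the two
   span constants are assumptions chosen for illustration. It only assumes
   that a segment's level is the logarithm of its size in base mergeFactor,
   and that segments whose levels fall within one span may be merged together.
   With a merge factor of 10, a span of 0.75 allows a size ratio of about 5.6x
   between merged segments, and a span of 1.5 allows about 31.6x.

```java
// Illustrative sketch only; names and constants are hypothetical, not from the PR.
final class LevelSpanSketch {
  // Assumed span for segments at or above the floor level.
  static final double NORMAL_SPAN = 0.75;
  // Assumed wider span for segments below the floor level.
  static final double WIDE_SPAN = 1.5;

  /** Level of a segment: the logarithm of its size in base mergeFactor. */
  static double level(double sizeMB, int mergeFactor) {
    return Math.log(sizeMB) / Math.log(mergeFactor);
  }

  /**
   * Lowest level that may still be merged with a segment at maxLevel.
   * Below the floor the span is wider, so merges may be more unbalanced,
   * but the span stays finite, so merges remain somewhat balanced.
   */
  static double levelBottom(double maxLevel, double floorLevel) {
    double span = maxLevel < floorLevel ? WIDE_SPAN : NORMAL_SPAN;
    return maxLevel - span;
  }

  /** Largest size ratio between two segments whose levels are one span apart. */
  static double maxSizeRatio(double span, int mergeFactor) {
    return Math.pow(mergeFactor, span);
  }
}
```

   The design point is that the span below the floor stays finite. If every
   segment below the floor were treated as having the same size, a large
   segment could be rewritten again and again together with tiny ones, which
   is how quadratic merging arises.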


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz merged pull request #936: LUCENE-10574: Avoid O(n^2) merging with LogMergePolicy

Posted by GitBox <gi...@apache.org>.
jpountz merged PR #936:
URL: https://github.com/apache/lucene/pull/936



[GitHub] [lucene] jpountz commented on a diff in pull request #936: LUCENE-10574: Avoid O(n^2) merging with LogMergePolicy

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #936:
URL: https://github.com/apache/lucene/pull/936#discussion_r885852033


##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseMergePolicyTestCase.java:
##########
@@ -328,21 +328,31 @@ public Set<String> getPendingDeletions() throws IOException {
   protected static SegmentInfos applyMerge(
       SegmentInfos infos, OneMerge merge, String mergedSegmentName, IOStats stats)
       throws IOException {
-    LinkedHashSet<SegmentCommitInfo> scis = new LinkedHashSet<>(infos.asList());
+
     int newMaxDoc = 0;
     double newSize = 0;
     for (SegmentCommitInfo sci : merge.segments) {
       int numLiveDocs = sci.info.maxDoc() - sci.getDelCount();
       newSize += (double) sci.sizeInBytes() * numLiveDocs / sci.info.maxDoc() / 1024 / 1024;
       newMaxDoc += numLiveDocs;
-      boolean removed = scis.remove(sci);
-      assertTrue(removed);
     }
+    SegmentCommitInfo mergedInfo =
+        makeSegmentCommitInfo(mergedSegmentName, newMaxDoc, 0, newSize, IndexWriter.SOURCE_MERGE);
+
+    Set<SegmentCommitInfo> mergedAway = new HashSet<>(merge.segments);
+    boolean mergedSegmentAdded = false;
     SegmentInfos newInfos = new SegmentInfos(Version.LATEST.major);
-    newInfos.addAll(scis);
-    // Now add the merged segment
-    newInfos.add(
-        makeSegmentCommitInfo(mergedSegmentName, newMaxDoc, 0, newSize, IndexWriter.SOURCE_MERGE));
+    for (int i = 0; i < infos.size(); ++i) {
+      SegmentCommitInfo info = infos.info(i);
+      if (mergedAway.contains(info)) {
+        if (mergedSegmentAdded == false) {
+          newInfos.add(mergedInfo);
+          mergedSegmentAdded = true;
+        }
+      } else {
+        newInfos.add(info);
+      }
+    }

Review Comment:
   This change maintains the order of merged segments in the segment infos, akin to SegmentInfos#applyMergeChanges, which is important for LogMergePolicy.
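
   The same order-preserving replacement can be sketched on a plain list of
   segment names. This is a simplified illustration, not the test-framework
   code: the class OrderPreservingMerge and its applyMerge signature are
   hypothetical, and segments are modeled as strings rather than
   SegmentCommitInfo. The merged segment takes the position of the first
   merged-away segment, and all other segments keep their relative order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; segments are modeled as plain names.
final class OrderPreservingMerge {
  /**
   * Returns a new list in which every segment in {@code merged} is removed and
   * {@code mergedName} is inserted once, at the position of the first removed
   * segment. The input list is not modified.
   */
  static List<String> applyMerge(List<String> segments, List<String> merged, String mergedName) {
    Set<String> mergedAway = new HashSet<>(merged);
    List<String> result = new ArrayList<>();
    boolean mergedSegmentAdded = false;
    for (String segment : segments) {
      if (mergedAway.contains(segment)) {
        // Insert the merged segment only once, where the first input segment was.
        if (!mergedSegmentAdded) {
          result.add(mergedName);
          mergedSegmentAdded = true;
        }
      } else {
        result.add(segment);
      }
    }
    return result;
  }
}
```

   For example, merging "_1" and "_2" out of ["_0", "_1", "_2", "_3"] into
   "_4" gives ["_0", "_4", "_3"], whereas appending the merged segment at the
   end would give ["_0", "_3", "_4"].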


