Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/15 01:42:05 UTC

[GitHub] [flink-table-store] SteNicholas opened a new pull request, #216: [FLINK-28482] num-sorted-run.stop-trigger introduced a unstable merging

SteNicholas opened a new pull request, #216:
URL: https://github.com/apache/flink-table-store/pull/216

   `num-sorted-run.stop-trigger` is introduced in `CoreOptions` to configure the number of sorted runs that triggers the stopping of writes; its default value is 10. With this default, up to 10 runs can be generated, and all 10 may be merged at the same time during compaction or read. Reading 10 ORC files at the same time may lead to an `OutOfMemoryError`.
   
   **The brief change log**
   - Introduces the `compaction.max-sorted-run-num` option in `CoreOptions` to define the maximum sorted run number that triggers the stopping of writes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] SteNicholas commented on pull request #216: [FLINK-28482] num-sorted-run.stop-trigger introduced a unstable merging

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #216:
URL: https://github.com/apache/flink-table-store/pull/216#issuecomment-1185388794

   @JingsongLi, thanks for your explanation of this ticket. I have addressed the above suggestion to modify the pick logic of compaction. PTAL.




[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #216: [FLINK-28482] num-sorted-run.stop-trigger introduced a unstable merging

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #216:
URL: https://github.com/apache/flink-table-store/pull/216#discussion_r925314994


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/mergetree/compact/UniversalCompaction.java:
##########
@@ -144,15 +147,30 @@ private long candidateSize(List<LevelSortedRun> runs, int candidateCount) {
         return size;
     }
 
+    static CompactUnit createUnit(
+            List<LevelSortedRun> runs, int maxLevel, int runCount, Integer maxSortedRunNum) {
+        boolean withinMaxRun = maxSortedRunNum == null || maxSortedRunNum >= runCount;

Review Comment:
   Just:
   ```
   if (runCount > maxSortedRunNum) {
      runCount = maxSortedRunNum;
   }
   ```
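
A minimal sketch of what this clamp does to the set of picked runs. The class `RunLimitSketch` and the method `limitRuns` are hypothetical names used for illustration only, they are not part of flink-table-store, and the sketch assumes `maxSortedRunNum` is a positive `int`.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: a hypothetical helper, not the real createUnit. */
final class RunLimitSketch {

    /** Returns the first min(runCount, maxSortedRunNum) runs as compaction candidates. */
    static <T> List<T> limitRuns(List<T> runs, int runCount, int maxSortedRunNum) {
        if (runCount > runs.size()) {
            throw new IllegalArgumentException("runCount exceeds the number of runs");
        }
        // The clamp suggested in the review comment above.
        if (runCount > maxSortedRunNum) {
            runCount = maxSortedRunNum;
        }
        // Copy the prefix so callers do not hold a view onto the original list.
        return new ArrayList<>(runs.subList(0, runCount));
    }
}
```

With four runs and a limit of two, only the first two runs would be handed to the compaction unit; the strategy still sees the full list when it decides how many runs to pick.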



##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/CoreOptions.java:
##########
@@ -244,6 +244,15 @@ public class CoreOptions implements Serializable {
                                     + "for append-only table, even if sum(size(f_i)) < targetFileSize. This value "
                                     + "avoids pending too much small files, which slows down the performance.");
 
+    public static final ConfigOption<Integer> COMPACTION_MAX_SORTED_RUN_NUM =
+            ConfigOptions.key("compaction.max-sorted-run-num")
+                    .intType()
+                    .noDefaultValue()

Review Comment:
   Can its default value be `Integer.MAX_VALUE`?
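
A rough sketch of why a default of `Integer.MAX_VALUE` could simplify the call sites: it behaves like "no limit" without a nullable `Integer`, so no null check (and no risk of a `NullPointerException` on unboxing) is needed. The class and method names below are hypothetical and only compare the two variants.

```java
/** Illustration only: hypothetical names, not part of flink-table-store. */
final class MaxRunDefaultSketch {

    /** Assumed "no limit" default, as asked about in the review comment. */
    static final int NO_LIMIT = Integer.MAX_VALUE;

    /** Nullable variant, mirroring the null check in the diff: null means "no limit". */
    static int effectiveRunCountNullable(int runCount, Integer maxSortedRunNum) {
        boolean withinMaxRun = maxSortedRunNum == null || maxSortedRunNum >= runCount;
        return withinMaxRun ? runCount : maxSortedRunNum;
    }

    /** Primitive variant: a plain min() is enough once the default is NO_LIMIT. */
    static int effectiveRunCount(int runCount, int maxSortedRunNum) {
        return Math.min(runCount, maxSortedRunNum);
    }
}
```

Both variants return the same run count for every input; the primitive one simply drops the special case.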





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #216: [FLINK-28482] num-sorted-run.stop-trigger introduced a unstable merging

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #216:
URL: https://github.com/apache/flink-table-store/pull/216#discussion_r922940630


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/mergetree/compact/MergeTreeCompactManager.java:
##########
@@ -69,7 +75,17 @@ public void submitCompaction() {
             throw new IllegalStateException(
                     "Please finish the previous compaction before submitting new one.");
         }
-        strategy.pick(levels.numberOfLevels(), levels.levelSortedRuns())
+        List<LevelSortedRun> sortedRuns = levels.levelSortedRuns();
+        if (maxSortedRunNum != null && maxSortedRunNum < sortedRuns.size()) {
+            pickSortedRuns(sortedRuns.subList(0, maxSortedRunNum));
+            pickSortedRuns(sortedRuns.subList(maxSortedRunNum, sortedRuns.size()));
+        } else {
+            pickSortedRuns(sortedRuns);
+        }
+    }
+
+    private void pickSortedRuns(List<LevelSortedRun> sortedRuns) {
+        strategy.pick(levels.numberOfLevels(), sortedRuns)
                 .ifPresent(
                         unit -> {
                             if (unit.files().size() < 2) {

Review Comment:
   I think it is better to limit sorted runs in `CompactStrategy`. We can pass `maxRuns` to `UniversalCompaction.this(...)`, and limit runs in `createUnit`.
   
   Calling `strategy.pick(levels.numberOfLevels(), partial runs)` may pick incorrect runs: the strategy lacks global information, so it cannot tell whether runs already exist in the deeper levels.
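
To make the concern concrete, here is a small sketch under a stated assumption: suppose the strategy derives the output level from the first run it does not pick, and merges down to the maximum level when nothing is left over. That rule is an assumption made for illustration, not something confirmed in this thread, and `PartialRunsSketch` and `outputLevel` are hypothetical names.

```java
import java.util.List;

/** Illustration only: a hypothetical output-level rule, not the real strategy. */
final class PartialRunsSketch {

    /**
     * Chooses the output level for the first runCount runs, given the level of
     * every run the strategy can see (ordered from shallow to deep).
     */
    static int outputLevel(List<Integer> runLevels, int maxLevel, int runCount) {
        if (runCount == runLevels.size()) {
            // Nothing appears to be left behind, so merge all the way down.
            return maxLevel;
        }
        // Otherwise stay just above the next run that is not picked.
        return Math.max(0, runLevels.get(runCount) - 1);
    }
}
```

With run levels `[0, 0, 3, 5]` and a maximum level of 5, picking two runs from the full list yields output level 2. Passing only the sublist `[0, 0]` yields output level 5, because the strategy cannot see the existing runs at levels 3 and 5.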





[GitHub] [flink-table-store] JingsongLi commented on pull request #216: [FLINK-28482] num-sorted-run.stop-trigger introduced a unstable merging

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on PR #216:
URL: https://github.com/apache/flink-table-store/pull/216#issuecomment-1185204146

   Thanks @SteNicholas for the contribution.
   
   The purpose of introducing this option is to control the number of runs returned by `strategy.pick` in the `MergeTreeCompactManager`, so that at most the maximum number of runs is selected for compaction.
   
   So we need to modify `MergeTreeCompactManager`.




[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #216: [FLINK-28482] num-sorted-run.stop-trigger introduced a unstable merging

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #216:
URL: https://github.com/apache/flink-table-store/pull/216#discussion_r923982350


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/CoreOptions.java:
##########
@@ -429,6 +438,10 @@ public int maxFileNum() {
         return options.get(COMPACTION_MAX_FILE_NUM);
     }
 
+    public Integer maxSortedRunNum() {
+        return options.get(COMPACTION_MAX_SORTED_RUN_NUM);
+    }
+
     public boolean enableChangelogFile() {

Review Comment:
   Maybe we should set the default value of `numLevels` to `numSortedRunStopTrigger`.





[GitHub] [flink-table-store] JingsongLi merged pull request #216: [FLINK-28482] num-sorted-run.stop-trigger introduced a unstable merging

Posted by GitBox <gi...@apache.org>.
JingsongLi merged PR #216:
URL: https://github.com/apache/flink-table-store/pull/216

