You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "zhouxiz9 (via GitHub)" <gi...@apache.org> on 2023/06/26 23:21:26 UTC

[GitHub] [pinot] zhouxiz9 opened a new pull request, #10979: Skip generating rollup task with all merged segments

zhouxiz9 opened a new pull request, #10979:
URL: https://github.com/apache/pinot/pull/10979

   This PR adds a filtering step during the minion task generation process to skip generating rollup task with all merged segments.
   
   * For non rollup task, the task generation process stays the same.
   * For rollup task, if some of the input segments are not merged, the process stays the same.
   * For rollup task, if all input segments are all merged but the target merge level is higher, the process stays the same. E.g. input segments are merged at 5mins but the target merge level is 1 day then we still generate the rollup tasks.
   * For rollup task, if all input segments are all merged at the target merge level, skip generating the task.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10979: Skip generating rollup task with all merged segments

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10979:
URL: https://github.com/apache/pinot/pull/10979#issuecomment-1610240202

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/10979?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#10979](https://app.codecov.io/gh/apache/pinot/pull/10979?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (fa32382) into [master](https://app.codecov.io/gh/apache/pinot/commit/8170d435c5e8999daae9bb393e9d172aea2bdbd7?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (8170d43) will **decrease** coverage by `0.01%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #10979      +/-   ##
   ==========================================
   - Coverage    0.11%    0.11%   -0.01%     
   ==========================================
     Files        2189     2189              
     Lines      118056   118064       +8     
     Branches    17872    17874       +2     
   ==========================================
     Hits          137      137              
   - Misses     117899   117907       +8     
     Partials       20       20              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `0.00% <0.00%> (?)` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   | unittests2temurin17 | `0.11% <0.00%> (?)` | |
   | unittests2temurin20 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/10979?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...on/tasks/mergerollup/MergeRollupTaskGenerator.java](https://app.codecov.io/gh/apache/pinot/pull/10979?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtcGx1Z2lucy9waW5vdC1taW5pb24tdGFza3MvcGlub3QtbWluaW9uLWJ1aWx0aW4tdGFza3Mvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3BsdWdpbi9taW5pb24vdGFza3MvbWVyZ2Vyb2xsdXAvTWVyZ2VSb2xsdXBUYXNrR2VuZXJhdG9yLmphdmE=) | `0.00% <0.00%> (ø)` | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhouxiz9 commented on a diff in pull request #10979: Skip generating rollup task with all merged segments when "skipAllMerged" mode is enabled

Posted by "zhouxiz9 (via GitHub)" <gi...@apache.org>.
zhouxiz9 commented on code in PR #10979:
URL: https://github.com/apache/pinot/pull/10979#discussion_r1277178898


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -663,18 +672,26 @@ private List<PinotTaskConfig> createPinotTaskConfigs(List<SegmentZKMetadata> sel
       int numRecordsPerTask = 0;
       List<List<String>> segmentNamesList = new ArrayList<>();
       List<List<String>> downloadURLsList = new ArrayList<>();
+      List<SegmentZKMetadata> segmentZKMetadataList = new ArrayList<>();
       List<String> segmentNames = new ArrayList<>();
       List<String> downloadURLs = new ArrayList<>();
 
       for (int i = 0; i < segments.size(); i++) {
         SegmentZKMetadata targetSegment = segments.get(i);
+        segmentZKMetadataList.add(targetSegment);
         segmentNames.add(targetSegment.getSegmentName());
         downloadURLs.add(targetSegment.getDownloadUrl());
         numRecordsPerTask += targetSegment.getTotalDocs();
         if (numRecordsPerTask >= maxNumRecordsPerTask || i == segments.size() - 1) {
-          segmentNamesList.add(segmentNames);
-          downloadURLsList.add(downloadURLs);
+          // If "skipAllMerged" mode is enabled, skip generating task with all merged segments
+          boolean skipAllMerged = MergeTask.SKIP_ALL_MERGED_MODE.equalsIgnoreCase(taskConfigs.get(MergeTask.MODE));

Review Comment:
   That makes sense. Since `processAll` and `skipAllMerged` are not mutually exclusive and have some functionality overlap, I feel that it is more appropriate to not make `skipAllMerged` as a mode but as a new configuration. I've updated the implementation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhouxiz9 commented on a diff in pull request #10979: Skip generating rollup task with all merged segments

Posted by "zhouxiz9 (via GitHub)" <gi...@apache.org>.
zhouxiz9 commented on code in PR #10979:
URL: https://github.com/apache/pinot/pull/10979#discussion_r1261828779


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -656,18 +666,25 @@ private List<PinotTaskConfig> createPinotTaskConfigs(List<SegmentZKMetadata> sel
     int numRecordsPerTask = 0;
     List<List<String>> segmentNamesList = new ArrayList<>();
     List<List<String>> downloadURLsList = new ArrayList<>();
+    List<SegmentZKMetadata> segments = new ArrayList<>();
     List<String> segmentNames = new ArrayList<>();
     List<String> downloadURLs = new ArrayList<>();
 
     for (int i = 0; i < selectedSegments.size(); i++) {
       SegmentZKMetadata targetSegment = selectedSegments.get(i);
+      segments.add(targetSegment);
       segmentNames.add(targetSegment.getSegmentName());
       downloadURLs.add(targetSegment.getDownloadUrl());
       numRecordsPerTask += targetSegment.getTotalDocs();
       if (numRecordsPerTask >= maxNumRecordsPerTask || i == selectedSegments.size() - 1) {
-        segmentNamesList.add(segmentNames);
-        downloadURLsList.add(downloadURLs);
+        // Skip generating rollup task with all merged segments
+        if (MergeTaskUtils.getMergeType(mergeConfigs) != MergeType.ROLLUP

Review Comment:
   I think it can be applied to concat an dedup as well. Let me update.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhouxiz9 commented on pull request #10979: Skip generating rollup task with all merged segments when "skipAllMerged" mode is enabled

Posted by "zhouxiz9 (via GitHub)" <gi...@apache.org>.
zhouxiz9 commented on PR #10979:
URL: https://github.com/apache/pinot/pull/10979#issuecomment-1661155624

   > @jtao15 @zhouxiz9 I synced up with @jtao15 yesterday. It looks that the issue that we need to address 2 things:
   > 
   > 1. detect the failure issue earlier than 24 hours.
   > 2. optimize the runtime by only running the failed portion.
   > 
   > It looks that this PR potentially improve 2; however, this may not address the issue that @zhouxiz9 and @jtao15 is currently facing. We will revisit the PR once the root cause of the ongoing is identified.
   
   Hi @snleee, I synced with @jtao15 today and understand that a more comprehensive design is needed to cover the cases such as late events and spill-over. The current fix is only solving part of the problem. I'll close this PR and create a new one once that design is ready.
   
   Since we are facing the minion delay issue (due to repetitively merging already merged segments) in production, a short term fix is needed. I've created another [PR](https://github.com/apache/pinot/pull/11243) to make `MaxAttemptsPerTask` configurable so that we can try to increase this value and to better handle the transient errors. Please let me know if that works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jtao15 commented on a diff in pull request #10979: Skip generating rollup task with all merged segments when "skipAllMerged" mode is enabled

Posted by "jtao15 (via GitHub)" <gi...@apache.org>.
jtao15 commented on code in PR #10979:
URL: https://github.com/apache/pinot/pull/10979#discussion_r1276536454


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -663,18 +672,26 @@ private List<PinotTaskConfig> createPinotTaskConfigs(List<SegmentZKMetadata> sel
       int numRecordsPerTask = 0;
       List<List<String>> segmentNamesList = new ArrayList<>();
       List<List<String>> downloadURLsList = new ArrayList<>();
+      List<SegmentZKMetadata> segmentZKMetadataList = new ArrayList<>();
       List<String> segmentNames = new ArrayList<>();
       List<String> downloadURLs = new ArrayList<>();
 
       for (int i = 0; i < segments.size(); i++) {
         SegmentZKMetadata targetSegment = segments.get(i);
+        segmentZKMetadataList.add(targetSegment);
         segmentNames.add(targetSegment.getSegmentName());
         downloadURLs.add(targetSegment.getDownloadUrl());
         numRecordsPerTask += targetSegment.getTotalDocs();
         if (numRecordsPerTask >= maxNumRecordsPerTask || i == segments.size() - 1) {
-          segmentNamesList.add(segmentNames);
-          downloadURLsList.add(downloadURLs);
+          // If "skipAllMerged" mode is enabled, skip generating task with all merged segments
+          boolean skipAllMerged = MergeTask.SKIP_ALL_MERGED_MODE.equalsIgnoreCase(taskConfigs.get(MergeTask.MODE));

Review Comment:
   Considering that `processAll` and `skipAllMerged` are not mutually exclusive, a table can have multiple modes. One approach is to treat the task modes as a comma-separated list and then verify if `skipAllMerged` is enabled within that list.(nit) put this line outside of the for loop



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -663,18 +672,26 @@ private List<PinotTaskConfig> createPinotTaskConfigs(List<SegmentZKMetadata> sel
       int numRecordsPerTask = 0;
       List<List<String>> segmentNamesList = new ArrayList<>();
       List<List<String>> downloadURLsList = new ArrayList<>();
+      List<SegmentZKMetadata> segmentZKMetadataList = new ArrayList<>();
       List<String> segmentNames = new ArrayList<>();
       List<String> downloadURLs = new ArrayList<>();
 
       for (int i = 0; i < segments.size(); i++) {
         SegmentZKMetadata targetSegment = segments.get(i);
+        segmentZKMetadataList.add(targetSegment);
         segmentNames.add(targetSegment.getSegmentName());
         downloadURLs.add(targetSegment.getDownloadUrl());
         numRecordsPerTask += targetSegment.getTotalDocs();
         if (numRecordsPerTask >= maxNumRecordsPerTask || i == segments.size() - 1) {
-          segmentNamesList.add(segmentNames);
-          downloadURLsList.add(downloadURLs);
+          // If "skipAllMerged" mode is enabled, skip generating task with all merged segments
+          boolean skipAllMerged = MergeTask.SKIP_ALL_MERGED_MODE.equalsIgnoreCase(taskConfigs.get(MergeTask.MODE));

Review Comment:
   Considering that `processAll` and `skipAllMerged` are not mutually exclusive, a table can have multiple modes. One approach is to treat the task modes as a comma-separated list and then verify if `skipAllMerged` is enabled within that list.
   
   (nit) put this line outside of the for loop



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhouxiz9 commented on pull request #10979: Skip generating rollup task with all merged segments

Posted by "zhouxiz9 (via GitHub)" <gi...@apache.org>.
zhouxiz9 commented on PR #10979:
URL: https://github.com/apache/pinot/pull/10979#issuecomment-1633346951

   @Jackie-Jiang @jtao15 Your concern make sense. I'm thinking we can make this an optional mode like the 'processAll' mode so that people can opt in to skip.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhouxiz9 commented on pull request #10979: Skip generating rollup task with all merged segments when "skipAllMerged" mode is enabled

Posted by "zhouxiz9 (via GitHub)" <gi...@apache.org>.
zhouxiz9 commented on PR #10979:
URL: https://github.com/apache/pinot/pull/10979#issuecomment-1646083234

   @jtao15 I've added a new mode "skipAllMerged" so we will only skip generating the task with all merged segments when this mode is enabled. Could you help take a look? Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on pull request #10979: Skip generating rollup task with all merged segments when "skipAllMerged" mode is enabled

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on PR #10979:
URL: https://github.com/apache/pinot/pull/10979#issuecomment-1660648865

   @jtao15 @zhouxiz9 I synced up with @jtao15 yesterday. It looks that the issue that we need to decouple 3 things:
   
   1. detect the failure issue earlier than 24 hours.
   2. optimize the runtime by only running the failed portion.
   
   It looks that this PR potentially improve 2; however, this may not address the issue that @zhouxiz9 and @jtao15 is currently facing. We will revisit the PR once the root cause of the ongoing is identified.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jtao15 commented on a diff in pull request #10979: Skip generating rollup task with all merged segments

Posted by "jtao15 (via GitHub)" <gi...@apache.org>.
jtao15 commented on code in PR #10979:
URL: https://github.com/apache/pinot/pull/10979#discussion_r1246891533


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -656,18 +666,25 @@ private List<PinotTaskConfig> createPinotTaskConfigs(List<SegmentZKMetadata> sel
     int numRecordsPerTask = 0;
     List<List<String>> segmentNamesList = new ArrayList<>();
     List<List<String>> downloadURLsList = new ArrayList<>();
+    List<SegmentZKMetadata> segments = new ArrayList<>();
     List<String> segmentNames = new ArrayList<>();
     List<String> downloadURLs = new ArrayList<>();
 
     for (int i = 0; i < selectedSegments.size(); i++) {
       SegmentZKMetadata targetSegment = selectedSegments.get(i);
+      segments.add(targetSegment);
       segmentNames.add(targetSegment.getSegmentName());
       downloadURLs.add(targetSegment.getDownloadUrl());
       numRecordsPerTask += targetSegment.getTotalDocs();
       if (numRecordsPerTask >= maxNumRecordsPerTask || i == selectedSegments.size() - 1) {
-        segmentNamesList.add(segmentNames);
-        downloadURLsList.add(downloadURLs);
+        // Skip generating rollup task with all merged segments
+        if (MergeTaskUtils.getMergeType(mergeConfigs) != MergeType.ROLLUP

Review Comment:
   Is there any reason for wanting to skip tasks specifically for Rollup tasks? Should this apply to concat case as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhouxiz9 commented on pull request #10979: Skip generating rollup task with all merged segments

Posted by "zhouxiz9 (via GitHub)" <gi...@apache.org>.
zhouxiz9 commented on PR #10979:
URL: https://github.com/apache/pinot/pull/10979#issuecomment-1608456603

   Hi @jtao15 could you help review this change? Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jtao15 commented on pull request #10979: Skip generating rollup task with all merged segments

Posted by "jtao15 (via GitHub)" <gi...@apache.org>.
jtao15 commented on PR #10979:
URL: https://github.com/apache/pinot/pull/10979#issuecomment-1613525537

   @Jackie-Jiang 
   I believe the intention is to introduce a feature that allows skipping merged segments in cases of minion instability. When there is a failed task for a specific bucket and all other tasks succeed, the current code will reschedule all segments for that bucket, including the merged segments. The proposed change aims to use less minion resources by skipping the rescheduling of merged segments in such scenarios, particularly for large tables.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jtao15 commented on pull request #10979: Skip generating rollup task with all merged segments

Posted by "jtao15 (via GitHub)" <gi...@apache.org>.
jtao15 commented on PR #10979:
URL: https://github.com/apache/pinot/pull/10979#issuecomment-1613539959

   In certain cases, such as when input segments consist entirely of spilled-over data and have small sizes, there is a possibility of having fractions due to the proposed change. Do we want to have this skipping feature as an option?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org