You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/12/01 19:04:20 UTC

[GitHub] [pinot] zhtaoxiang opened a new pull request, #9890: enable MergeRollupTask on realtime tables

zhtaoxiang opened a new pull request, #9890:
URL: https://github.com/apache/pinot/pull/9890

   enable MergeRollupTask on realtime tables since now we can push segments to realtime tables.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jtao15 commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
jtao15 commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1040128407


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -428,20 +446,43 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
     return pinotTaskConfigs;
   }
 
+  @VisibleForTesting
+  static List<SegmentZKMetadata> filterSegmentsBasedOnStatus(TableType tableType, List<SegmentZKMetadata> allSegments) {
+    if (tableType == TableType.REALTIME) {
+      // For realtime table, filter out
+      // 1. in-progress segments
+      // 2. sealed segments with start time later than the earliest start time of all in progress segments

Review Comment:
   Oh, IC, got it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
snleee commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1041352399


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -430,22 +448,70 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
     return pinotTaskConfigs;
   }
 
+  @VisibleForTesting
+  static List<SegmentZKMetadata> filterSegmentsBasedOnStatus(TableType tableType, List<SegmentZKMetadata> allSegments) {
+    if (tableType == TableType.REALTIME) {
+      // For realtime table, don't process
+      // 1. in-progress segments (Segment.Realtime.Status.IN_PROGRESS)
+      // 2. sealed segments with start time later than the earliest start time of all in progress segments
+      // This prevents those in-progress segments from not being merged.
+      //
+      // Note that we make the following two assumptions here:
+      // 1. streaming data consumer lags are negligible
+      // 2. streaming data records are ingested mostly in chronological order (no records are ingested with delay larger
+      //    than bufferTimeMS)
+      //
+      // We don't handle the following cases intentionally because it will be either overkill or too complex
+      // 1. New partition added. If new partitions are not picked up timely, the MergeRollupTask will move watermarks
+      //    forward, and may not be able to merge some lately-created segments for those new partitions -- users should
+      //    configure pinot properly to discover new partitions timely, or they should restart pinot servers manually
+      //    for new partitions to be picked up
+      // 2. (1) no new in-progress segments are created for some partitions (2) new in-progress segments are created for
+      //    partitions, but there is no record consumed (i.e, empty in-progress segments). In those two cases,
+      //    if new records are consumed later, the MergeRollupTask may have already moved watermarks forward, and may
+      //    not be able to merge those lately-created segments -- we assume that users will have a way to backfill those
+      //    records correctly.
+      long earliestStartTimeMsOfInProgressSegments = Long.MAX_VALUE;
+      for (SegmentZKMetadata segmentZKMetadata : allSegments) {
+        if (!segmentZKMetadata.getStatus().isCompleted()
+            && segmentZKMetadata.getTotalDocs() > 0
+            && segmentZKMetadata.getStartTimeMs() < earliestStartTimeMsOfInProgressSegments) {
+          earliestStartTimeMsOfInProgressSegments = segmentZKMetadata.getStartTimeMs();
+        }
+      }
+      final long finalEarliestStartTimeMsOfInProgressSegments = earliestStartTimeMsOfInProgressSegments;
+      return allSegments.stream()
+              .filter(segmentZKMetadata -> segmentZKMetadata.getStatus().isCompleted()
+                  && segmentZKMetadata.getStartTimeMs() < finalEarliestStartTimeMsOfInProgressSegments)
+              .collect(Collectors.toList());
+    } else {
+      return allSegments;
+    }
+  }
+
   /**
    * Validate table config for merge/rollup task
    */
-  private boolean validate(TableConfig tableConfig, String taskType) {
-    String offlineTableName = tableConfig.getTableName();
-    if (tableConfig.getTableType() != TableType.OFFLINE) {
-      LOGGER.warn("Skip generating task: {} for non-OFFLINE table: {}, REALTIME table is not supported yet", taskType,
-          offlineTableName);
-      return false;
-    }
-
+  @VisibleForTesting
+  static boolean validate(TableConfig tableConfig, String taskType) {
+    String tableName = tableConfig.getTableName();

Review Comment:
   `tableNameWithType`? lol :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhtaoxiang commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
zhtaoxiang commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1043866640


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MergeRollupMinionClusterIntegrationTest.java:
##########
@@ -65,13 +65,14 @@
 
 
 /**
- * Integration test for minion task of type "MergeRollupTask"
+ * Integration test for minion task of type "MergeRollupTask" configured on offline tables

Review Comment:
   good catch!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhtaoxiang commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
zhtaoxiang commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1040662594


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the min(now ms, max end time ms of all ready to
+ *    process segments) - bufferTimeMs have already been ingested. If not, newly ingested records older than that time
+ *    may not be properly merged (Due to the latest watermarks advanced too far before records are ingested).
+ *  - If it is needed, there are backfill protocols to ingest and replace records older than the latest watermarks.
+ *    Those protocols can handle time alignment (according to merge levels configurations) correctly.
+ *  - If it is needed, there are reconcile protocols to merge & rollup newly ingested segments that are (1) older than
+ *    the latest watermarks, and (2) not time aligned according to merge levels configurations
+ *    - For realtime tables, those protocols are needed if streaming records arrive late (older thant the latest
+ *      watermarks)
+ *    - For offline tables, those protocols are needed if there are non-time-aligned segments ingested accidentally.
  *
- * Steps:
  *
+ * Steps:
  *  - Pre-select segments:
  *    - Fetch all segments, select segments based on segment lineage (removing segmentsFrom for COMPLETED lineage
  *      entry and segmentsTo for IN_PROGRESS lineage entry)
+ *    - For realtime tables, remove
+ *      - in-progress segments, and

Review Comment:
   Nope. The IN_PROGRESS that is defined in Segment.Realtime.Status



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
snleee commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1041195396


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the min(now ms, max end time ms of all ready to
+ *    process segments) - bufferTimeMs have already been ingested. If not, newly ingested records older than that time
+ *    may not be properly merged (Due to the latest watermarks advanced too far before records are ingested).
+ *  - If it is needed, there are backfill protocols to ingest and replace records older than the latest watermarks.
+ *    Those protocols can handle time alignment (according to merge levels configurations) correctly.
+ *  - If it is needed, there are reconcile protocols to merge & rollup newly ingested segments that are (1) older than
+ *    the latest watermarks, and (2) not time aligned according to merge levels configurations
+ *    - For realtime tables, those protocols are needed if streaming records arrive late (older thant the latest
+ *      watermarks)
+ *    - For offline tables, those protocols are needed if there are non-time-aligned segments ingested accidentally.
  *
- * Steps:
  *
+ * Steps:
  *  - Pre-select segments:
  *    - Fetch all segments, select segments based on segment lineage (removing segmentsFrom for COMPLETED lineage
  *      entry and segmentsTo for IN_PROGRESS lineage entry)
+ *    - For realtime tables, remove
+ *      - in-progress segments, and

Review Comment:
   I saw that you already did! thank you!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
snleee commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1041194778


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the min(now ms, max end time ms of all ready to
+ *    process segments) - bufferTimeMs have already been ingested. If not, newly ingested records older than that time
+ *    may not be properly merged (Due to the latest watermarks advanced too far before records are ingested).
+ *  - If it is needed, there are backfill protocols to ingest and replace records older than the latest watermarks.
+ *    Those protocols can handle time alignment (according to merge levels configurations) correctly.
+ *  - If it is needed, there are reconcile protocols to merge & rollup newly ingested segments that are (1) older than
+ *    the latest watermarks, and (2) not time aligned according to merge levels configurations
+ *    - For realtime tables, those protocols are needed if streaming records arrive late (older thant the latest
+ *      watermarks)
+ *    - For offline tables, those protocols are needed if there are non-time-aligned segments ingested accidentally.
  *
- * Steps:
  *
+ * Steps:
  *  - Pre-select segments:
  *    - Fetch all segments, select segments based on segment lineage (removing segmentsFrom for COMPLETED lineage
  *      entry and segmentsTo for IN_PROGRESS lineage entry)
+ *    - For realtime tables, remove
+ *      - in-progress segments, and

Review Comment:
   Can you add a bit more explanation to clarify this? Since the line above mentions about lineage state `IN_PROGRESS`. It would be a bit confusing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
snleee commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1040639632


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -525,7 +579,7 @@ private long getWatermarkMs(long minStartTimeMs, long bucketMs, String mergeLeve
    * Create pinot task configs with selected segments and configs
    */
   private List<PinotTaskConfig> createPinotTaskConfigs(List<SegmentZKMetadata> selectedSegments,
-      String offlineTableName, int maxNumRecordsPerTask, String mergeLevel, List<Integer> partition,
+      String tableName, int maxNumRecordsPerTask, String mergeLevel, List<Integer> partition,

Review Comment:
   `tableNameWithType`



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -131,22 +146,25 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
       if (!validate(tableConfig, taskType)) {
         continue;
       }
-      String offlineTableName = tableConfig.getTableName();
-      LOGGER.info("Start generating task configs for table: {} for task: {}", offlineTableName, taskType);
+      String tableName = tableConfig.getTableName();

Review Comment:
   `tableName -> tableNameWithType`



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the min(now ms, max end time ms of all ready to
+ *    process segments) - bufferTimeMs have already been ingested. If not, newly ingested records older than that time
+ *    may not be properly merged (Due to the latest watermarks advanced too far before records are ingested).
+ *  - If it is needed, there are backfill protocols to ingest and replace records older than the latest watermarks.
+ *    Those protocols can handle time alignment (according to merge levels configurations) correctly.
+ *  - If it is needed, there are reconcile protocols to merge & rollup newly ingested segments that are (1) older than
+ *    the latest watermarks, and (2) not time aligned according to merge levels configurations
+ *    - For realtime tables, those protocols are needed if streaming records arrive late (older thant the latest
+ *      watermarks)
+ *    - For offline tables, those protocols are needed if there are non-time-aligned segments ingested accidentally.
  *
- * Steps:
  *
+ * Steps:
  *  - Pre-select segments:
  *    - Fetch all segments, select segments based on segment lineage (removing segmentsFrom for COMPLETED lineage
  *      entry and segmentsTo for IN_PROGRESS lineage entry)
+ *    - For realtime tables, remove
+ *      - in-progress segments, and

Review Comment:
   Are you referring `IN_PROGRESS` from the segment lineage? If so, are we removing the segment from `segmentFrom` or `segmentTo`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] GSharayu commented on pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
GSharayu commented on PR #9890:
URL: https://github.com/apache/pinot/pull/9890#issuecomment-1337899447

   This looks good to me, thanks for the changes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhtaoxiang commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
zhtaoxiang commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1040029118


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the bufferTimeMs have already been ingested.

Review Comment:
   Good point! 
   
   If I  understand correctly, we are using `maxEndTimeMs - bufferTimeMs`. See the method `getValidBucketEndTimeMsForSegment` for details.
   
   I will update the comment.



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -428,20 +446,43 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
     return pinotTaskConfigs;
   }
 
+  @VisibleForTesting
+  static List<SegmentZKMetadata> filterSegmentsBasedOnStatus(TableType tableType, List<SegmentZKMetadata> allSegments) {
+    if (tableType == TableType.REALTIME) {
+      // For realtime table, filter out
+      // 1. in-progress segments
+      // 2. sealed segments with start time later than the earliest start time of all in progress segments

Review Comment:
   maybe `filter out` is confusing here? Let me change it to `don't process`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhtaoxiang commented on pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
zhtaoxiang commented on PR #9890:
URL: https://github.com/apache/pinot/pull/9890#issuecomment-1338096479

   @jtao15 @GSharayu I also added logic to disallow MergeRollup on table with upsert or dedup enabled, please take a second look, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhtaoxiang commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
zhtaoxiang commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1040662594


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the min(now ms, max end time ms of all ready to
+ *    process segments) - bufferTimeMs have already been ingested. If not, newly ingested records older than that time
+ *    may not be properly merged (Due to the latest watermarks advanced too far before records are ingested).
+ *  - If it is needed, there are backfill protocols to ingest and replace records older than the latest watermarks.
+ *    Those protocols can handle time alignment (according to merge levels configurations) correctly.
+ *  - If it is needed, there are reconcile protocols to merge & rollup newly ingested segments that are (1) older than
+ *    the latest watermarks, and (2) not time aligned according to merge levels configurations
+ *    - For realtime tables, those protocols are needed if streaming records arrive late (older thant the latest
+ *      watermarks)
+ *    - For offline tables, those protocols are needed if there are non-time-aligned segments ingested accidentally.
  *
- * Steps:
  *
+ * Steps:
  *  - Pre-select segments:
  *    - Fetch all segments, select segments based on segment lineage (removing segmentsFrom for COMPLETED lineage
  *      entry and segmentsTo for IN_PROGRESS lineage entry)
+ *    - For realtime tables, remove
+ *      - in-progress segments, and

Review Comment:
   Nope. The segments that are defined in Segment.Realtime.Status



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] pjpringle commented on pull request #9890: enable MergeRollupTask on realtime tables

Posted by "pjpringle (via GitHub)" <gi...@apache.org>.
pjpringle commented on PR #9890:
URL: https://github.com/apache/pinot/pull/9890#issuecomment-1501010991

   is there a missing doc update for realtime support here https://docs.pinot.apache.org/operators/operating-pinot/minion-merge-rollup-task
   
   Also doc says multilevel supported but no example config.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #9890:
URL: https://github.com/apache/pinot/pull/9890#issuecomment-1334328185

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/9890?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#9890](https://codecov.io/gh/apache/pinot/pull/9890?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4b60f4f) into [master](https://codecov.io/gh/apache/pinot/commit/041865a80f7a2359270571a2343049bb1f294fe5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (041865a) will **decrease** coverage by `0.72%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #9890      +/-   ##
   ============================================
   - Coverage     68.65%   67.92%   -0.73%     
   - Complexity     5049     5322     +273     
   ============================================
     Files          1973     1473     -500     
     Lines        106008    77322   -28686     
     Branches      16060    12320    -3740     
   ============================================
   - Hits          72775    52520   -20255     
   + Misses        28110    21097    -7013     
   + Partials       5123     3705    -1418     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | unittests1 | `67.92% <ø> (-0.04%)` | :arrow_down: |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/9890?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/pinot/common/config/NettyConfig.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vY29uZmlnL05ldHR5Q29uZmlnLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/pinot/common/metrics/MinionMeter.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9NaW5pb25NZXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...g/apache/pinot/common/metrics/ControllerMeter.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Db250cm9sbGVyTWV0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/pinot/common/metrics/BrokerQueryPhase.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Ccm9rZXJRdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/pinot/common/metrics/MinionQueryPhase.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9NaW5pb25RdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ache/pinot/server/access/AccessControlFactory.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VydmVyL2FjY2Vzcy9BY2Nlc3NDb250cm9sRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...he/pinot/common/messages/TableDeletionMessage.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWVzc2FnZXMvVGFibGVEZWxldGlvbk1lc3NhZ2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...not/common/exception/HttpErrorStatusException.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZXhjZXB0aW9uL0h0dHBFcnJvclN0YXR1c0V4Y2VwdGlvbi5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...t/core/startree/plan/StarTreeDocIdSetPlanNode.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zdGFydHJlZS9wbGFuL1N0YXJUcmVlRG9jSWRTZXRQbGFuTm9kZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../core/startree/plan/StarTreeTransformPlanNode.java](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zdGFydHJlZS9wbGFuL1N0YXJUcmVlVHJhbnNmb3JtUGxhbk5vZGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [717 more](https://codecov.io/gh/apache/pinot/pull/9890/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee merged pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
snleee merged PR #9890:
URL: https://github.com/apache/pinot/pull/9890


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jtao15 commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
jtao15 commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1041402607


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the bufferTimeMs have already been ingested.

Review Comment:
   Good point, got it now after rethinking about this. Yes, the assumption lgtm now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhtaoxiang commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
zhtaoxiang commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1041479919


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MergeRollupMinionOfflineClusterIntegrationTest.java:
##########
@@ -65,9 +65,9 @@
 
 
 /**
- * Integration test for minion task of type "MergeRollupTask"
+ * Integration test for minion task of type "MergeRollupTask" configured on offline tables
  */
-public class MergeRollupMinionClusterIntegrationTest extends BaseClusterIntegrationTest {
+public class MergeRollupMinionOfflineClusterIntegrationTest extends BaseClusterIntegrationTest {

Review Comment:
   good point! updated the test case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
snleee commented on PR #9890:
URL: https://github.com/apache/pinot/pull/9890#issuecomment-1343439128

   merging it since the latest change was the comment update.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jtao15 commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
jtao15 commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1040111143


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the bufferTimeMs have already been ingested.

Review Comment:
   `getValidBucketEndTimeMsForSegment` is only used for the delay metrics. When we validate the target bucket in `isValidBucketEndTime`, we do the following:
   ```
       // Check that bucketEndMs <= now - bufferMs
       if (bucketEndMs > System.currentTimeMillis() - bufferMs) {
         return false;
       }
   ```
   If we use maxEndTimeMs of segments instead of `now`, we won't have the issue that `latest watermarks advanced too far before records are ingested`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jtao15 commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
jtao15 commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1039936458


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the bufferTimeMs have already been ingested.

Review Comment:
   Currently, we make sure `bucketEndMs <= now - bufferTimeMs` to process the time bucket. What if we use maxEndTimeMs of valid segments instead of now? Will it help the bootstrap and pause/resume case?



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -428,20 +446,43 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
     return pinotTaskConfigs;
   }
 
+  @VisibleForTesting
+  static List<SegmentZKMetadata> filterSegmentsBasedOnStatus(TableType tableType, List<SegmentZKMetadata> allSegments) {
+    if (tableType == TableType.REALTIME) {
+      // For realtime table, filter out
+      // 1. in-progress segments
+      // 2. sealed segments with start time later than the earliest start time of all in progress segments

Review Comment:
   (nit) sealed segments with start time `earlier` than?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] zhtaoxiang commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
zhtaoxiang commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1040144315


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the bufferTimeMs have already been ingested.

Review Comment:
   1. I agree that the code I pointed out is only used by metrics
   2. the code snippet you posted here is used to prevent bucketEndMs from moving too far
   
   The real progress is managed by the `watermark`, it will not move too far: https://github.com/apache/pinot/blob/master/pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java#L342. This value will not excceed the max end time ms of all ready to process segments.
   
   What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
snleee commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1043864051


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -430,22 +448,70 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
     return pinotTaskConfigs;
   }
 
+  @VisibleForTesting
+  static List<SegmentZKMetadata> filterSegmentsBasedOnStatus(TableType tableType, List<SegmentZKMetadata> allSegments) {
+    if (tableType == TableType.REALTIME) {
+      // For realtime table, don't process
+      // 1. in-progress segments (Segment.Realtime.Status.IN_PROGRESS)
+      // 2. sealed segments with start time later than the earliest start time of all in progress segments
+      // This prevents those in-progress segments from not being merged.
+      //
+      // Note that we make the following two assumptions here:
+      // 1. streaming data consumer lags are negligible
+      // 2. streaming data records are ingested mostly in chronological order (no records are ingested with delay larger
+      //    than bufferTimeMS)
+      //
+      // We don't handle the following cases intentionally because it will be either overkill or too complex
+      // 1. New partition added. If new partitions are not picked up timely, the MergeRollupTask will move watermarks
+      //    forward, and may not be able to merge some lately-created segments for those new partitions -- users should
+      //    configure pinot properly to discover new partitions timely, or they should restart pinot servers manually
+      //    for new partitions to be picked up
+      // 2. (1) no new in-progress segments are created for some partitions (2) new in-progress segments are created for
+      //    partitions, but there is no record consumed (i.e, empty in-progress segments). In those two cases,
+      //    if new records are consumed later, the MergeRollupTask may have already moved watermarks forward, and may
+      //    not be able to merge those lately-created segments -- we assume that users will have a way to backfill those
+      //    records correctly.
+      long earliestStartTimeMsOfInProgressSegments = Long.MAX_VALUE;
+      for (SegmentZKMetadata segmentZKMetadata : allSegments) {
+        if (!segmentZKMetadata.getStatus().isCompleted()
+            && segmentZKMetadata.getTotalDocs() > 0
+            && segmentZKMetadata.getStartTimeMs() < earliestStartTimeMsOfInProgressSegments) {
+          earliestStartTimeMsOfInProgressSegments = segmentZKMetadata.getStartTimeMs();
+        }
+      }
+      final long finalEarliestStartTimeMsOfInProgressSegments = earliestStartTimeMsOfInProgressSegments;
+      return allSegments.stream()
+              .filter(segmentZKMetadata -> segmentZKMetadata.getStatus().isCompleted()
+                  && segmentZKMetadata.getStartTimeMs() < finalEarliestStartTimeMsOfInProgressSegments)
+              .collect(Collectors.toList());
+    } else {
+      return allSegments;
+    }
+  }
+
   /**
    * Validate table config for merge/rollup task
    */
-  private boolean validate(TableConfig tableConfig, String taskType) {
-    String offlineTableName = tableConfig.getTableName();
-    if (tableConfig.getTableType() != TableType.OFFLINE) {
-      LOGGER.warn("Skip generating task: {} for non-OFFLINE table: {}, REALTIME table is not supported yet", taskType,
-          offlineTableName);
-      return false;
-    }
-
+  @VisibleForTesting
+  static boolean validate(TableConfig tableConfig, String taskType) {
+    String tableNameWithType = tableConfig.getTableName();
     if (REFRESH.equalsIgnoreCase(IngestionConfigUtils.getBatchSegmentIngestionType(tableConfig))) {
       LOGGER.warn("Skip generating task: {} for non-APPEND table: {}, REFRESH table is not supported", taskType,
-          offlineTableName);
+          tableNameWithType);
       return false;
     }
+    if (tableConfig.getTableType() == TableType.REALTIME) {
+      if (tableConfig.isUpsertEnabled()) {

Review Comment:
   +1 Thanks for adding this



##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MergeRollupMinionClusterIntegrationTest.java:
##########
@@ -65,13 +65,14 @@
 
 
 /**
- * Integration test for minion task of type "MergeRollupTask"
+ * Integration test for minion task of type "MergeRollupTask" configured on offline tables

Review Comment:
   Update the comment to remove `offline tables` since we combined the test :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

Posted by GitBox <gi...@apache.org>.
snleee commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1041373092


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MergeRollupMinionOfflineClusterIntegrationTest.java:
##########
@@ -65,9 +65,9 @@
 
 
 /**
- * Integration test for minion task of type "MergeRollupTask"
+ * Integration test for minion task of type "MergeRollupTask" configured on offline tables
  */
-public class MergeRollupMinionClusterIntegrationTest extends BaseClusterIntegrationTest {
+public class MergeRollupMinionOfflineClusterIntegrationTest extends BaseClusterIntegrationTest {

Review Comment:
   By the way, is it possible to merge offline/realtime tests using 1 integration test since we are based on `BaseClusterIntegrationTest` for both tests? Integration test is expensive to set up the env.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org