You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/12/06 08:28:47 UTC

[GitHub] [pinot] snleee commented on a diff in pull request #9890: enable MergeRollupTask on realtime tables

snleee commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1040639632


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -525,7 +579,7 @@ private long getWatermarkMs(long minStartTimeMs, long bucketMs, String mergeLeve
    * Create pinot task configs with selected segments and configs
    */
   private List<PinotTaskConfig> createPinotTaskConfigs(List<SegmentZKMetadata> selectedSegments,
-      String offlineTableName, int maxNumRecordsPerTask, String mergeLevel, List<Integer> partition,
+      String tableName, int maxNumRecordsPerTask, String mergeLevel, List<Integer> partition,

Review Comment:
   `tableNameWithType`



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -131,22 +146,25 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
       if (!validate(tableConfig, taskType)) {
         continue;
       }
-      String offlineTableName = tableConfig.getTableName();
-      LOGGER.info("Start generating task configs for table: {} for task: {}", offlineTableName, taskType);
+      String tableName = tableConfig.getTableName();

Review Comment:
   `tableName -> tableNameWithType`



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the min(now ms, max end time ms of all ready to
+ *    process segments) - bufferTimeMs have already been ingested. If not, newly ingested records older than that time
+ *    may not be properly merged (Due to the latest watermarks advanced too far before records are ingested).
+ *  - If it is needed, there are backfill protocols to ingest and replace records older than the latest watermarks.
+ *    Those protocols can handle time alignment (according to merge levels configurations) correctly.
+ *  - If it is needed, there are reconcile protocols to merge & rollup newly ingested segments that are (1) older than
+ *    the latest watermarks, and (2) not time aligned according to merge levels configurations
+ *    - For realtime tables, those protocols are needed if streaming records arrive late (older thant the latest
+ *      watermarks)
+ *    - For offline tables, those protocols are needed if there are non-time-aligned segments ingested accidentally.
  *
- * Steps:
  *
+ * Steps:
  *  - Pre-select segments:
  *    - Fetch all segments, select segments based on segment lineage (removing segmentsFrom for COMPLETED lineage
  *      entry and segmentsTo for IN_PROGRESS lineage entry)
+ *    - For realtime tables, remove
+ *      - in-progress segments, and

Review Comment:
   Are you referring `IN_PROGRESS` from the segment lineage? If so, are we removing the segment from `segmentFrom` or `segmentTo`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org