You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/10/11 07:00:43 UTC

[GitHub] [pinot] Airliquide76 opened a new pull request, #9568: REALMTIME purge

Airliquide76 opened a new pull request, #9568:
URL: https://github.com/apache/pinot/pull/9568

   feature : ADD realtime time table to task generator in order to purge them properly.
   Only work on "DONE" segements
   
   A bit hard to run on integration tests as segment does not have download URL on clusterIngrationTest.
   
   Open to discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Airliquide76 commented on a diff in pull request #9568: REALTIME Table add purge feature

Posted by GitBox <gi...@apache.org>.
Airliquide76 commented on code in PR #9568:
URL: https://github.com/apache/pinot/pull/9568#discussion_r1071082744


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/purge/PurgeTaskGenerator.java:
##########
@@ -150,4 +162,46 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
     }
     return pinotTaskConfigs;
   }
-}
+
+  /**
+   * Fetch completed (DONE/UPLOADED) segment and partition information
+   *
+   * @param realtimeTableName the realtime table name
+   * @param completedSegmentsZKMetadata list for collecting the completed (DONE/UPLOADED) segments ZK metadata
+   * @param partitionToLatestLLCSegmentName map for collecting the partitionId to the latest LLC segment name
+   * @param allPartitions set for collecting all partition ids
+   */
+  private void getCompletedSegmentsInfo(String realtimeTableName, List<SegmentZKMetadata> completedSegmentsZKMetadata,
+      Map<Integer, String> partitionToLatestLLCSegmentName, Set<Integer> allPartitions) {
+    List<SegmentZKMetadata> segmentsZKMetadata = _clusterInfoAccessor.getSegmentsZKMetadata(realtimeTableName);
+
+    Map<Integer, LLCSegmentName> latestLLCSegmentNameMap = new HashMap<>();
+    for (SegmentZKMetadata segmentZKMetadata : segmentsZKMetadata) {
+      CommonConstants.Segment.Realtime.Status status = segmentZKMetadata.getStatus();
+      if (status.isCompleted()) {
+        completedSegmentsZKMetadata.add(segmentZKMetadata);
+      }
+
+      // Skip UPLOADED segments that don't conform to the LLC segment name

Review Comment:
   Yes I've removed this never used part ! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Airliquide76 commented on a diff in pull request #9568: REALTIME Table add purge feature

Posted by GitBox <gi...@apache.org>.
Airliquide76 commented on code in PR #9568:
URL: https://github.com/apache/pinot/pull/9568#discussion_r1071082395


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/purge/PurgeTaskGenerator.java:
##########
@@ -92,26 +99,31 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
       } else {
         tableMaxNumTasks = Integer.MAX_VALUE;
       }
-      List<SegmentZKMetadata> offlineSegmentsZKMetadata = _clusterInfoAccessor.getSegmentsZKMetadata(tableName);
+      List<SegmentZKMetadata> segmentsZKMetadata;

Review Comment:
   It was right. I've removed wat was on function to get segment right on on the if section



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9568: REALTIME Table add purge feature

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #9568:
URL: https://github.com/apache/pinot/pull/9568#discussion_r995173626


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/purge/PurgeTaskGenerator.java:
##########
@@ -92,26 +99,31 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
       } else {
         tableMaxNumTasks = Integer.MAX_VALUE;
       }
-      List<SegmentZKMetadata> offlineSegmentsZKMetadata = _clusterInfoAccessor.getSegmentsZKMetadata(tableName);
+      List<SegmentZKMetadata> segmentsZKMetadata;

Review Comment:
   We can read all the segments ZK metadata here, and for real-time table, remove the segments that is not completed



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/purge/PurgeTaskGenerator.java:
##########
@@ -150,4 +162,46 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
     }
     return pinotTaskConfigs;
   }
-}
+
+  /**
+   * Fetch completed (DONE/UPLOADED) segment and partition information
+   *
+   * @param realtimeTableName the realtime table name
+   * @param completedSegmentsZKMetadata list for collecting the completed (DONE/UPLOADED) segments ZK metadata
+   * @param partitionToLatestLLCSegmentName map for collecting the partitionId to the latest LLC segment name
+   * @param allPartitions set for collecting all partition ids
+   */
+  private void getCompletedSegmentsInfo(String realtimeTableName, List<SegmentZKMetadata> completedSegmentsZKMetadata,
+      Map<Integer, String> partitionToLatestLLCSegmentName, Set<Integer> allPartitions) {
+    List<SegmentZKMetadata> segmentsZKMetadata = _clusterInfoAccessor.getSegmentsZKMetadata(realtimeTableName);
+
+    Map<Integer, LLCSegmentName> latestLLCSegmentNameMap = new HashMap<>();
+    for (SegmentZKMetadata segmentZKMetadata : segmentsZKMetadata) {
+      CommonConstants.Segment.Realtime.Status status = segmentZKMetadata.getStatus();
+      if (status.isCompleted()) {
+        completedSegmentsZKMetadata.add(segmentZKMetadata);
+      }
+
+      // Skip UPLOADED segments that don't conform to the LLC segment name

Review Comment:
   I don't think we want to skip the uploaded segments even if the name doesn't conform to the LLC segment name



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #9568: REALTIME Table add purge feature

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged PR #9568:
URL: https://github.com/apache/pinot/pull/9568


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] GSharayu commented on a diff in pull request #9568: REALTIME Table add purge feature

Posted by GitBox <gi...@apache.org>.
GSharayu commented on code in PR #9568:
URL: https://github.com/apache/pinot/pull/9568#discussion_r1015775872


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/purge/PurgeTaskGenerator.java:
##########
@@ -150,4 +162,46 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
     }
     return pinotTaskConfigs;
   }
-}
+
+  /**
+   * Fetch completed (DONE/UPLOADED) segment and partition information
+   *
+   * @param realtimeTableName the realtime table name
+   * @param completedSegmentsZKMetadata list for collecting the completed (DONE/UPLOADED) segments ZK metadata
+   * @param partitionToLatestLLCSegmentName map for collecting the partitionId to the latest LLC segment name
+   * @param allPartitions set for collecting all partition ids
+   */
+  private void getCompletedSegmentsInfo(String realtimeTableName, List<SegmentZKMetadata> completedSegmentsZKMetadata,
+      Map<Integer, String> partitionToLatestLLCSegmentName, Set<Integer> allPartitions) {
+    List<SegmentZKMetadata> segmentsZKMetadata = _clusterInfoAccessor.getSegmentsZKMetadata(realtimeTableName);
+
+    Map<Integer, LLCSegmentName> latestLLCSegmentNameMap = new HashMap<>();
+    for (SegmentZKMetadata segmentZKMetadata : segmentsZKMetadata) {
+      CommonConstants.Segment.Realtime.Status status = segmentZKMetadata.getStatus();
+      if (status.isCompleted()) {
+        completedSegmentsZKMetadata.add(segmentZKMetadata);
+      }
+
+      // Skip UPLOADED segments that don't conform to the LLC segment name

Review Comment:
   +1 We want purge jobs to handle real time segments uploaded. cc @sajjad-moradi 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Airliquide76 commented on a diff in pull request #9568: REALTIME Table add purge feature

Posted by GitBox <gi...@apache.org>.
Airliquide76 commented on code in PR #9568:
URL: https://github.com/apache/pinot/pull/9568#discussion_r1071082744


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/purge/PurgeTaskGenerator.java:
##########
@@ -150,4 +162,46 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) {
     }
     return pinotTaskConfigs;
   }
-}
+
+  /**
+   * Fetch completed (DONE/UPLOADED) segment and partition information
+   *
+   * @param realtimeTableName the realtime table name
+   * @param completedSegmentsZKMetadata list for collecting the completed (DONE/UPLOADED) segments ZK metadata
+   * @param partitionToLatestLLCSegmentName map for collecting the partitionId to the latest LLC segment name
+   * @param allPartitions set for collecting all partition ids
+   */
+  private void getCompletedSegmentsInfo(String realtimeTableName, List<SegmentZKMetadata> completedSegmentsZKMetadata,
+      Map<Integer, String> partitionToLatestLLCSegmentName, Set<Integer> allPartitions) {
+    List<SegmentZKMetadata> segmentsZKMetadata = _clusterInfoAccessor.getSegmentsZKMetadata(realtimeTableName);
+
+    Map<Integer, LLCSegmentName> latestLLCSegmentNameMap = new HashMap<>();
+    for (SegmentZKMetadata segmentZKMetadata : segmentsZKMetadata) {
+      CommonConstants.Segment.Realtime.Status status = segmentZKMetadata.getStatus();
+      if (status.isCompleted()) {
+        completedSegmentsZKMetadata.add(segmentZKMetadata);
+      }
+
+      // Skip UPLOADED segments that don't conform to the LLC segment name

Review Comment:
   Yes remove this never used part ! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org