You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "wirybeaver (via GitHub)" <gi...@apache.org> on 2023/06/05 18:05:48 UTC

[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: [Draft] Periodical Tmp Segment file deletion

wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1218414798


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/PinotFSSegmentUploader.java:
##########
@@ -74,7 +73,7 @@ public URI uploadSegment(File segmentFile, LLCSegmentName segmentName, int timeo
     final String rawTableName = TableNameBuilder.extractRawTableName(segmentName.getTableName());
     Callable<URI> uploadTask = () -> {
       URI destUri = new URI(StringUtil.join(File.separator, _segmentStoreUriStr, segmentName.getTableName(),
-          segmentName.getSegmentName() + UUID.randomUUID().toString()));
+          SegmentCompletionUtils.generateSegmentFileName(segmentName.getSegmentName())));

Review Comment:
   Server2ControllerSegmentUploader and PinotFSSegmentUploader has different naming pattern for tmp segments before.
   Server2ControllerSegmentUploader: segment_name.tmp.<UUID>
   PinotFSSegmentUploader: segment_name.<UUID>



##########
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java:
##########
@@ -130,6 +130,13 @@ private void runSegmentLevelValidation(TableConfig tableConfig, PartitionLevelSt
         && _llcRealtimeSegmentManager.isDeepStoreLLCSegmentUploadRetryEnabled()) {
       _llcRealtimeSegmentManager.uploadToDeepStoreIfMissing(tableConfig, segmentsZKMetadata);
     }
+
+    // Delete tmp segments
+    if (streamConfig.hasLowLevelConsumerType()
+        && _llcRealtimeSegmentManager.getIsSplitCommitEnabled()
+        && _llcRealtimeSegmentManager.isTmpSegmentAsyncDeletionEnabled()) {
+      _llcRealtimeSegmentManager.deleteTmpSegments(tableConfig, segmentsZKMetadata);

Review Comment:
   I am hesitating whether to have a separate class for async tmp segments deletion. Pros: the segment re-upload job and async tmp segment deletion can be scheduled at different pace. Cons: If the segment re-upload job failed in the middle, the new tmp segments cannot be cleared up in time



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -30,6 +31,8 @@ private SegmentCompletionUtils() {
   private static final Logger LOGGER = LoggerFactory.getLogger(SegmentCompletionUtils.class);
   // Used to create temporary segment file names
   private static final String TMP = ".tmp.";
+  private static final Pattern TMP_FILE = Pattern.compile(
+      "\\.tmp\\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$");

Review Comment:
   maybe `[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"` so that old temp segments created by server also be cleared up?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org