You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "wirybeaver (via GitHub)" <gi...@apache.org> on 2023/05/29 16:56:49 UTC

[GitHub] [pinot] wirybeaver opened a new pull request, #10815: Periodical Tmp Segment file deletion

wirybeaver opened a new pull request, #10815:
URL: https://github.com/apache/pinot/pull/10815

   [issue #10243](https://github.com/apache/pinot/issues/10243)
   - [ X ] Unify the tmp file naming format
   - [ ] Delete tmp files at a regular cadence by extending the ControllerPeriodicTask
   
   ### Unify the tmp file naming format
   In the 2nd phase (Segment Upload) of segment split commit, Server2ControllerSegmentUploader and PinotFSSegmentUploader diverges on the naming pattern of temporary segment file, which are <segmentName>.tmp.<UUID> and <segmentName><UUID> respectively. In the 3rd phase (CommitEnd), the controller moves the temporary file to permanent file and then synchronously delete other potential temporary files that were incidentally generated due to the flaky network. However, the deletion currently only filters tmp files matches Server2ControllerSegmentUploader's pattern. Thus, we need to unify the tmp file pattern for both uploaders. 
   
   ### Periodic File Deletion
   Add a clusters level knob to turn on async tmp file deletion. If it's turned on, the time consuming FS.listFiles() won't be invoked and thus increase the throughput of realtime consuming. Delete tmp files that are out of a configurable retention, e.g., 10mins.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ankitsultana commented on pull request #10815: [Draft] Periodical Tmp Segment file deletion

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on PR #10815:
URL: https://github.com/apache/pinot/pull/10815#issuecomment-1612406620

   > Do we need to clean up tmp files periodically, or just when the controller starts?
   
   To add to @wirybeaver 's response, we will need to add a periodic job for this clean-up. This PR is corresponding to this doc we had written a while back: https://docs.google.com/document/u/1/d/18PqeN7eUP9Yc2MqwihSrGMmadJ3nuDsrcWf6F_ks_hQ/edit
   
   In our internal clusters, we found that 50% of all deep-store data for realtime tables was just leaked data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1310675757


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +215,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";
+    public static final String SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND =
+        "controller.realtime.split.commit.segment.tmp.lifetime.second";
 
     public static final int MIN_INITIAL_DELAY_IN_SECONDS = 120;
     public static final int MAX_INITIAL_DELAY_IN_SECONDS = 300;
+    public static final int DEFAULT_SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND = 600;

Review Comment:
   make sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1323877297


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   ```  
   public static <T> PinotGauge<T> makeGauge(PinotMetricsRegistry registry, PinotMetricName name, PinotGauge<T> gauge) {
       return registry.newGauge(name, gauge);
     }
   
     public static PinotMeter makePinotMeter(PinotMetricsRegistry registry, PinotMetricName name, String eventType,
         TimeUnit unit) {
       return registry.newMeter(name, eventType, unit);
     }
   ```
   Meter metrics would output .m1, .m5, .m15 I am not whether we need those statics. Existing count related metrics of ValidationMetrics also use gauge



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1327847249


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   @Jackie-Jiang  Could you have another review?



##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   Appreciate for sharing the knowledge. I already replaced gauge with meter and add removeTableMeter method into AbstractMetrics.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: [Draft] Periodical Tmp Segment file deletion

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1218415965


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -30,6 +31,8 @@ private SegmentCompletionUtils() {
   private static final Logger LOGGER = LoggerFactory.getLogger(SegmentCompletionUtils.class);
   // Used to create temporary segment file names
   private static final String TMP = ".tmp.";
+  private static final Pattern TMP_FILE = Pattern.compile(
+      "\\.tmp\\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$");

Review Comment:
   maybe switched to `"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"` so that old temp segments created by server also be cleared up?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on PR #10815:
URL: https://github.com/apache/pinot/pull/10815#issuecomment-1644330272

   > always enable this periodic task and remove the sync cleanup logic
   I will set the default value to true.
   
   >  I feel this clean up logic can be fit into the existing RealtimeSegmentValidationManager
   In the initial version, I append the clean up task at the end of validation manager. I am good to rollback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299511894


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long orphanTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          Preconditions.checkState(pinotFS.delete(uri, true), "Failed to delete file: %s", uri);
+          orphanTmpSegments++;
+        }
+      }
+    } catch (Exception e) {
+      LOGGER.warn("Caught exception while deleting temporary files for table: {}", rawTableName, e);
+    }
+    return orphanTmpSegments;
+  }
+
+  private boolean isTmpAndCanDelete(URI uri, Set<String> deepURIs, PinotFS pinotFS) throws Exception {
+    long lastModified = pinotFS.lastModified(uri);
+    String uriString = uri.toString();
+    return SegmentCompletionUtils.isTmpFile(uriString) && !deepURIs.contains(uriString)

Review Comment:
   Can you fix the links internally by moving the files to the correct location and update the download urls? The deep store url for each segment should be fixed (`<dataDir>/<tableName>/<segmentName>`), and I don't know the side effect of having random url



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320445951


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,72 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType, List<SegmentZKMetadata> segmentsZKMetadata) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    Set<String> deepURIs = segmentsZKMetadata.stream().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(

Review Comment:
   I think you might be confused by the comment above. METADATA_URI_FOR_PEER_DOWNLOAD is used to select deepURIs rather than judging the SplitCommit enabled or not. SplitCommitEnabled is already judged before
   ```
       if (!isLowLevelConsumer(tableNameWithType, tableConfig)
           || !getIsSplitCommitEnabled()
           || !isTmpSegmentAsyncDeletionEnabled()) {
         return 0L;
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320443028


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,72 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType, List<SegmentZKMetadata> segmentsZKMetadata) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    Set<String> deepURIs = segmentsZKMetadata.stream().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(

Review Comment:
   peer download config is different from split commit config. Split commit enabled or not is a separate server config.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1262807636


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegCountGauge(final String resource, final long tmpSegmentCount) {
+    makeGauge(resource, ValidationMetricName.DELETED_TMP_SEGMENT_COUNT, _storedValueGaugeFactory, tmpSegmentCount);
+  }
+
+  public void cleanupTmpSegCountGauge(final String resource) {
+    removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);

Review Comment:
   yeah



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299510571


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long orphanTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          Preconditions.checkState(pinotFS.delete(uri, true), "Failed to delete file: %s", uri);
+          orphanTmpSegments++;
+        }
+      }
+    } catch (Exception e) {
+      LOGGER.warn("Caught exception while deleting temporary files for table: {}", rawTableName, e);
+    }
+    return orphanTmpSegments;
+  }
+
+  private boolean isTmpAndCanDelete(URI uri, Set<String> deepURIs, PinotFS pinotFS) throws Exception {
+    long lastModified = pinotFS.lastModified(uri);
+    String uriString = uri.toString();
+    return SegmentCompletionUtils.isTmpFile(uriString) && !deepURIs.contains(uriString)

Review Comment:
   Oh my... We need to fix that. Currently we are forced to read all realtime segment metadata



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320444625


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +215,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";
+    public static final String SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND =

Review Comment:
   NIT: SECOND --> SECONDS



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #10815:
URL: https://github.com/apache/pinot/pull/10815#issuecomment-1642964875

   What I meant is to always enable this periodic task and remove the sync cleanup logic.
   Adding a dedicated periodic task per specific job is more modular, but we also need to consider if it can cause problem when running in parallel with another periodic task (e.g. we had encountered some problems when 2 periodic task working on the same table at the same time). I feel this clean up logic can be fit into the existing `RealtimeSegmentValidationManager`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on pull request #10815: [Draft] Periodical Tmp Segment file deletion

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on PR #10815:
URL: https://github.com/apache/pinot/pull/10815#issuecomment-1612220804

   > Do we need to clean up tmp files periodically, or just when the controller starts? Will controller create extra tmp files during the regular life cycle without shut down?
   
   Hi Jackie, there are two kind of segment uploader during SegmentUpload phase for split commit.
   1. Server2ControllerSegmentUploader, controller upload the tmp file <seg_name>.tmp.<uuid> to deep store.
   2. PinotFSSegmentUploader, server upload the tmp file <seg_name><uuid> to deep store.
   
   In the CommitEnd phase, the deep storage can have outage due to flaky network when controller attempt to rename the tmp file. If peer downloading is enabled, controller will continue to commit the segment in ZK. The tmp file is left there in the regular lifecycle.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1300880784


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);

Review Comment:
   At the beginning of RealtimeSegmentValidationManager.runSegmentLevelValidation(), the zk meta is dumped already, so I will pass the zk into the deleteTmpSegment function instead of read again



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1323785121


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   ```  
   public enum ValidationMetricName {
       MISSING_SEGMENT_COUNT("missingSegmentCount"),
       OFFLINE_SEGMENT_DELAY_HOURS("offlineSegmentDelayHours"),
       LAST_PUSH_TIME_DELAY_HOURS("lastPushTimeDelayHours"),
       TOTAL_DOCUMENT_COUNT("TotalDocumentCount"),
       NON_CONSUMING_PARTITION_COUNT("NonConsumingPartitionCount"),
       SEGMENT_COUNT("SegmentCount"),
       DELETED_TMP_SEGMENT_COUNT("DeletedTmpSegmentCount");
   ```
   It seems that all count metrics are using Gauge



##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   Shall we use Count?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299527276


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long orphanTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          Preconditions.checkState(pinotFS.delete(uri, true), "Failed to delete file: %s", uri);
+          orphanTmpSegments++;
+        }
+      }
+    } catch (Exception e) {
+      LOGGER.warn("Caught exception while deleting temporary files for table: {}", rawTableName, e);
+    }
+    return orphanTmpSegments;
+  }
+
+  private boolean isTmpAndCanDelete(URI uri, Set<String> deepURIs, PinotFS pinotFS) throws Exception {
+    long lastModified = pinotFS.lastModified(uri);
+    String uriString = uri.toString();
+    return SegmentCompletionUtils.isTmpFile(uriString) && !deepURIs.contains(uriString)

Review Comment:
   Actually, the Uber's existing realtime tmp file name follows the patten <dataDir>/<tableName>/<segmentName><UUID> instead of <dataDir>/<tableName>/<segmentName>.tmp.<UUID>. It's safe to skip the down url checking and zk metadata read.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1295492501


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -921,10 +953,19 @@ public boolean isDeepStoreRetryUploadLLCSegmentEnabled() {
     return getProperty(ControllerPeriodicTasksConf.ENABLE_DEEP_STORE_RETRY_UPLOAD_LLC_SEGMENT, false);
   }
 
+  public boolean isTmpSegmentAsyncDeletionEnabled() {
+    return getProperty(ControllerPeriodicTasksConf.ENABLE_TMP_SEGMENT_ASYNC_DELETION, false);
+  }
+
   public int getDeepStoreRetryUploadTimeoutMs() {
     return getProperty(ControllerPeriodicTasksConf.DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS, -1);
   }
 
+  public int getTmpSegmentRetentionInSeconds() {
+    return getProperty(ControllerPeriodicTasksConf.SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND,

Review Comment:
   the default value is 600s. Don't want to make it too short or skip it. Because it's likely that the current consuming segment is being committed by the lead server and the lead server will generate an intermediate tmp segment, which is a valid case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1326763001


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   Appreciate for sharing the knowledge. I have replace gauge with meter and add removeTableMeter method into AbstractMetrics.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: [Draft] Periodical Tmp Segment file deletion

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1218414798


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/PinotFSSegmentUploader.java:
##########
@@ -74,7 +73,7 @@ public URI uploadSegment(File segmentFile, LLCSegmentName segmentName, int timeo
     final String rawTableName = TableNameBuilder.extractRawTableName(segmentName.getTableName());
     Callable<URI> uploadTask = () -> {
       URI destUri = new URI(StringUtil.join(File.separator, _segmentStoreUriStr, segmentName.getTableName(),
-          segmentName.getSegmentName() + UUID.randomUUID().toString()));
+          SegmentCompletionUtils.generateSegmentFileName(segmentName.getSegmentName())));

Review Comment:
   Server2ControllerSegmentUploader and PinotFSSegmentUploader has different naming pattern for tmp segments before.
   Server2ControllerSegmentUploader: segment_name.tmp.<UUID>
   PinotFSSegmentUploader: segment_name.<UUID>



##########
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java:
##########
@@ -130,6 +130,13 @@ private void runSegmentLevelValidation(TableConfig tableConfig, PartitionLevelSt
         && _llcRealtimeSegmentManager.isDeepStoreLLCSegmentUploadRetryEnabled()) {
       _llcRealtimeSegmentManager.uploadToDeepStoreIfMissing(tableConfig, segmentsZKMetadata);
     }
+
+    // Delete tmp segments
+    if (streamConfig.hasLowLevelConsumerType()
+        && _llcRealtimeSegmentManager.getIsSplitCommitEnabled()
+        && _llcRealtimeSegmentManager.isTmpSegmentAsyncDeletionEnabled()) {
+      _llcRealtimeSegmentManager.deleteTmpSegments(tableConfig, segmentsZKMetadata);

Review Comment:
   I am hesitating whether to have a separate class for async tmp segments deletion. Pros: the segment re-upload job and async tmp segment deletion can be scheduled at different pace. Cons: If the segment re-upload job failed in the middle, the new tmp segments cannot be cleared up in time



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -30,6 +31,8 @@ private SegmentCompletionUtils() {
   private static final Logger LOGGER = LoggerFactory.getLogger(SegmentCompletionUtils.class);
   // Used to create temporary segment file names
   private static final String TMP = ".tmp.";
+  private static final Pattern TMP_FILE = Pattern.compile(
+      "\\.tmp\\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$");

Review Comment:
   maybe `[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"` so that old temp segments created by server also be cleared up?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on PR #10815:
URL: https://github.com/apache/pinot/pull/10815#issuecomment-1700497805

   Have an manual integration test with Uber internal pinotFS and the log shows that the tmp file was deleted successfully.
   
   ```
   Succeed to delete file: path/to/<table>/<segment>.tmp.<UUID> 
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1262806830


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +219,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";

Review Comment:
   Actually, the skip logic is inside the task. Since it's a little bit complicated, I wanted to encapsulate the logic inside task instead of expose it on starter.
   
   ```
       if (!isLowLevelConsumer(tableNameWithType, tableConfig)
           || !getIsSplitCommitEnabled()
           || !isTmpSegmentAsyncDeletionEnabled()) {
         return 0L;
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1324102773


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   Gauge means refreshing the value, so that can be used to for total docs, total segments etc, but not for incremental values such as segments deleted in this round



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299509603


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -44,4 +47,8 @@ public static String getSegmentNamePrefix(String segmentName) {
   public static String generateSegmentFileName(String segmentNameStr) {
     return getSegmentNamePrefix(segmentNameStr) + UUID.randomUUID().toString();
   }
+
+  public static boolean isTmpFile(String uri) {
+    return TMP_FILE.matcher(uri).find();

Review Comment:
   [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} is the UUID regex. But I understand your concern on the reliability and wanted to reply on the official library to parse the UUID. Will adopt your suggestion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299513809


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long orphanTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          Preconditions.checkState(pinotFS.delete(uri, true), "Failed to delete file: %s", uri);
+          orphanTmpSegments++;
+        }
+      }
+    } catch (Exception e) {
+      LOGGER.warn("Caught exception while deleting temporary files for table: {}", rawTableName, e);
+    }
+    return orphanTmpSegments;
+  }
+
+  private boolean isTmpAndCanDelete(URI uri, Set<String> deepURIs, PinotFS pinotFS) throws Exception {
+    long lastModified = pinotFS.lastModified(uri);
+    String uriString = uri.toString();

Review Comment:
   the getUri will prepend a scheme if the original filepath doesn't have a scheme.
   
   ```
     public static URI getUri(String path) {
       try {
         URI uri = new URI(path);
         if (uri.getScheme() != null) {
           return uri;
         } else {
           return new URI(CommonConstants.Segment.LOCAL_SEGMENT_SCHEME + ":" + path);
         }
       } catch (URISyntaxException e) {
         throw new IllegalArgumentException("Illegal URI path: " + path, e);
       }
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299527276


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long orphanTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          Preconditions.checkState(pinotFS.delete(uri, true), "Failed to delete file: %s", uri);
+          orphanTmpSegments++;
+        }
+      }
+    } catch (Exception e) {
+      LOGGER.warn("Caught exception while deleting temporary files for table: {}", rawTableName, e);
+    }
+    return orphanTmpSegments;
+  }
+
+  private boolean isTmpAndCanDelete(URI uri, Set<String> deepURIs, PinotFS pinotFS) throws Exception {
+    long lastModified = pinotFS.lastModified(uri);
+    String uriString = uri.toString();
+    return SegmentCompletionUtils.isTmpFile(uriString) && !deepURIs.contains(uriString)

Review Comment:
   Actually, at the beginning of RealtimeSegmentValidationManager.runSegmentLevelValidation(), the zk meta is dumped already, so I will pass the zk into the deleteTmpSegment function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ankitsultana commented on pull request #10815: [Draft] Periodical Tmp Segment file deletion

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on PR #10815:
URL: https://github.com/apache/pinot/pull/10815#issuecomment-1579462537

   @wirybeaver Let's follow-up offline


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1295490417


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +219,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";

Review Comment:
   deleting tmp segments will be invoked with the existing realtime segment validation manager



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320446625


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,72 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType, List<SegmentZKMetadata> segmentsZKMetadata) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {

Review Comment:
   to clean up orphan segments for deleted table, should have a different job since not only .tmp. but other files need to be deleted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320442196


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,72 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType, List<SegmentZKMetadata> segmentsZKMetadata) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {

Review Comment:
   If table config can not be found, should we delete the tmp segments straight away? why skip?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320444535


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +215,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";
+    public static final String SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND =

Review Comment:
   add comment? Explain what LIFETIME_SECOND means.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320443889


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -42,6 +43,19 @@ public static String getSegmentNamePrefix(String segmentName) {
   }
 
   public static String generateSegmentFileName(String segmentNameStr) {

Review Comment:
   should we call it generateTempSegmentFileName instead since the file name is for a tmp one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1306863660


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +215,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";
+    public static final String SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND =
+        "controller.realtime.split.commit.segment.tmp.lifetime.second";
 
     public static final int MIN_INITIAL_DELAY_IN_SECONDS = 120;
     public static final int MAX_INITIAL_DELAY_IN_SECONDS = 300;
+    public static final int DEFAULT_SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND = 600;

Review Comment:
   The .tmp.<UUID> file is generated during segment commit phase. If there's any timeout during the remote file system connection + ZK update connection, then this .tmp.<UUID> won't be use anymore. The 10min comes from 5 min timeout in file system connection plus 5 min timeout in zk connection.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on PR #10815:
URL: https://github.com/apache/pinot/pull/10815#issuecomment-1635074752

   > Some high level questions:
   > 
   > 1. Should we always rely on the async cleanup and remove the sync cleanup (assuming listing files is expensive)? I can see we can integrate more clean up logic into this periodic task in the future. Also, let's make the periodic task name more generic for future prove
   > 2. Do we create temp file for non-split-commit scenario?
   
   
   For question 1, I already ensured the sync cleanup will be skipped if async deletion is enabled.
   
   ```
       if (!isTmpSegmentAsyncDeletionEnabled()) 
         try {
           for (String uri : pinotFS.listFiles(tableDirURI, false)) {
             if (uri.contains(SegmentCompletionUtils.getSegmentNamePrefix(segmentName))) {
               LOGGER.warn("Deleting temporary segment file: {}", uri);
               Preconditions.checkState(pinotFS.delete(new URI(uri), true), "Failed to delete file: %s", uri);
   ```
   
   Rename it as TmpSegmentCleaner?
   
   For question 2, I need to inspect the source code.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #10815: [Draft] Periodical Tmp Segment file deletion

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #10815:
URL: https://github.com/apache/pinot/pull/10815#issuecomment-1612198919

   Do we need to clean up tmp files periodically, or just when the controller starts? Will controller create extra tmp files during the regular life cycle without shut down?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1326763001


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   Appreciate for sharing the knowledge. I already replaced gauge with meter and add edremoveTableMeter method into AbstractMetrics.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ankitsultana commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1318160656


##########
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java:
##########
@@ -117,8 +117,18 @@ protected void processTable(String tableNameWithType, Context context) {
 
   private void runSegmentLevelValidation(TableConfig tableConfig, PartitionLevelStreamConfig streamConfig) {
     String realtimeTableName = tableConfig.getTableName();
+
     List<SegmentZKMetadata> segmentsZKMetadata = _pinotHelixResourceManager.getSegmentsZKMetadata(realtimeTableName);
 
+    // Delete tmp segments
+    try {
+      long numDeleteTmpSegments = _llcRealtimeSegmentManager.deleteTmpSegments(realtimeTableName, segmentsZKMetadata);

Review Comment:
   nit: one alternative for this may be to check `isTmpSegmentAsyncDeletionEnabled` outside `deleteTmpSegments` and avoid calling `deleteTmpSegments` altogether if the feature is not enabled. Also `deleteTmpSegments` could add a `Precondition(isTmpSegmentAsyncDeletionEnabled(), "called even though not enabled")`. Though this also works so feel free to ignore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1295489950


##########
pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java:
##########
@@ -706,6 +710,8 @@ protected List<PeriodicTask> setupControllerPeriodicTasks() {
         new TaskMetricsEmitter(_helixResourceManager, _helixTaskResourceManager, _leadControllerManager, _config,
             _controllerMetrics);
     periodicTasks.add(_taskMetricsEmitter);
+    _realtimeTempSegmentCleaner = new RealtimeTempSegmentCleaner(_config, _helixResourceManager, _leadControllerManager,
+        _pinotLLCRealtimeSegmentManager, _validationMetrics, _controllerMetrics);

Review Comment:
   deleting tmp segments will be invoked with the existing realtime segment validation manager



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299509603


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -44,4 +47,8 @@ public static String getSegmentNamePrefix(String segmentName) {
   public static String generateSegmentFileName(String segmentNameStr) {
     return getSegmentNamePrefix(segmentNameStr) + UUID.randomUUID().toString();
   }
+
+  public static boolean isTmpFile(String uri) {
+    return TMP_FILE.matcher(uri).find();

Review Comment:
   Actually I was thinking out the edge case that original full file path contains `.tmp.`. For example, the original file path `hdfs://path/to/XXX.tmp.YYY/segment_from_to`, the final tmp file path `hdfs://path/to/XXX.tmp.YYY/segment_from_to.tmp.UUID`. The split will false negatively reject customers contain .tmp. in their original file path.
   
   ".+\\.tmp\\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$" can ensure customers original file path is not empty



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320446464


##########
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/LLCSegmentCompletionHandlers.java:
##########
@@ -47,10 +47,10 @@
 import org.apache.pinot.controller.api.access.Authenticate;
 import org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager;
 import org.apache.pinot.controller.helix.core.realtime.segment.CommittingSegmentDescriptor;
-import org.apache.pinot.controller.util.SegmentCompletionUtils;
 import org.apache.pinot.core.auth.Actions;
 import org.apache.pinot.core.auth.Authorize;
 import org.apache.pinot.core.auth.TargetType;
+import org.apache.pinot.core.data.manager.realtime.SegmentCompletionUtils;

Review Comment:
   Can you check if this import is needed? No other change in this file.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1323785121


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   ```  
   public enum ValidationMetricName {
       MISSING_SEGMENT_COUNT("missingSegmentCount"),
       OFFLINE_SEGMENT_DELAY_HOURS("offlineSegmentDelayHours"),
       LAST_PUSH_TIME_DELAY_HOURS("lastPushTimeDelayHours"),
       TOTAL_DOCUMENT_COUNT("TotalDocumentCount"),
       NON_CONSUMING_PARTITION_COUNT("NonConsumingPartitionCount"),
       SEGMENT_COUNT("SegmentCount"),
       DELETED_TMP_SEGMENT_COUNT("DeletedTmpSegmentCount");
   ```
   It seems that all count metrics are using Gauge



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1295492501


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -921,10 +953,19 @@ public boolean isDeepStoreRetryUploadLLCSegmentEnabled() {
     return getProperty(ControllerPeriodicTasksConf.ENABLE_DEEP_STORE_RETRY_UPLOAD_LLC_SEGMENT, false);
   }
 
+  public boolean isTmpSegmentAsyncDeletionEnabled() {
+    return getProperty(ControllerPeriodicTasksConf.ENABLE_TMP_SEGMENT_ASYNC_DELETION, false);
+  }
+
   public int getDeepStoreRetryUploadTimeoutMs() {
     return getProperty(ControllerPeriodicTasksConf.DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS, -1);
   }
 
+  public int getTmpSegmentRetentionInSeconds() {
+    return getProperty(ControllerPeriodicTasksConf.SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND,

Review Comment:
   the default value is 600s. Don't want to make it too short, because it's likely that the current consuming segment is being committed by the lead server and that intermediate tmp segment is valid



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299509603


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -44,4 +47,8 @@ public static String getSegmentNamePrefix(String segmentName) {
   public static String generateSegmentFileName(String segmentNameStr) {
     return getSegmentNamePrefix(segmentNameStr) + UUID.randomUUID().toString();
   }
+
+  public static boolean isTmpFile(String uri) {
+    return TMP_FILE.matcher(uri).find();

Review Comment:
   Actually I was thinking out the edge case that original full file path contains `.tmp.`. For example, the original file path `hdfs://path/to/XXX.tmp.YYY/segment_from_to`, the final tmp file path `hdfs://path/to/XXX.tmp.YYY/segment_from_to.tmp.UUID`. The split will false negatively reject customers contain .tmp. in their original file path.
   
   `".+\\.tmp\\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"` can ensure customers original file path is not empty



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299499174


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -44,4 +47,8 @@ public static String getSegmentNamePrefix(String segmentName) {
   public static String generateSegmentFileName(String segmentNameStr) {
     return getSegmentNamePrefix(segmentNameStr) + UUID.randomUUID().toString();
   }
+
+  public static boolean isTmpFile(String uri) {
+    return TMP_FILE.matcher(uri).find();

Review Comment:
   Based on how the name is generated, should we consider splitting the name with `TMP` and then try to parse the second part as UUID?
   ```suggestion
       String[] splits = StringUtils.splitByWholeSeparator(uri, TMP);
       if (splits.length != 2) {
         return false;
       }
       try {
         UUID.fromString(splits[1]);
         return true;
       } catch (IllegalArgumentException e) {
         return false;
       }
   ```



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }

Review Comment:
   These checks are redundant



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long orphanTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          Preconditions.checkState(pinotFS.delete(uri, true), "Failed to delete file: %s", uri);
+          orphanTmpSegments++;
+        }
+      }
+    } catch (Exception e) {
+      LOGGER.warn("Caught exception while deleting temporary files for table: {}", rawTableName, e);
+    }
+    return orphanTmpSegments;
+  }
+
+  private boolean isTmpAndCanDelete(URI uri, Set<String> deepURIs, PinotFS pinotFS) throws Exception {
+    long lastModified = pinotFS.lastModified(uri);
+    String uriString = uri.toString();
+    return SegmentCompletionUtils.isTmpFile(uriString) && !deepURIs.contains(uriString)

Review Comment:
   To reduce unnecessary FS access, first check the name, then read the modified time.
   I don't think we need to check if it is used as download uri because it should never happen



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long orphanTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          Preconditions.checkState(pinotFS.delete(uri, true), "Failed to delete file: %s", uri);
+          orphanTmpSegments++;
+        }
+      }
+    } catch (Exception e) {
+      LOGGER.warn("Caught exception while deleting temporary files for table: {}", rawTableName, e);
+    }
+    return orphanTmpSegments;
+  }
+
+  private boolean isTmpAndCanDelete(URI uri, Set<String> deepURIs, PinotFS pinotFS) throws Exception {
+    long lastModified = pinotFS.lastModified(uri);
+    String uriString = uri.toString();

Review Comment:
   We are converting string to uri then back, can we directly pass in string?



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);

Review Comment:
   We should avoid reading all segments' ZK metadata because it is very expensive



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299509603


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -44,4 +47,8 @@ public static String getSegmentNamePrefix(String segmentName) {
   public static String generateSegmentFileName(String segmentNameStr) {
     return getSegmentNamePrefix(segmentNameStr) + UUID.randomUUID().toString();
   }
+
+  public static boolean isTmpFile(String uri) {
+    return TMP_FILE.matcher(uri).find();

Review Comment:
   Actually I was thinking out the edge case that original full file path contains `.tmp.`. For example, the original file path `hdfs://path/to/XXX.tmp.YYY/segment_from_to`, the final tmp file path `hdfs://path/to/XXX.tmp.YYY/segment_from_to.tmp.UUID`. The split will false negatively reject customers contain .tmp. in their original file path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ankitsultana commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1258504948


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -492,15 +498,17 @@ public void commitSegmentFile(String realtimeTableName, CommittingSegmentDescrip
     // We only clean up tmp segment files in table level dir, so there's no need to list recursively.
     // See LLCSegmentCompletionHandlers.uploadSegment().
     // TODO: move tmp file logic into SegmentCompletionUtils.

Review Comment:
   I think we can remove this TODO now.



##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   Would recommend to reword `Seg` to `Segment` to be consistent with the current code.



##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegCountGauge(final String resource, final long tmpSegmentCount) {
+    makeGauge(resource, ValidationMetricName.DELETED_TMP_SEGMENT_COUNT, _storedValueGaugeFactory, tmpSegmentCount);
+  }
+
+  public void cleanupTmpSegCountGauge(final String resource) {
+    removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);

Review Comment:
   This should be `ValidationMetricName.DELETED_TMP_SEGMENT_COUNT`?



##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +219,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";

Review Comment:
   Based on this config we should either run or skip the `RealtimeTempSegmentCleaner`. I think that's not wired right now.



##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -921,10 +953,19 @@ public boolean isDeepStoreRetryUploadLLCSegmentEnabled() {
     return getProperty(ControllerPeriodicTasksConf.ENABLE_DEEP_STORE_RETRY_UPLOAD_LLC_SEGMENT, false);
   }
 
+  public boolean isTmpSegmentAsyncDeletionEnabled() {
+    return getProperty(ControllerPeriodicTasksConf.ENABLE_TMP_SEGMENT_ASYNC_DELETION, false);
+  }
+
   public int getDeepStoreRetryUploadTimeoutMs() {
     return getProperty(ControllerPeriodicTasksConf.DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS, -1);
   }
 
+  public int getTmpSegmentRetentionInSeconds() {
+    return getProperty(ControllerPeriodicTasksConf.SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND,

Review Comment:
   Is this config needed? Looks like it's getting used in realtime segment commit flow for split-commit tables only. The number of configs are already quite high so unless it's absolutely needed I'd recommend skipping it.



##########
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeTempSegmentCleaner.java:
##########
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.validation;
+
+import java.util.List;
+import org.apache.pinot.common.metrics.ControllerMetrics;
+import org.apache.pinot.common.metrics.ValidationMetrics;
+import org.apache.pinot.controller.ControllerConf;
+import org.apache.pinot.controller.LeadControllerManager;
+import org.apache.pinot.controller.helix.core.PinotHelixResourceManager;
+import org.apache.pinot.controller.helix.core.periodictask.ControllerPeriodicTask;
+import org.apache.pinot.controller.helix.core.realtime.PinotLLCRealtimeSegmentManager;
+import org.apache.pinot.spi.utils.builder.TableNameBuilder;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class RealtimeTempSegmentCleaner extends ControllerPeriodicTask<Void> {

Review Comment:
   Let's call it tmp everywhere for consistency



##########
pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java:
##########
@@ -706,6 +710,8 @@ protected List<PeriodicTask> setupControllerPeriodicTasks() {
         new TaskMetricsEmitter(_helixResourceManager, _helixTaskResourceManager, _leadControllerManager, _config,
             _controllerMetrics);
     periodicTasks.add(_taskMetricsEmitter);
+    _realtimeTempSegmentCleaner = new RealtimeTempSegmentCleaner(_config, _helixResourceManager, _leadControllerManager,
+        _pinotLLCRealtimeSegmentManager, _validationMetrics, _controllerMetrics);

Review Comment:
   This is not added to `periodicTasks`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320654384


##########
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/LLCSegmentCompletionHandlers.java:
##########
@@ -47,10 +47,10 @@
 import org.apache.pinot.controller.api.access.Authenticate;
 import org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager;
 import org.apache.pinot.controller.helix.core.realtime.segment.CommittingSegmentDescriptor;
-import org.apache.pinot.controller.util.SegmentCompletionUtils;
 import org.apache.pinot.core.auth.Actions;
 import org.apache.pinot.core.auth.Authorize;
 import org.apache.pinot.core.auth.TargetType;
+import org.apache.pinot.core.data.manager.realtime.SegmentCompletionUtils;

Review Comment:
   I move the SegmentCompletionUtils from pinot-controller to pinot-core module, thus need to import here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320446201


##########
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java:
##########
@@ -117,8 +117,18 @@ protected void processTable(String tableNameWithType, Context context) {
 
   private void runSegmentLevelValidation(TableConfig tableConfig, PartitionLevelStreamConfig streamConfig) {
     String realtimeTableName = tableConfig.getTableName();
+
     List<SegmentZKMetadata> segmentsZKMetadata = _pinotHelixResourceManager.getSegmentsZKMetadata(realtimeTableName);
 
+    // Delete tmp segments

Review Comment:
   Why put deletion in the segmentLevelValidation? tmp segments deletion is at table level. Sounds like a misfit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1323793165


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   Shall we use Count?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320447063


##########
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java:
##########
@@ -117,8 +117,18 @@ protected void processTable(String tableNameWithType, Context context) {
 
   private void runSegmentLevelValidation(TableConfig tableConfig, PartitionLevelStreamConfig streamConfig) {
     String realtimeTableName = tableConfig.getTableName();
+
     List<SegmentZKMetadata> segmentsZKMetadata = _pinotHelixResourceManager.getSegmentsZKMetadata(realtimeTableName);
 
+    // Delete tmp segments

Review Comment:
   It's Jackie's suggestion to avoid potential data race issue if we separate them in different jobs. Check the conversation history.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320446170


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,72 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType, List<SegmentZKMetadata> segmentsZKMetadata) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {

Review Comment:
   if (!isLowLevelConsumer(tableNameWithType, tableConfig) requires non null tableConfig



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320442377


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,72 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType, List<SegmentZKMetadata> segmentsZKMetadata) {

Review Comment:
   Please add javadoc for public methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1323752263


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +215,15 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";
+    // temporary segments within expiration won't be deleted so that ongoing split commit won't be impacted.
+    public static final String SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECONDS =
+        "controller.realtime.split.commit.segment.tmp.lifetime.seconds";

Review Comment:
   Same here
   ```suggestion
       public static final String TMP_SEGMENT_RETENTION_IN_SECONDS =
           "controller.realtime.segment.tmpFileRetentionInSeconds";
   ```



##########
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java:
##########
@@ -117,8 +117,18 @@ protected void processTable(String tableNameWithType, Context context) {
 
   private void runSegmentLevelValidation(TableConfig tableConfig, PartitionLevelStreamConfig streamConfig) {
     String realtimeTableName = tableConfig.getTableName();
+
     List<SegmentZKMetadata> segmentsZKMetadata = _pinotHelixResourceManager.getSegmentsZKMetadata(realtimeTableName);
 
+    // Delete tmp segments
+    try {
+      long numDeleteTmpSegments = _llcRealtimeSegmentManager.deleteTmpSegments(realtimeTableName, segmentsZKMetadata);
+      LOGGER.info("Delete {} tmp segments for table {}", numDeleteTmpSegments, realtimeTableName);
+      _validationMetrics.updateTmpSegmentCountGauge(realtimeTableName, numDeleteTmpSegments);
+    } catch (Exception e) {
+      LOGGER.error(String.format("Fail to delete tmp segments for table %s", realtimeTableName), e);

Review Comment:
   ```suggestion
         LOGGER.error("Failed to delete tmp segments for table: {}", realtimeTableName, e);
   ```



##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +215,15 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";

Review Comment:
   The tmp file is not generated only for split commit case. Suggest renaming the keep to match other configs
   ```suggestion
           "controller.realtime.segment.tmpFileAsyncDeletionEnabled";
   ```



##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java:
##########
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) {
     removeGauge(resource, ValidationMetricName.SEGMENT_COUNT);
   }
 
+  public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) {

Review Comment:
   Why do we define it as a gauge? Should it be a meter instead?
   
   Let's match the name of the method and the metric



##########
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java:
##########
@@ -117,8 +117,18 @@ protected void processTable(String tableNameWithType, Context context) {
 
   private void runSegmentLevelValidation(TableConfig tableConfig, PartitionLevelStreamConfig streamConfig) {
     String realtimeTableName = tableConfig.getTableName();
+
     List<SegmentZKMetadata> segmentsZKMetadata = _pinotHelixResourceManager.getSegmentsZKMetadata(realtimeTableName);
 
+    // Delete tmp segments
+    try {
+      long numDeleteTmpSegments = _llcRealtimeSegmentManager.deleteTmpSegments(realtimeTableName, segmentsZKMetadata);
+      LOGGER.info("Delete {} tmp segments for table {}", numDeleteTmpSegments, realtimeTableName);

Review Comment:
   (nit)
   ```suggestion
         LOGGER.info("Deleted {} tmp segments for table: {}", numDeleteTmpSegments, realtimeTableName);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1262807118


##########
pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java:
##########
@@ -706,6 +710,8 @@ protected List<PeriodicTask> setupControllerPeriodicTasks() {
         new TaskMetricsEmitter(_helixResourceManager, _helixTaskResourceManager, _leadControllerManager, _config,
             _controllerMetrics);
     periodicTasks.add(_taskMetricsEmitter);
+    _realtimeTempSegmentCleaner = new RealtimeTempSegmentCleaner(_config, _helixResourceManager, _leadControllerManager,
+        _pinotLLCRealtimeSegmentManager, _validationMetrics, _controllerMetrics);

Review Comment:
   mybad, will fix it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1262806830


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +219,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";

Review Comment:
   good suggestion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299509967


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    List<SegmentZKMetadata> segmentsZKMetadata = _helixResourceManager.getSegmentsZKMetadata(tableNameWithType);
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long orphanTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          Preconditions.checkState(pinotFS.delete(uri, true), "Failed to delete file: %s", uri);
+          orphanTmpSegments++;
+        }
+      }
+    } catch (Exception e) {
+      LOGGER.warn("Caught exception while deleting temporary files for table: {}", rawTableName, e);
+    }
+    return orphanTmpSegments;
+  }
+
+  private boolean isTmpAndCanDelete(URI uri, Set<String> deepURIs, PinotFS pinotFS) throws Exception {
+    long lastModified = pinotFS.lastModified(uri);
+    String uriString = uri.toString();
+    return SegmentCompletionUtils.isTmpFile(uriString) && !deepURIs.contains(uriString)

Review Comment:
   Some Uber internal engineer backfill the empty download url with tmp files : )



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299509603


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -44,4 +47,8 @@ public static String getSegmentNamePrefix(String segmentName) {
   public static String generateSegmentFileName(String segmentNameStr) {
     return getSegmentNamePrefix(segmentNameStr) + UUID.randomUUID().toString();
   }
+
+  public static boolean isTmpFile(String uri) {
+    return TMP_FILE.matcher(uri).find();

Review Comment:
   Actually I was thinking out the edge case that original full file path contains `.tmp.`. For example, the original file path `hdfs://path/to/XXX.tmp.YYY/segment_from_to`, the final tmp file path `hdfs://path/to/XXX.tmp.YYY/segment_from_to.tmp.UUID`. We can check splits.length > 1 and the last one is UUID



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10815: [Draft] Periodical Tmp Segment file deletion

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10815:
URL: https://github.com/apache/pinot/pull/10815#issuecomment-1622854384

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/10815?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#10815](https://app.codecov.io/gh/apache/pinot/pull/10815?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (007a03d) into [master](https://app.codecov.io/gh/apache/pinot/commit/51bf75efa65cbe8bd8b497eb20e34869205d74e8?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (51bf75e) will **decrease** coverage by `70.23%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #10815       +/-   ##
   =============================================
   - Coverage     70.22%    0.00%   -70.23%     
   =============================================
     Files          2164     1661      -503     
     Lines        116369    87448    -28921     
     Branches      17599    13805     -3794     
   =============================================
   - Hits          81719        0    -81719     
   - Misses        28952    87448    +58496     
   + Partials       5698        0     -5698     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests1temurin11 | `0.00% <0.00%> (?)` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/10815?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [.../data/manager/realtime/PinotFSSegmentUploader.java](https://app.codecov.io/gh/apache/pinot/pull/10815?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUGlub3RGU1NlZ21lbnRVcGxvYWRlci5qYXZh) | `0.00% <0.00%> (-84.45%)` | :arrow_down: |
   | [.../data/manager/realtime/SegmentCompletionUtils.java](https://app.codecov.io/gh/apache/pinot/pull/10815?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvU2VnbWVudENvbXBsZXRpb25VdGlscy5qYXZh) | `0.00% <0.00%> (ø)` | |
   
   ... and [2086 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/10815/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1299518273


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,65 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }

Review Comment:
   right



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ankitsultana commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1303630415


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +215,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";
+    public static final String SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND =
+        "controller.realtime.split.commit.segment.tmp.lifetime.second";
 
     public static final int MIN_INITIAL_DELAY_IN_SECONDS = 120;
     public static final int MAX_INITIAL_DELAY_IN_SECONDS = 300;
+    public static final int DEFAULT_SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND = 600;

Review Comment:
   I think we can keep this a bit higher (1 hour). In cases where controller is overwhelmed, deepstore latencies are high, etc. it is likely that the tmp segment may stick around for a good few minutes.



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,64 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType, List<SegmentZKMetadata> segmentsZKMetadata) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE

Review Comment:
   nit: avoid parallel



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -42,6 +43,19 @@ public static String getSegmentNamePrefix(String segmentName) {
   }
 
   public static String generateSegmentFileName(String segmentNameStr) {
-    return getSegmentNamePrefix(segmentNameStr) + UUID.randomUUID().toString();
+    return getSegmentNamePrefix(segmentNameStr) + UUID.randomUUID();
+  }
+
+  public static boolean isTmpFile(String uri) {

Review Comment:
   Can you add a UT or two for this?



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,64 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  public long deleteTmpSegments(String tableNameWithType, List<SegmentZKMetadata> segmentsZKMetadata) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    // Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+    Set<String> deepURIs = segmentsZKMetadata.stream().parallel().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long orphanTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          Preconditions.checkState(pinotFS.delete(uri, true), "Failed to delete file: %s", uri);

Review Comment:
   If delete returns false, we can simply log it. Throwing would mean that we would skip deleting a bunch of files which should have been deleted in this run.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ankitsultana commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1307018192


##########
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java:
##########
@@ -215,9 +215,14 @@ public static class ControllerPeriodicTasksConf {
         "controller.realtime.segment.deepStoreUploadRetryEnabled";
     public static final String DEEP_STORE_RETRY_UPLOAD_TIMEOUT_MS =
         "controller.realtime.segment.deepStoreUploadRetry.timeoutMs";
+    public static final String ENABLE_TMP_SEGMENT_ASYNC_DELETION =
+        "controller.realtime.split.commit.segment.tmp.cleanup.async.enable";
+    public static final String SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND =
+        "controller.realtime.split.commit.segment.tmp.lifetime.second";
 
     public static final int MIN_INITIAL_DELAY_IN_SECONDS = 120;
     public static final int MAX_INITIAL_DELAY_IN_SECONDS = 300;
+    public static final int DEFAULT_SPLIT_COMMIT_TMP_SEGMENT_LIFETIME_SECOND = 600;

Review Comment:
   It can be that the controller's thread-pool is full and the http-requests take a while to be processed because they get queued. The time that the tmp file lasts in that case could exceed 10 minutes or get close to it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1320447063


##########
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java:
##########
@@ -117,8 +117,18 @@ protected void processTable(String tableNameWithType, Context context) {
 
   private void runSegmentLevelValidation(TableConfig tableConfig, PartitionLevelStreamConfig streamConfig) {
     String realtimeTableName = tableConfig.getTableName();
+
     List<SegmentZKMetadata> segmentsZKMetadata = _pinotHelixResourceManager.getSegmentsZKMetadata(realtimeTableName);
 
+    // Delete tmp segments

Review Comment:
   It's Jackie's suggestion to avoid potential data race issue if we separate them in different jobs. Check the conversation history: https://github.com/apache/pinot/pull/10815#issuecomment-1642964875



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1329564462


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1465,6 +1469,78 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
     }
   }
 
+  /**
+   * Delete tmp segments for realtime table with low level consumer, split commit and async deletion is enabled.
+   * @param tableNameWithType
+   * @param segmentsZKMetadata
+   * @return number of deleted orphan temporary segments
+   *
+   */
+  public long deleteTmpSegments(String tableNameWithType, List<SegmentZKMetadata> segmentsZKMetadata) {
+    Preconditions.checkState(!_isStopping, "Segment manager is stopping");
+
+    if (!TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
+      return 0L;
+    }
+
+    TableConfig tableConfig = _helixResourceManager.getTableConfig(tableNameWithType);
+    if (tableConfig == null) {
+      LOGGER.warn("Failed to find table config for table: {}, skipping deletion of tmp segments", tableNameWithType);
+      return 0L;
+    }
+
+    if (!isLowLevelConsumer(tableNameWithType, tableConfig)
+        || !getIsSplitCommitEnabled()
+        || !isTmpSegmentAsyncDeletionEnabled()) {
+      return 0L;
+    }
+
+    Set<String> deepURIs = segmentsZKMetadata.stream().filter(meta -> meta.getStatus() == Status.DONE
+        && !CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(meta.getDownloadUrl())).map(
+        SegmentZKMetadata::getDownloadUrl).collect(
+        Collectors.toSet());
+
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDirURI = URIUtils.getUri(_controllerConf.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDirURI.getScheme());
+    long deletedTmpSegments = 0;
+    try {
+      for (String filePath : pinotFS.listFiles(tableDirURI, false)) {
+        // prepend scheme
+        URI uri = URIUtils.getUri(filePath);
+        if (isTmpAndCanDelete(uri, deepURIs, pinotFS)) {
+          LOGGER.info("Deleting temporary segment file: {}", uri);
+          if (pinotFS.delete(uri, true)) {
+            LOGGER.info("Succeed to delete file: {}", uri);
+            deletedTmpSegments++;
+          } else {
+            LOGGER.warn("Failed to delete file: {}", uri);
+          }
+        }
+      }
+    } catch (Exception e) {
+      LOGGER.warn("Caught exception while deleting temporary files for table: {}", rawTableName, e);
+    }
+    return deletedTmpSegments;
+  }
+
+  private boolean isTmpAndCanDelete(URI uri, Set<String> deepURIs, PinotFS pinotFS) throws Exception {
+    long lastModified = pinotFS.lastModified(uri);
+    if (lastModified <= 0) {
+      LOGGER.warn("file {} modification time {} is not positive, ineligible for delete", uri.toString(), lastModified);
+      return false;
+    }
+    String uriString = uri.toString();
+    return SegmentCompletionUtils.isTmpFile(uriString) && !deepURIs.contains(uriString)
+        && getCurrentTimeMs() - lastModified > _controllerConf.getTmpSegmentRetentionInSeconds() * 1000L;
+  }
+
+  private boolean isLowLevelConsumer(String tableNameWithType, TableConfig tableConfig) {

Review Comment:
   this `isLowLevelConsumer` is deprecated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #10815: Periodically delete Tmp Segment file brought by the SplitCommit End phase

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #10815:
URL: https://github.com/apache/pinot/pull/10815


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] wirybeaver commented on a diff in pull request #10815: [Draft] Periodical Tmp Segment file deletion

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #10815:
URL: https://github.com/apache/pinot/pull/10815#discussion_r1218415965


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java:
##########
@@ -30,6 +31,8 @@ private SegmentCompletionUtils() {
   private static final Logger LOGGER = LoggerFactory.getLogger(SegmentCompletionUtils.class);
   // Used to create temporary segment file names
   private static final String TMP = ".tmp.";
+  private static final Pattern TMP_FILE = Pattern.compile(
+      "\\.tmp\\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$");

Review Comment:
   maybe switched to `"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"` so that old temp segments created by server can also be cleared up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org