Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/12/22 01:14:32 UTC

[GitHub] [incubator-pinot] fx19880617 commented on issue #6349: Standalone ingestion jobs do not clean up output files after completing loading

fx19880617 commented on issue #6349:
URL: https://github.com/apache/incubator-pinot/issues/6349#issuecomment-749282776


   1. The ingestion job keeps segments in the output directory by design. For URI and METADATA push jobs, the output directory is treated as the source of truth for the segments. For example, users run this job to generate segments and write them directly to S3, then push metadata to Pinot so that it loads the segments from the same S3 directory.
   
   I think it's OK to add a config like `cleanUpOutputDir` that deletes the output directory when the push mode is `TAR`; its default value should be false.
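
   A minimal sketch of how such a flag could gate the cleanup, written in Python purely for illustration. The function name, the constant, and the parameter names are hypothetical and are not existing Pinot APIs; only the `cleanUpOutputDir` idea, the `TAR` condition, and the default of false come from the proposal above.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical constant for illustration; not a Pinot identifier.
PUSH_MODE_TAR = "TAR"


def maybe_clean_output_dir(output_dir, push_mode, clean_up_output_dir=False):
    """Delete output_dir only when cleanup is enabled and the push mode is TAR.

    Returns True if the directory was removed, False otherwise.
    """
    if not clean_up_output_dir:
        # Default behaviour: keep the generated segments.
        return False
    if push_mode.upper() != PUSH_MODE_TAR:
        # URI and METADATA pushes treat the output dir as the source of truth,
        # so it must survive the job.
        return False
    shutil.rmtree(output_dir, ignore_errors=True)
    return True


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    (out / "segment_0.tar.gz").write_bytes(b"")
    print(maybe_clean_output_dir(out, "METADATA", clean_up_output_dir=True))
    print(maybe_clean_output_dir(out, "TAR", clean_up_output_dir=True))
```

   Keeping the flag off by default means existing jobs behave exactly as they do today.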
   
   2. We usually expect the ingestion job's output directory to be empty, but you are right: if segments are already there or are still being built, the job will push them all.
   
   To solve this, I feel we can:
   - Merge segment generation and push into one task.
   - Let the segment generation job return an array of the generated tar file URIs.
   - Have the push task take that array and push only those files.
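
   The three steps above could be sketched roughly as follows, again in Python for illustration only. All function names are hypothetical, segment generation is replaced by writing empty placeholder files, and the `push_one` callback stands in for whatever actually uploads a segment. The point is that the push step consumes the returned list and never scans the output directory.

```python
import tempfile
from pathlib import Path


def generate_segments(output_dir, segment_names):
    """Stand-in for segment generation (hypothetical).

    Writes one placeholder tar file per segment and returns the URIs of
    only the files created by this call.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    uris = []
    for name in segment_names:
        tar_path = output_dir / f"{name}.tar.gz"
        tar_path.write_bytes(b"")  # placeholder for a real segment tarball
        uris.append(tar_path.resolve().as_uri())
    return uris


def push_segments(segment_uris, push_one):
    """Push exactly the given URIs; the output directory is never listed."""
    for uri in segment_uris:
        push_one(uri)
    return len(segment_uris)


def run_ingestion_job(output_dir, segment_names, push_one):
    """Generation and push merged into one task, linked by the URI list."""
    uris = generate_segments(output_dir, segment_names)
    push_segments(uris, push_one)
    return uris


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    # A leftover file from an earlier run; it should not be pushed.
    (out / "stale_segment.tar.gz").write_bytes(b"")
    pushed = []
    run_ingestion_job(out, ["seg_0", "seg_1"], pushed.append)
    print(len(pushed))
```

   Because the push step only sees what the generation step returned, stale or in-progress files in the same directory are left alone.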
   
   

