Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/02/01 02:16:48 UTC

[GitHub] [pinot] jackjlli edited a comment on pull request #8098: Validate the numbers of input and output files in HadoopSegmentCreationJob

jackjlli edited a comment on pull request #8098:
URL: https://github.com/apache/pinot/pull/8098#issuecomment-1026332309


   > What if user wants to replace an existing segment with a new generated one? We should allow overriding existing segment, but not the newly pushed ones.
   
   If a user wants to replace an existing segment, they can still use the current logic to do that. I've updated the PR to validate the number of input and output files. Since there is a 1:1 mapping between input and output files, we should fail the job if the two numbers don't match.
   The previous logic that set the `overwrite` flag doesn't work because the destination is just a temp dir for each mapper. The actual merge step from the mapper temp dir to the final output dir is done inside the `commitTask` method of the `FileOutputCommitter` class, which is outside the scope of our MR job.
   
   Sample log:
   ```
   2022-01-31 21:49:15,627 INFO [main] org.apache.pinot.hadoop.job.mappers.HadoopSegmentCreationMapper: Copying segment tar file from: pinot_hadoop_tmp/segmentTar/table1_2022-01-25_2022-01-25.tar.gz to: hdfs://path1/pinot_segments/3f5420af-6422-4035-9d53-2dd1895c2747/output/_temporary/1/_temporary/attempt_1632281309592_18465840_m_000001_0/segmentTar/table1_2022-01-25_2022-01-25.tar.gz
   ...
   2022-01-31 21:49:17,484 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1632281309592_18465840_m_000001_0' to hdfs://path1/pinot_segments/3f5420af-6422-4035-9d53-2dd1895c2747/output
   
   ```
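
To make the 1:1 check described above concrete, here is a minimal sketch of the idea. It is not the code from the PR: `SegmentCountValidator`, `countFiles`, and `validateCounts` are hypothetical names, and the sketch assumes a local directory read through `java.nio` rather than HDFS, so it can run without Hadoop on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not the code from the PR. It reads the local
// filesystem (java.nio) instead of HDFS so it runs without Hadoop.
final class SegmentCountValidator {
  private SegmentCountValidator() {
  }

  // Counts regular files directly under dir whose names end with suffix.
  static long countFiles(Path dir, String suffix) throws IOException {
    try (Stream<Path> entries = Files.list(dir)) {
      return entries
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(suffix))
          .count();
    }
  }

  // Fails when the 1:1 mapping between input files and output segments
  // does not hold, for example when two mappers wrote the same segment name
  // and one output replaced the other during the commit step.
  static void validateCounts(long numInputFiles, long numOutputFiles) {
    if (numInputFiles != numOutputFiles) {
      throw new IllegalStateException(
          "Expected " + numInputFiles + " output segment files but found " + numOutputFiles);
    }
  }
}
```

In a job driver, one would presumably call `validateCounts` once after all mappers have committed, passing the number of input files and the result of `countFiles` on the final output directory with a suffix such as `.tar.gz`. Checking after the commit matters because, as noted above, the merge into the final directory happens outside the mapper.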


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


