You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/07/30 19:09:51 UTC

[GitHub] [pinot] kkrugler commented on a change in pull request #7222: 7090 segmentnamegenerator accept input file parameter

kkrugler commented on a change in pull request #7222:
URL: https://github.com/apache/pinot/pull/7222#discussion_r680164442



##########
File path: pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-common/src/main/java/org/apache/pinot/plugin/ingestion/batch/common/SegmentGenerationTaskRunner.java
##########
@@ -149,6 +155,9 @@ private SegmentNameGenerator getSegmentNameGenerator() {
             Boolean.parseBoolean(segmentNameGeneratorConfigs.get(EXCLUDE_SEQUENCE_ID)),
             IngestionConfigUtils.getBatchSegmentIngestionType(tableConfig),
             IngestionConfigUtils.getBatchSegmentIngestionFrequency(tableConfig), dateTimeFormatSpec);
+      case INPUT_FILE_SEGMENT_NAME_GENERATOR:
+        return new InputFileSegmentNameGenerator(segmentNameGeneratorConfigs.get(FILE_PATH_PATTERN),

Review comment:
       1. You are correct, we actually could set up the segment name here, because there's a new SegmentNameGenerator created for each segment. But I was following the current API convention, where the `SegmentNameGenerator.generateSegmentName()` method is called with per-segment parameters. Otherwise there's no reason for it to be an interface. If you want to change how segment naming is handled as part of this PR, I could get rid of the interface, and have a single SegmentNameGenerator class with a static method that takes the naming type, the config parameter map, and returns a segment name.
   2. Though I don't follow why this change would better handle movement of input files. The segment is generated once, not dynamically, and during the actual job generating segments the input files better not be moving around, as their paths are assumed to be stable for the duration of the job (e.g. with Hadoop, these paths are stored as the input data for the segment generation job).
   3. Regardless, there's still the issue of how to handle segment naming based on the original input file path, which was the use case behind issue #7090. I could make it work, but it's a bit tricky to extract paths from a URI.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org