Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/07/29 03:42:00 UTC

[GitHub] [pinot] kkrugler edited a comment on pull request #7222: 7090 segmentnamegenerator accept input file parameter

kkrugler edited a comment on pull request #7222:
URL: https://github.com/apache/pinot/pull/7222#issuecomment-888686908


   Hi @Jackie-Jiang - a few things about this WIP...
   
   1. The inputFilePath that I'm getting passed inside the `SegmentGeneratorConfig`, which is passed to `SegmentIndexCreationDriverImpl` (at least when running locally), is a path to a temp directory where the input file has been copied. So the real input path has been lost in that case, which means that if the input file path pattern matches anything other than the file name, it won't work. I see that `SegmentGenerationJobRunner.submitSegmentGenTask()` (in pinot-batch-ingestion-standalone) is where the input file gets copied to the local temp dir. I assume this is so that the input data can be read from a regular Java `File` vs. needing to use abstract FileSystem stuff everywhere. I'm wondering if it's worthwhile to at least try to replicate (say for 2-3 levels) the input file hierarchy inside of the temp input dir.
   2. I didn't want to touch the `pom.xml` files, but my build was failing without those changes, which skip the RAT and Apache license checks on Eclipse-generated files. I also had to do some dependency management to avoid a build failure caused by the Pinot integration tests pulling in some local cloud test code that used a different version of the AWS SDK jars. But I could try to separate those into a different PR.
   3. I noticed a few other bits I should clean up when reviewing the committed files (e.g. Javadoc for `InputFileSegmentNameGenerator`, cleaning up use of `@Nullable` in arguments). I've made those changes in my branch, just haven't pushed yet to update the PR.
   
   Anyway, looking for input on items 1 & 2 above.
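
To make item 1 concrete, here is a minimal sketch of how the last few levels of the input file hierarchy could be kept inside the temp input dir. It is not existing Pinot code: the class `TempInputPathSketch`, the method `localPathFor`, and the `levelsToKeep` parameter are all hypothetical names, and it assumes the original input location is available as a `java.net.URI` with a non-empty path.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Pinot: illustrates the idea only.
public class TempInputPathSketch {

  /**
   * Maps the original input URI to a location under tempInputDir that keeps
   * the file name plus up to levelsToKeep of its parent directory names.
   */
  public static Path localPathFor(Path tempInputDir, URI inputFileURI, int levelsToKeep) {
    // Only the path part of the URI is used; scheme and authority are dropped.
    Path input = Paths.get(inputFileURI.getPath());
    int nameCount = input.getNameCount();
    // Index of the first name element to keep; clamp at 0 for shallow paths.
    int start = Math.max(0, nameCount - 1 - levelsToKeep);
    Path tail = input.subpath(start, nameCount);
    return tempInputDir.resolve(tail);
  }
}
```

With `levelsToKeep` set to 2, an input of `s3://my-bucket/events/2021/07/28/part-0.avro` would be copied to `<tempInputDir>/07/28/part-0.avro`, so a pattern that matches on the last two directory names would still work. The caller would need to create the parent directories before copying.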


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


