Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/05 06:07:11 UTC

[GitHub] josephglanville commented on issue #5492: Native parallel batch indexing without shuffle

URL: https://github.com/apache/incubator-druid/pull/5492#issuecomment-410498288
 
 
   @jihoonson If I understand the semantics correctly, to create segments with perfect rollup you could return input splits that route all of the data for each output segment to exactly one subtask?
   i.e. have getSplits return `Stream<InputSplit<List<SplitType>>>` and have withSplit take `InputSplit<List<SplitType>>`, where getSplits returns splits that are partitioned by output segment interval.
   
   My goal with this line of thinking is to remove the need for the merging/shuffle phase. As long as the number of files that must be read per segment isn't too large for a single subtask, this seems like a reasonable approach?
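
A rough sketch of the grouping idea, to make the question concrete. Everything here is an assumption for illustration: `InputSplit` is a minimal stand-in rather than Druid's class, and `InputFile`, its `day` field, and `SegmentAlignedSplits` are hypothetical names. It also assumes each input file holds data for exactly one segment interval.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical stand-in for an input split; not Druid's actual class.
final class InputSplit<T> {
    private final T value;

    InputSplit(T value) {
        this.value = value;
    }

    T get() {
        return value;
    }
}

// Hypothetical input file descriptor. Assumes every file contains data
// for exactly one segment interval, identified here by a day string.
final class InputFile {
    final String path;
    final String day;

    InputFile(String path, String day) {
        this.path = path;
        this.day = day;
    }
}

final class SegmentAlignedSplits {
    // Groups file paths by segment interval and emits one split per interval,
    // so a single subtask would see all of the input for its output segment.
    static Stream<InputSplit<List<String>>> getSplits(List<InputFile> files) {
        // TreeMap keeps the intervals in sorted order, which makes the
        // order of the emitted splits deterministic.
        Map<String, List<String>> byInterval = new TreeMap<>();
        for (InputFile file : files) {
            byInterval.computeIfAbsent(file.day, k -> new ArrayList<>()).add(file.path);
        }
        return byInterval.values().stream()
                .map(paths -> new InputSplit<List<String>>(paths));
    }

    public static void main(String[] args) {
        List<InputFile> files = Arrays.asList(
                new InputFile("a.json", "2018-08-01"),
                new InputFile("b.json", "2018-08-02"),
                new InputFile("c.json", "2018-08-01"));
        // Each printed line is the file list one subtask would receive.
        getSplits(files).forEach(split -> System.out.println(split.get()));
    }
}
```

With splits shaped like this, each subtask would receive the complete file list for one interval, which is the property that would let it roll up that segment without a later merge. Whether that holds in practice depends on files not spanning intervals.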

