Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/07/11 22:02:35 UTC

[GitHub] [incubator-druid] jihoonson commented on issue #8061: Native parallel batch indexing with shuffle

jihoonson commented on issue #8061: Native parallel batch indexing with shuffle
URL: https://github.com/apache/incubator-druid/issues/8061#issuecomment-510670825
 
 
   @himanshug thanks for taking a look!
   
   > Does the overlord treat the "supervisor" task as a special task so that it can initiate cleanup requests? What if the MM is down temporarily or cleanup fails for some reason? In addition to overlord cleanup requests, it might be good for middleManagers to periodically check whether the "supervisor" task is running and do a self-cleanup.
   
   Ah, this is a good point. To handle middleManager failure, a sort of self-cleanup can be triggered when a configured amount of time has elapsed since the last access to any partition of a supervisorTask. Does this sound good?
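
A minimal sketch of that timeout-based self-cleanup, assuming the middleManager keeps the last access time per supervisorTaskId. The class and method names (`IntermediaryDataCleaner`, `recordAccess`, `removeExpired`) are hypothetical and are not part of Druid; the current time is passed in explicitly so the logic is easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not an actual Druid class.
class IntermediaryDataCleaner {
    private final long timeoutMillis;
    // supervisorTaskId -> time of the last access to any of its partitions
    private final Map<String, Long> lastAccess = new ConcurrentHashMap<>();

    IntermediaryDataCleaner(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Call whenever a partition of the given supervisor task is written or read.
    void recordAccess(String supervisorTaskId, long nowMillis) {
        lastAccess.put(supervisorTaskId, nowMillis);
    }

    // Returns the supervisor task IDs whose last access is at least
    // timeoutMillis old, and stops tracking them.
    List<String> removeExpired(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastAccess.entrySet()) {
            Long seenAt = entry.getValue();
            if (nowMillis - seenAt >= timeoutMillis
                && lastAccess.remove(entry.getKey(), seenAt)) {
                // remove(key, value) only succeeds if no newer access was
                // recorded in the meantime. A real implementation would
                // delete the partition files for this task here.
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

A periodic job on the middleManager could call `removeExpired(System.currentTimeMillis())`, which would cover the case where the overlord's cleanup request never arrives.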
   
   > also maybe have some MM level configuration around maximum disk space that can be utilized for intermediary data.
   
   Thanks for reminding me of this. I forgot to add it to the proposal. I'm thinking of using the existing `StorageLocationConfig` for this. To fully utilize the disk bandwidth, the partitions of the same supervisorTaskId will be assigned to the configured locations in a round-robin fashion. I will update the proposal shortly.
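
The round-robin assignment could look roughly like the sketch below. It does not use the real `StorageLocationConfig` API; `Location` is a hypothetical stand-in that only carries a path and a maximum size, and `RoundRobinLocationSelector` is an illustrative name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an actual Druid class.
class RoundRobinLocationSelector {
    // Stand-in for one configured storage location (path + max size).
    static final class Location {
        final String path;
        final long maxSizeBytes;
        long usedBytes = 0;

        Location(String path, long maxSizeBytes) {
            this.path = path;
            this.maxSizeBytes = maxSizeBytes;
        }

        boolean canHold(long bytes) {
            return usedBytes + bytes <= maxSizeBytes;
        }
    }

    private final List<Location> locations;
    // supervisorTaskId -> index of the location to try first next time
    private final Map<String, Integer> nextIndex = new HashMap<>();

    RoundRobinLocationSelector(List<Location> locations) {
        this.locations = locations;
    }

    // Returns the path chosen for the partition, or null if no location
    // has enough space left.
    synchronized String assign(String supervisorTaskId, long partitionBytes) {
        int start = nextIndex.getOrDefault(supervisorTaskId, 0);
        for (int i = 0; i < locations.size(); i++) {
            int idx = (start + i) % locations.size();
            Location location = locations.get(idx);
            if (location.canHold(partitionBytes)) {
                location.usedBytes += partitionBytes;
                nextIndex.put(supervisorTaskId, (idx + 1) % locations.size());
                return location.path;
            }
        }
        return null;
    }
}
```

Keeping a separate cursor per supervisorTaskId spreads each task's partitions across the disks, while the size check enforces the per-location limit.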
   
   > I think a user defined upper limit could always exist in all "supervisor" tasks that spawn extra tasks so that user can plan worker capacity knowing how many tasks at a maximum would be running via parallel [shuffle] task.
   
   This is already supported with `maxNumSubTasks` (https://druid.apache.org/docs/latest/ingestion/native_tasks.html#tuningconfig). `maxNumSubTasks` limits the number of subtasks running at any one time while a parallel index task is running. `numSecondPhaseTasks` is somewhat different: it is the total number of phase 2 tasks, and the supervisor task will consider phase 2 successful once `numSecondPhaseTasks` phase 2 tasks have succeeded.
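
To illustrate the difference between the two settings, here is a deliberately simplified sketch. It assumes tasks run in fixed rounds and always succeed, which is not how the supervisor task actually schedules subtasks; `PhaseTwoSimulation` and `roundsNeeded` are hypothetical names.

```java
// Hypothetical sketch, not an actual Druid class.
class PhaseTwoSimulation {
    // Counts how many rounds are needed to finish phase 2 when at most
    // maxNumSubTasks tasks run at once and numSecondPhaseTasks tasks must
    // succeed in total. Assumes every task succeeds on its first attempt.
    static int roundsNeeded(int numSecondPhaseTasks, int maxNumSubTasks) {
        if (maxNumSubTasks <= 0) {
            throw new IllegalArgumentException("maxNumSubTasks must be positive");
        }
        int succeeded = 0;
        int rounds = 0;
        while (succeeded < numSecondPhaseTasks) {
            // maxNumSubTasks caps concurrency, not the total number of tasks.
            int running = Math.min(maxNumSubTasks, numSecondPhaseTasks - succeeded);
            succeeded += running;
            rounds++;
        }
        return rounds;
    }
}
```

With `numSecondPhaseTasks = 10` and `maxNumSubTasks = 4`, no more than 4 tasks are ever running, and phase 2 is done only after all 10 have succeeded.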
