Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/09 02:00:56 UTC

[GitHub] [incubator-druid] licl2014 opened a new issue #7855: Overlord will stop schedule task when a MM has problem.

licl2014 opened a new issue #7855: Overlord will stop schedule task when a MM has problem.
URL: https://github.com/apache/incubator-druid/issues/7855
 
 
   The Overlord stops scheduling tasks when a MiddleManager (MM) has a problem (e.g. `/tmp` can't be used).
   
   ### Affected Version
   
   0.9.2/0.12.3
   
   ### Description
   
   Our cluster has 3 MMs, and the Overlord's selectStrategy is `equalDistribution`.
   - Because of `equalDistribution`, the Overlord selects MM1 to run one task and MM2 to run another.
   - The Overlord then assigns a task to MM3, but the `/tmp` directory on MM3 can't be used, so the task fails there.
   - The Overlord waits `5 Min` (the default value) to avoid overflowing MM3 with tasks, and it assigns no other pending tasks during those `5 Min`.
   - In the next round, the Overlord again assigns a task to MM3, because MM3 has no running tasks. The cycle repeats indefinitely unless the tasks running on MM1 or MM2 complete.
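
The cycle described above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Druid's actual code: the class `EqualDistributionStarvation` and the methods `selectWorker` and `simulate` are hypothetical names, and the selection rule is simplified to "pick the worker with the fewest running tasks", which is how the steps above describe `equalDistribution`. The 5-minute wait between rounds is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EqualDistributionStarvation {

    // Hypothetical stand-in for the selection rule: fewest running tasks wins,
    // ties go to the worker that comes first in iteration order.
    static String selectWorker(Map<String, Integer> runningTasks) {
        String best = null;
        for (Map.Entry<String, Integer> e : runningTasks.entrySet()) {
            if (best == null || e.getValue() < runningTasks.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Runs the given number of scheduling rounds and returns the worker picked in each.
    // A task sent to brokenWorker fails at once, so that worker's running count never grows.
    static List<String> simulate(Map<String, Integer> runningTasks, String brokenWorker, int rounds) {
        List<String> picks = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            String picked = selectWorker(runningTasks);
            picks.add(picked);
            if (!picked.equals(brokenWorker)) {
                runningTasks.merge(picked, 1, Integer::sum);
            }
        }
        return picks;
    }

    public static void main(String[] args) {
        Map<String, Integer> running = new LinkedHashMap<>();
        running.put("MM1", 1); // one task already running
        running.put("MM2", 1); // one task already running
        running.put("MM3", 0); // idle, but /tmp is unusable
        System.out.println(simulate(running, "MM3", 4)); // prints [MM3, MM3, MM3, MM3]
    }
}
```

Every round picks MM3 because its running count stays at zero, so pending tasks never reach the healthy workers.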
   
   ### Solution
   We can have an MM `blacklist`: if too many tasks fail on an MM, the Overlord disables that MM automatically.
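
A minimal sketch of the blacklist idea follows. It is an illustration, not an existing Druid API: the class `WorkerBlacklist`, its methods, and the threshold `maxConsecutiveFailures` are all hypothetical. Counting only consecutive failures and resetting the count on a success is one possible reading of "too many tasks failed", and the selection rule is again simplified to "fewest running tasks".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class WorkerBlacklist {
    private final int maxConsecutiveFailures; // hypothetical threshold
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();
    private final Set<String> blacklisted = new HashSet<>();

    public WorkerBlacklist(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Counts a failed task and blacklists the worker once the threshold is reached.
    public void recordFailure(String worker) {
        int failures = consecutiveFailures.merge(worker, 1, Integer::sum);
        if (failures >= maxConsecutiveFailures) {
            blacklisted.add(worker);
        }
    }

    // Assumption: a successful task resets the worker's failure streak.
    public void recordSuccess(String worker) {
        consecutiveFailures.remove(worker);
    }

    public boolean isBlacklisted(String worker) {
        return blacklisted.contains(worker);
    }

    // Picks the non-blacklisted worker with the fewest running tasks, or null if none is left.
    public String selectWorker(Map<String, Integer> runningTasks) {
        String best = null;
        for (Map.Entry<String, Integer> e : runningTasks.entrySet()) {
            if (blacklisted.contains(e.getKey())) {
                continue;
            }
            if (best == null || e.getValue() < runningTasks.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        WorkerBlacklist blacklist = new WorkerBlacklist(3);
        Map<String, Integer> running = new LinkedHashMap<>();
        running.put("MM1", 1);
        running.put("MM2", 1);
        running.put("MM3", 0); // idle, but every task sent here fails

        for (int i = 0; i < 3; i++) {
            String picked = blacklist.selectWorker(running); // MM3 each time
            blacklist.recordFailure(picked);
        }
        System.out.println(blacklist.isBlacklisted("MM3")); // prints true
        System.out.println(blacklist.selectWorker(running)); // prints MM1
    }
}
```

A real implementation would probably also need a way to take a worker off the blacklist once it recovers, and a cap on how many workers may be disabled at once. Both are left out of this sketch.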
   

