Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/09 02:00:56 UTC
[GitHub] [incubator-druid] licl2014 opened a new issue #7855: Overlord will
stop schedule task when a MM has problem.
licl2014 opened a new issue #7855: Overlord will stop schedule task when a MM has problem.
URL: https://github.com/apache/incubator-druid/issues/7855
The Overlord stops scheduling tasks when a MiddleManager (MM) has a problem (e.g. `/tmp` can't be used).
### Affected Version
0.9.2/0.12.3
### Description
Our cluster has 3 MMs, and the `selectStrategy` in the Overlord is `equalDistribution`.
- Because of `equalDistribution`, the Overlord selects MM1 to run one task and MM2 to run another.
- The Overlord then assigns a task to MM3, but the `/tmp` directory on MM3 can't be used, so the task fails there.
- The Overlord waits `5 Min` (the default value) to avoid overloading MM3 with tasks, and does not assign any other pending task during those `5 Min`.
- In the next round, the Overlord assigns a task to MM3 again, because MM3 has no running tasks. It then goes into an endless loop unless the tasks running on MM1 or MM2 complete.
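
The loop described above can be illustrated with a small model. This is only a sketch under stated assumptions, not Druid's actual code: `EqualDistributionSketch` and `selectWorker` are hypothetical names, and the strategy is simplified to "pick the worker with the fewest running tasks".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not Druid's actual classes: worker name -> number of running tasks.
class EqualDistributionSketch {
    // Simplified assumption: choose the worker that currently runs the fewest tasks.
    static String selectWorker(Map<String, Integer> runningTasks) {
        return runningTasks.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("no workers"));
    }

    public static void main(String[] args) {
        Map<String, Integer> running = new LinkedHashMap<>();
        running.put("MM1", 1);
        running.put("MM2", 1);
        running.put("MM3", 0); // tasks fail here, so the count never rises

        for (int round = 1; round <= 3; round++) {
            String chosen = selectWorker(running);
            // A task assigned to MM3 fails, so MM3's running count stays at 0
            // and it is chosen again in the following round.
            System.out.println("round " + round + " -> " + chosen);
        }
    }
}
```

In this model every round picks MM3, because a worker on which tasks always fail never accumulates running tasks and so always looks like the least loaded one.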
### Solution
We could add an MM `blacklist`: if too many tasks fail on an MM, the Overlord disables that MM automatically.
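
A minimal sketch of what such a blacklist might look like follows. Everything in it is an assumption made for illustration: the class `WorkerBlacklistSketch`, its methods, and the choice to count consecutive failures against a fixed threshold are hypothetical, and none of it is taken from Druid's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed blacklist; names and threshold rule are illustrative only.
class WorkerBlacklistSketch {
    private final int maxConsecutiveFailures;
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    WorkerBlacklistSketch(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Record the outcome of a task that ran on the given worker.
    void recordResult(String worker, boolean success) {
        if (success) {
            consecutiveFailures.remove(worker); // a success resets the counter
        } else {
            consecutiveFailures.merge(worker, 1, Integer::sum);
        }
    }

    // A worker is skipped for assignment once it reaches the failure threshold.
    boolean isBlacklisted(String worker) {
        return consecutiveFailures.getOrDefault(worker, 0) >= maxConsecutiveFailures;
    }

    public static void main(String[] args) {
        WorkerBlacklistSketch blacklist = new WorkerBlacklistSketch(3);
        for (int i = 0; i < 3; i++) {
            blacklist.recordResult("MM3", false);
        }
        System.out.println("MM3 blacklisted: " + blacklist.isBlacklisted("MM3"));
        System.out.println("MM1 blacklisted: " + blacklist.isBlacklisted("MM1"));
    }
}
```

With a check like `isBlacklisted` applied before worker selection, the scheduler would stop offering tasks to MM3 after repeated failures and fall back to MM1 or MM2. A real implementation would probably also need a way to re-enable a worker after some time.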