Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/07/04 23:14:22 UTC

[GitHub] [incubator-druid] gianm opened a new issue #5709: Broker resiliency to misbehaving historical nodes

URL: https://github.com/apache/incubator-druid/issues/5709
 
 
   Sometimes we see 'zombie' nodes that are nominally responsive but have underlying problems. This can be due to bad disks, bad configuration, or any number of other causes, and due to the vicissitudes of life we cannot necessarily predict all of them in advance. So two things would be useful as general mitigations:
   
   1. An ability for the broker to retry queries that fail on a data node, on the grounds that another node (for example, one serving a replica of the same segments) may succeed.
   2. An ability for the broker to blacklist data nodes that fail too often relative to other nodes.
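   A minimal sketch of item (1), assuming the broker already knows which data nodes hold replicas of the segments a query needs. The names `query_with_retry`, `run_query`, `max_attempts`, and `AllReplicasFailed` are hypothetical, not Druid APIs. The `max_attempts` cap is there because retries themselves should not be too aggressive.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Hypothetical error: every attempted replica failed."""


def query_with_retry(
    replicas: Sequence[str],
    run_query: Callable[[str], T],
    max_attempts: int = 2,
) -> T:
    """Try the query on up to max_attempts distinct replicas, in order.

    Returns the first successful result. Capping attempts keeps a query
    that is doomed anyway (e.g. a malformed one) from hitting every node.
    """
    errors = []
    for server in list(replicas)[:max_attempts]:
        try:
            return run_query(server)
        except Exception as exc:  # sketch only: a real broker would not retry every error type
            errors.append((server, exc))
    raise AllReplicasFailed(errors)


# Usage with a stand-in query function (hypothetical node names).
def fake_run(server: str) -> str:
    if server == "historical-1":
        raise IOError("bad disk")  # simulate a 'zombie' node
    return f"result from {server}"


print(query_with_retry(["historical-1", "historical-2"], fake_run))  # → result from historical-2
```

   The failures collected in `errors` are also the natural input for the per-node statistics that item (2) needs.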
   
   You want (1) not to be too aggressive: it could lead to doing too much work on a query that is doomed to fail anyway (maybe something is wrong with the query itself). You also want (2) not to be too aggressive: it's senseless to blacklist half the cluster, for example.
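   One way to keep (2) from being too aggressive is sketched below, under stated assumptions: a node is only a candidate when its failure rate is well above the cluster median and above an absolute floor, and at most a fixed fraction of nodes may be blacklisted at once. `FailureTracker` and its thresholds (`ratio`, `min_rate`, `min_requests`, `max_fraction`) are hypothetical illustrations, not Druid configuration.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class FailureTracker:
    """Hypothetical broker-side blacklist policy (not a Druid API)."""
    ratio: float = 3.0         # must exceed ratio x the cluster median failure rate
    min_rate: float = 0.05     # absolute floor, so a lone hiccup never qualifies
    min_requests: int = 20     # ignore nodes with too little data
    max_fraction: float = 0.1  # never blacklist more than this share of nodes
    requests: dict = field(default_factory=dict)
    failures: dict = field(default_factory=dict)

    def record(self, node: str, ok: bool) -> None:
        self.requests[node] = self.requests.get(node, 0) + 1
        if not ok:
            self.failures[node] = self.failures.get(node, 0) + 1

    def rate(self, node: str) -> float:
        return self.failures.get(node, 0) / self.requests[node]

    def blacklist(self) -> list:
        if not self.requests:
            return []
        baseline = median(self.rate(n) for n in self.requests)
        candidates = [
            n for n in self.requests
            if self.requests[n] >= self.min_requests
            and self.rate(n) >= self.min_rate
            and self.rate(n) > self.ratio * baseline
        ]
        # Worst offenders first, truncated to the cap.
        candidates.sort(key=self.rate, reverse=True)
        cap = int(self.max_fraction * len(self.requests))
        return candidates[:cap]


def simulate(bad_nodes: int, total: int = 10) -> FailureTracker:
    """Hypothetical scenario: bad nodes fail 50% of requests, others 1%."""
    tracker = FailureTracker()
    for i in range(total):
        for j in range(100):
            fails = (j % 2 == 0) if i < bad_nodes else (j == 0)
            tracker.record(f"historical-{i}", ok=not fails)
    return tracker


print(simulate(bad_nodes=1).blacklist())  # → ['historical-0']
print(simulate(bad_nodes=5).blacklist())  # → []
```

   Comparing against the median is what stops the second case: when half the cluster is failing, the "normal" rate itself is high, which points at a cluster-wide or query-level problem rather than individual bad nodes. Note that with these defaults a cluster of fewer than 10 nodes never blacklists anything, which is a deliberate (and tunable) choice in this sketch.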
   
   You also want the list from (2) to be exposed via an API, since folks might want to build automation that takes those nodes out of service, raises alerts about them, replaces them automatically, etc.
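   A sketch of one possible shape for such an API, assuming plain JSON over HTTP and using only the Python standard library. The path `/blacklist`, the `blacklistedNodes` field, and the helper names are hypothetical, not an existing Druid endpoint.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List


def blacklist_payload(nodes: List[str]) -> bytes:
    """Serialize the current blacklist as JSON for automation to consume."""
    return json.dumps({"blacklistedNodes": sorted(nodes)}).encode("utf-8")


def make_handler(get_blacklist: Callable[[], List[str]]):
    """Build a handler class serving GET /blacklist (hypothetical path)."""

    class BlacklistHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/blacklist":
                self.send_error(404)
                return
            body = blacklist_payload(get_blacklist())
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return BlacklistHandler


def serve(get_blacklist: Callable[[], List[str]],
          host: str = "127.0.0.1", port: int = 8080) -> None:
    """Block forever, answering GET /blacklist with the live list."""
    HTTPServer((host, port), make_handler(get_blacklist)).serve_forever()
```

   Calling `serve(lambda: ["historical-0"])` would answer `GET /blacklist` with `{"blacklistedNodes": ["historical-0"]}`; in practice the callable would read the broker's live blacklist so that alerting or replacement tooling can poll it.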

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
