Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/14 12:50:03 UTC

[GitHub] pdeva opened a new issue #6172: nodes should allow draining

pdeva opened a new issue #6172: nodes should allow draining
URL: https://github.com/apache/incubator-druid/issues/6172

Currently, individual nodes have no concept of draining.
This means that when you update the cluster and take nodes down, any queries in progress on those nodes fail.
Similarly, if you have broker nodes behind a load balancer, there is no way to tell the load balancer to stop sending new connections to the node you are about to update. Depending on the health check interval, this can result in many seconds of requests being sent to a node that is already down.

I suggest adding a couple of endpoints to broker nodes:

1. `/health/ping` returns 200 when the broker is ready to serve queries.
2. `/health/startDrain` sets a flag that makes `/health/ping` return 500. Load balancer health checks then fail, so no new connections are routed to the node, while existing connections are not dropped. This should allow zero-downtime updates.
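
To make the proposal concrete, here is a minimal sketch of the two endpoints in Python, using only the standard library. It is not Druid code: the `draining` flag, the handler class, the helper names, and the choice of POST for `/health/startDrain` are all assumptions made for illustration, since the proposal only names the two paths and their status codes.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag: set once the node has been asked to drain.
draining = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def _reply(self, status, body):
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        if self.path == "/health/ping":
            # While draining, fail the health check; in-flight work is untouched.
            if draining.is_set():
                self._reply(500, "draining")
            else:
                self._reply(200, "ok")
        else:
            self._reply(404, "not found")

    def do_POST(self):
        # Assumption: draining is triggered with POST; the proposal does not say.
        if self.path == "/health/startDrain":
            draining.set()
            self._reply(200, "drain started")
        else:
            self._reply(404, "not found")

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server(port=0):
    """Start the health server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def status_of(port, path, method="GET"):
    """Return the HTTP status code for a request to the local health server."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, path)
        return conn.getresponse().status
    finally:
        conn.close()
```

In this sketch the flag only affects `/health/ping`, so a load balancer polling that path would stop routing new connections to the node while requests already in flight continue to be served.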

Similar endpoints can be added to MiddleManager (MM) and Historical nodes, with the coordinator performing the health check. When the health check returns a non-200 status, the coordinator can instruct the broker not to send any new queries to that node.
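
The coordinator-side check could look roughly like the sketch below. It is only an illustration under stated assumptions: `partition_nodes` and the `ping` callable are hypothetical names, and how the coordinator would actually reach a node or notify the broker is left out because the proposal does not specify it.

```python
from typing import Callable, Iterable, List, Tuple


def partition_nodes(
    nodes: Iterable[str],
    ping: Callable[[str], int],
) -> Tuple[List[str], List[str]]:
    """Split nodes into (routable, excluded) based on their ping status code.

    `ping` is a hypothetical callable that returns the HTTP status of a
    node's health endpoint, or raises OSError if the node is unreachable.
    """
    routable: List[str] = []
    excluded: List[str] = []
    for node in nodes:
        try:
            healthy = ping(node) == 200
        except OSError:
            healthy = False  # unreachable nodes are treated like draining ones
        (routable if healthy else excluded).append(node)
    return routable, excluded


# Usage with canned statuses standing in for real health checks.
statuses = {"historical-1": 200, "historical-2": 500, "mm-1": 200}
routable, excluded = partition_nodes(statuses, lambda node: statuses[node])
```

Passing the health check in as a callable keeps the routing decision separate from the transport, so the same logic works whether the status comes from an HTTP request or from a test stub.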

Together, these changes should allow rolling updates with zero downtime.
