Posted to issues@kudu.apache.org by "Andrew Wong (Jira)" <ji...@apache.org> on 2019/09/23 23:20:00 UTC

[jira] [Created] (KUDU-2950) Support restarting nodes in batches

Andrew Wong created KUDU-2950:
---------------------------------

Summary: Support restarting nodes in batches
Key: KUDU-2950
URL: https://issues.apache.org/jira/browse/KUDU-2950
Project: Kudu
Issue Type: Improvement
Reporter: Andrew Wong

Once Kudu has the building blocks to orchestrate a rolling restart, it'd be great if we could support restarting multiple nodes at a time.

Location awareness would play a crucial role in this because, if used to identify rack placement, we could bring down an entire rack at a time if we wanted. If we did this, though, during the controlled restart of a given rack, Kudu would be more vulnerable to the _unexpected_ downtime of another rack.

One approach would be to support something like HDFS's upgrade domains:
{quote}The idea is to group datanodes in a new dimension called upgrade domain, in addition to the existing rack-based grouping. For example, we can assign all datanodes in the first position of any rack to upgrade domain ud_01, nodes in the second position to upgrade domain ud_02 and so on.
...
By default, 3 replicas of any given block are placed on 3 different upgrade domains. This means all datanodes belonging to a specific upgrade domain collectively won’t store more than one replica of any block.
{quote}
The decoupling of physical devices from restartable groups should make batch restarts more robust to rack failures.
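
As a rough illustration of the upgrade-domain idea quoted above, here is a minimal Python sketch. It is not Kudu or HDFS code: the function names, node names, and the replica placement are all hypothetical, and it assumes a placement policy that already spreads each tablet's three replicas across three different upgrade domains. It assigns nodes to domains by their position within a rack and checks that restarting one domain as a batch takes down at most one replica of any tablet.

```python
from collections import defaultdict


def assign_upgrade_domains(racks):
    """Map each node to an upgrade domain based on its position in its rack.

    `racks` maps a rack name to an ordered list of node names (hypothetical).
    """
    domains = {}
    for nodes in racks.values():
        for position, node in enumerate(nodes, start=1):
            domains[node] = f"ud_{position:02d}"
    return domains


def restart_batches(domains):
    """Group nodes into restart batches, one batch per upgrade domain."""
    batches = defaultdict(list)
    for node, domain in domains.items():
        batches[domain].append(node)
    return [sorted(batches[domain]) for domain in sorted(batches)]


def max_replicas_down(batch, replica_placement):
    """Largest number of replicas of any single tablet hosted in `batch`."""
    batch = set(batch)
    return max(
        sum(1 for node in nodes if node in batch)
        for nodes in replica_placement.values()
    )


# Hypothetical cluster: three racks with three tablet servers each.
racks = {
    "rack_a": ["a1", "a2", "a3"],
    "rack_b": ["b1", "b2", "b3"],
    "rack_c": ["c1", "c2", "c3"],
}

# Assumed placement: every tablet has its replicas on three different racks
# and three different upgrade domains.
replica_placement = {
    "tablet_1": ["a1", "b2", "c3"],
    "tablet_2": ["a2", "b3", "c1"],
    "tablet_3": ["a3", "b1", "c2"],
}

domains = assign_upgrade_domains(racks)
batches = restart_batches(domains)

for batch in batches:
    print(batch, "max replicas down per tablet:",
          max_replicas_down(batch, replica_placement))
```

With this layout each batch spans all three racks, so a batch restart never touches more than one replica of a tablet, provided the placement assumption holds.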

--
This message was sent by Atlassian Jira
(v8.3.4#803005)