Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2022/09/01 22:22:00 UTC

[jira] [Commented] (HDDS-7198) Datanodes should avoid using decommissioning nodes as a container replication source

    [ https://issues.apache.org/jira/browse/HDDS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599171#comment-17599171 ] 

Ethan Rose commented on HDDS-7198:
----------------------------------

One solution is to have SCM provide the order of replication sources to the datanodes. SCM could shuffle the list of in-service replicas and place the decommissioning replica last in the list. Datanodes would then try the sources in exactly the order SCM provides. With this approach datanodes do exactly what SCM tells them, which fits SCM's role as the master service for HDDS. This is also extensible to more advanced replication control in the future, where SCM could order source replicas based on in-flight replications to balance load across the cluster.
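
As a rough illustration of the idea (not the actual Ozone code), the ordering could look something like the Python sketch below. The names OpState, Replica, order_sources and replicate are hypothetical and exist only for this example; the real SCM and datanode classes are Java and are not shown here.

```python
import random
from dataclasses import dataclass
from enum import Enum


class OpState(Enum):
    # Hypothetical operational states, for illustration only.
    IN_SERVICE = "IN_SERVICE"
    DECOMMISSIONING = "DECOMMISSIONING"


@dataclass(frozen=True)
class Replica:
    datanode: str
    state: OpState


def order_sources(replicas, rng=random):
    """SCM side: shuffle the in-service replicas, then append the rest last."""
    in_service = [r for r in replicas if r.state is OpState.IN_SERVICE]
    others = [r for r in replicas if r.state is not OpState.IN_SERVICE]
    rng.shuffle(in_service)  # spread load across healthy sources
    rng.shuffle(others)      # decommissioning replicas are a last resort
    return in_service + others


def replicate(ordered_sources, download):
    """Datanode side: try sources strictly in the given order, no reshuffle."""
    for source in ordered_sources:
        if download(source):
            return source
    return None
```

The key design point is that only SCM decides the order; the datanode no longer shuffles, so a decommissioning node is only contacted after every in-service source has failed.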

> Datanodes should avoid using decommissioning nodes as a container replication source
> ------------------------------------------------------------------------------------
>
>                 Key: HDDS-7198
>                 URL: https://issues.apache.org/jira/browse/HDDS-7198
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode, SCM
>            Reporter: Ethan Rose
>            Priority: Major
>
> Currently, when SCM tells a target datanode to replicate a container, it sends the target datanode an ordered list of source datanodes it should download the container from. The target then shuffles the list and tries to download from the sources in the resulting order, one by one, until one of them succeeds.
> In failure scenarios this works fine. The node that had the failure will not be included in the source list, so the source replication load is distributed throughout the cluster. However, when a datanode is decommissioning, it will be included in the source list with no distinction from other replicas, causing it to bear a disproportionate amount of the replication load.
> For example, if every container in the cluster has three replicas and one datanode is being decommissioned, the decommissioning node will be the source for roughly 33% of the replications, while the other 67% will be distributed throughout the cluster based on placement of the other container replicas. With datanodes currently throttled at 10 concurrent replication requests, this will place continuous load on the decommissioning node (which may already be in a bad state, which is why it is being removed), while decreasing parallelization of the overall replications required.
>  
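
To make the 33% figure from the description concrete, here is a small back-of-the-envelope calculation in Python. It is only a sketch: first_choice_share is a hypothetical helper, and it assumes the first source tried always succeeds and that every replicated container has exactly one replica on the decommissioning node.

```python
from fractions import Fraction


def first_choice_share(num_replicas, num_decommissioning, decommissioning_last):
    """Share of replications whose first-tried source is a decommissioning node."""
    if decommissioning_last and num_decommissioning < num_replicas:
        # At least one in-service replica exists and is always tried first.
        return Fraction(0)
    # Uniform shuffle: each replica is equally likely to be tried first.
    return Fraction(num_decommissioning, num_replicas)


current = first_choice_share(3, 1, decommissioning_last=False)
proposed = first_choice_share(3, 1, decommissioning_last=True)
print(f"uniform shuffle on the datanode: {float(current):.0%}")   # 33%
print(f"decommissioning replica last:    {float(proposed):.0%}")  # 0%
```

Under these assumptions the decommissioning node drops from serving about a third of the replications to serving only those where every in-service source fails.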



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org