Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2023/09/07 11:27:00 UTC

[jira] [Assigned] (HDDS-8536) ReplicationManager: Unhealthy replicas could block Ratis containers being recovered

     [ https://issues.apache.org/jira/browse/HDDS-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell reassigned HDDS-8536:
---------------------------------------

    Assignee: Stephen O'Donnell

> ReplicationManager: Unhealthy replicas could block Ratis containers being recovered
> -----------------------------------------------------------------------------------
>
>                 Key: HDDS-8536
>                 URL: https://issues.apache.org/jira/browse/HDDS-8536
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> As with HDDS-8535, if the cluster is small, say 4 nodes, and a Ratis container has 2 unhealthy replicas, RM will currently recover one new replica, leaving all 4 nodes used with 2 healthy and 2 unhealthy replicas. Because unhealthy replicas are only removed after all over- and under-replication has been resolved, the container will remain stuck in this state.
> To avoid this, if there are insufficient spare nodes and some unhealthy replicas exist, the under-replication handler may need to call into the unhealthy handler to remove some of the unhealthy replicas so that progress can be made.
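
The proposed behaviour can be illustrated with a minimal, self-contained sketch. This is not Ozone's ReplicationManager code: the class, enum, and method names below are hypothetical, and the model assumes a replication factor of 3, one replica per node, and at least one healthy replica available as a copy source. It only shows the decision order described above: replicate while a spare node exists, otherwise remove an unhealthy replica to free a node.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the proposal in HDDS-8536; not Ozone's actual API. */
public class UnderReplicationSketch {

    /** Illustrative action names, not real SCM command types. */
    public enum Action { NONE, REPLICATE, REMOVE_UNHEALTHY }

    /**
     * Decide the next step for one container.
     * Assumes each replica (healthy or unhealthy) occupies its own node.
     */
    public static Action nextAction(int factor, int healthy, int unhealthy, int clusterNodes) {
        int spareNodes = clusterNodes - healthy - unhealthy;
        if (healthy >= factor) {
            return Action.NONE;              // under-replication resolved
        }
        if (spareNodes > 0) {
            return Action.REPLICATE;         // a free node can host a new replica
        }
        if (unhealthy > 0) {
            return Action.REMOVE_UNHEALTHY;  // free a node so recovery can proceed
        }
        return Action.NONE;                  // nothing more can be done
    }

    /** Apply nextAction repeatedly and record the steps taken. */
    public static List<Action> simulate(int factor, int healthy, int unhealthy, int clusterNodes) {
        List<Action> steps = new ArrayList<>();
        while (true) {
            Action action = nextAction(factor, healthy, unhealthy, clusterNodes);
            if (action == Action.NONE) {
                break;
            }
            steps.add(action);
            if (action == Action.REPLICATE) {
                healthy++;
            } else {
                unhealthy--;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // Scenario from the issue: 4 nodes, 1 healthy and 2 unhealthy replicas.
        System.out.println(simulate(3, 1, 2, 4));
        // prints "[REPLICATE, REMOVE_UNHEALTHY, REPLICATE]"
    }
}
```

In this model, without the REMOVE_UNHEALTHY branch the container would stop after the first REPLICATE with 2 healthy and 2 unhealthy replicas and no spare node, which is the stuck state the issue describes.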



--
This message was sent by Atlassian Jira
(v8.20.10#820010)