Posted to issues@ozone.apache.org by "Siddhant Sangwan (Jira)" <ji...@apache.org> on 2023/09/20 05:30:00 UTC

[jira] [Updated] (HDDS-9321) LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning

     [ https://issues.apache.org/jira/browse/HDDS-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddhant Sangwan updated HDDS-9321:
-----------------------------------
    Description: 
A mix of quasi-closed and unhealthy replicas blocks decommissioning even if the container is sufficiently replicated.
a. This happens when only some of the replicas hit an error during a write.
b. It can be fixed by removing this check:
{code}
if (!replicaSet.isHealthy()) {
  if (LOG.isDebugEnabled()) {
    unhealthyIDs.add(cid);
  }
  if (unhealthy < CONTAINER_DETAILS_LOGGING_LIMIT
{code}

However, simply removing that check is not a complete solution. We need to try to preserve any UNHEALTHY replicas that have the greatest Sequence ID.
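
To make the preservation requirement concrete, here is a minimal sketch in Java of selecting the UNHEALTHY replicas that should be kept. The Replica and State types and the unhealthyReplicasToPreserve helper are hypothetical stand-ins for illustration, not Ozone classes, and "greatest Sequence ID" is assumed to mean the maximum sequence ID across all replicas of the container.

```java
import java.util.ArrayList;
import java.util.List;

public class UnhealthyReplicaSketch {

  /** Hypothetical replica states; names follow the wording of the issue. */
  enum State { QUASI_CLOSED, UNHEALTHY, CLOSED }

  /** Hypothetical, simplified stand-in for a container replica. */
  static final class Replica {
    final String datanode;
    final State state;
    final long sequenceId;

    Replica(String datanode, State state, long sequenceId) {
      this.datanode = datanode;
      this.state = state;
      this.sequenceId = sequenceId;
    }
  }

  /**
   * Returns the UNHEALTHY replicas whose sequence ID equals the greatest
   * sequence ID seen across all replicas of the container (an assumed
   * reading of the requirement). Returns an empty list if there are none.
   */
  static List<Replica> unhealthyReplicasToPreserve(List<Replica> replicas) {
    // First pass: find the greatest sequence ID among all replicas.
    long maxSeq = Long.MIN_VALUE;
    for (Replica r : replicas) {
      maxSeq = Math.max(maxSeq, r.sequenceId);
    }
    // Second pass: keep only UNHEALTHY replicas that carry that sequence ID.
    List<Replica> keep = new ArrayList<>();
    for (Replica r : replicas) {
      if (r.state == State.UNHEALTHY && r.sequenceId == maxSeq) {
        keep.add(r);
      }
    }
    return keep;
  }
}
```

With replicas (dn1, QUASI_CLOSED, 10), (dn2, UNHEALTHY, 12) and (dn3, UNHEALTHY, 10), this sketch would select only the replica on dn2, since it is the only UNHEALTHY replica holding the greatest sequence ID. How such replicas are then protected from deletion or re-replicated is left open here.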

  was:
Mix of quasi-closed and unhealthy replicas blocks decommission even if sufficiently replicated.
a. Caused when only some of the replicas hit the error during write.
b. Can be fixed by removing this check:
{code}
if (!replicaSet.isHealthy()) {
          if (LOG.isDebugEnabled()) {
            unhealthyIDs.add(cid);
          }
          if (unhealthy < CONTAINER_DETAILS_LOGGING_LIMIT
{code}


> LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-9321
>                 URL: https://issues.apache.org/jira/browse/HDDS-9321
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Siddhant Sangwan
>            Assignee: Siddhant Sangwan
>            Priority: Major


