Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2023/05/19 11:36:00 UTC

[jira] [Updated] (HDDS-8617) Ratis underreplication due to maintenance is not deprioritised

     [ https://issues.apache.org/jira/browse/HDDS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai updated HDDS-8617:
-----------------------------------
    Status: Patch Available  (was: In Progress)

> Ratis underreplication due to maintenance is not deprioritised
> --------------------------------------------------------------
>
>                 Key: HDDS-8617
>                 URL: https://issues.apache.org/jira/browse/HDDS-8617
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 1.4.0
>            Reporter: Attila Doroszlai
>            Assignee: Attila Doroszlai
>            Priority: Major
>
> According to the following javadoc, both decommission and maintenance replicas should be deprioritised:
> {code:title=https://github.com/apache/ozone/blob/6d9002201e58dc995dc133941acaef2af03cb9d2/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerHealthResult.java#L145-L164}
>     /**
>      * The weightedRedundancy, is the remaining redundancy + the requeue count.
>      * When this value is used for ordering in a priority queue it ensures the
>      * priority is reduced each time it is requeued, to prevent it from blocking
>      * other containers from being processed.
>      * Additionally, so that decommission and maintenance replicas are not
>      * ordered ahead of under-replicated replicas, a redundancy of
>      * DECOMMISSION_REDUNDANCY is used for the decommission redundancy rather
>      * than its real redundancy.
>      * @return The weightedRedundancy of this result.
>      */
>     public int getWeightedRedundancy() {
>       int result = requeueCount;
>       if (dueToDecommission) {
>         result += DECOMMISSION_REDUNDANCY;
>       } else {
>         result += getRemainingRedundancy();
>       }
>       return result;
>     }
> {code}
> but {{dueToDecommission}} is set to {{true}} based only on decommission replicas ({{decommissionCount}}), ignoring maintenance replicas ({{maintenanceCount}}):
> {code:title=https://github.com/apache/ozone/blob/6d9002201e58dc995dc133941acaef2af03cb9d2/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/RatisContainerReplicaCount.java#L520-L533}
>   /**
>    * Checks whether insufficient replication is because of some replicas
>    * being on datanodes that were decommissioned.
>    * @param includePendingAdd if pending adds should be considered
>    * @return true if there is insufficient replication and it's because of
>    * decommissioning.
>    */
>   public boolean inSufficientDueToDecommission(boolean includePendingAdd) {
>     if (isSufficientlyReplicated(includePendingAdd)) {
>       return false;
>     }
>     int delta = redundancyDelta(true, includePendingAdd);
>     return decommissionCount >= delta;
>   }
> {code}
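The effect described above can be illustrated with a small, self-contained Java sketch. It is a simplified model, not Ozone code. The Container class, the dueToOutOfService() variant (one possible fix), the value 5 for DECOMMISSION_REDUNDANCY, and the simplified ways remaining redundancy and the redundancy delta are computed are all assumptions made for illustration. The sketch shows a container that is short of in-service replicas only because one datanode is in maintenance being ordered ahead of a container that has genuinely lost a replica. It also shows how counting maintenanceCount alongside decommissionCount would reverse that order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MaintenancePriorityDemo {

  // ASSUMPTION: illustrative value only; the real DECOMMISSION_REDUNDANCY
  // constant is defined in ContainerHealthResult.
  static final int DECOMMISSION_REDUNDANCY = 5;

  // Hypothetical, simplified model of a Ratis container's replica counts.
  static final class Container {
    final String id;
    final int replicationFactor;
    final int inServiceCount;    // healthy replicas on IN_SERVICE datanodes
    final int decommissionCount; // replicas on decommissioning datanodes
    final int maintenanceCount;  // replicas on datanodes in maintenance
    final int requeueCount;

    Container(String id, int replicationFactor, int inServiceCount,
        int decommissionCount, int maintenanceCount, int requeueCount) {
      this.id = id;
      this.replicationFactor = replicationFactor;
      this.inServiceCount = inServiceCount;
      this.decommissionCount = decommissionCount;
      this.maintenanceCount = maintenanceCount;
      this.requeueCount = requeueCount;
    }

    // Simplified stand-in for redundancyDelta(): missing in-service replicas.
    int redundancyDelta() {
      return replicationFactor - inServiceCount;
    }

    // Same shape as inSufficientDueToDecommission(): maintenance is ignored.
    boolean dueToDecommission() {
      int delta = redundancyDelta();
      return delta > 0 && decommissionCount >= delta;
    }

    // Hypothetical fix: maintenance replicas also count as out of service.
    boolean dueToOutOfService() {
      int delta = redundancyDelta();
      return delta > 0 && decommissionCount + maintenanceCount >= delta;
    }

    // Mirrors getWeightedRedundancy(). ASSUMPTION: remaining redundancy is
    // the number of in-service replicas beyond the first.
    int weightedRedundancy(boolean includeMaintenance) {
      boolean deprioritise =
          includeMaintenance ? dueToOutOfService() : dueToDecommission();
      int remaining = Math.max(0, inServiceCount - 1);
      return requeueCount
          + (deprioritise ? DECOMMISSION_REDUNDANCY : remaining);
    }
  }

  // Drains a priority queue ordered by weighted redundancy (lowest first)
  // and returns the container IDs in processing order.
  static List<String> processingOrder(List<Container> containers,
      boolean includeMaintenance) {
    PriorityQueue<Container> queue = new PriorityQueue<>(
        Comparator.comparingInt(
            (Container c) -> c.weightedRedundancy(includeMaintenance)));
    queue.addAll(containers);
    List<String> order = new ArrayList<>();
    while (!queue.isEmpty()) {
      order.add(queue.poll().id);
    }
    return order;
  }

  public static void main(String[] args) {
    // Really lost one replica and has already been requeued twice.
    Container lost = new Container("lost", 3, 2, 0, 0, 2);
    // Short of a replica only because one datanode is in maintenance.
    Container maint = new Container("maint", 3, 2, 0, 1, 0);
    List<Container> all = List.of(lost, maint);
    System.out.println(processingOrder(all, false)); // [maint, lost]
    System.out.println(processingOrder(all, true));  // [lost, maint]
  }
}
```

With the flag derived from decommissionCount alone, the maintenance-only container gets weight 1 and is processed before the container that lost a replica (weight 3). Counting maintenance replicas as well gives it weight DECOMMISSION_REDUNDANCY (5 here), so it moves to the back. The real RatisContainerReplicaCount also accounts for pending adds and deletes (the includePendingAdd parameter above), which this sketch omits.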



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
