You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/06/13 16:52:18 UTC

[GitHub] [ozone] sodonnel opened a new pull request, #3512: HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues

sodonnel opened a new pull request, #3512:
URL: https://github.com/apache/ozone/pull/3512

   ## What changes were proposed in this pull request?
   
   This PR defines a ContainerHealthCheckInterface, with the following definition:
   
   ```
     ContainerHealthResult checkHealth(
         ContainerInfo container, Set<ContainerReplica> replicas,
         List<Pair<Integer, DatanodeDetails>> indexesPendingAdd,
         List<Pair<Integer, DatanodeDetails>> indexesPendingDelete,
         int remainingRedundancyForMaintenance);
   }
   ```
   
   And also an implementation for EC containers that can determine if an EC container is over or under replicated. The interface will return a ContainerHealthResult object that contains information about the health state. Eg for under replicated, it will contain details about the remaining redundancy, whether its due to decommission etc.
   
   For now this class is standalone and unused. It will be integrated with ReplicationManager soon via a new Jira.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6697
   
   ## How was this patch tested?
   
   New unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3512: HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3512:
URL: https://github.com/apache/ozone/pull/3512#discussion_r896844312


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ECContainerHealthCheck.java:
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.hdds.client.ECReplicationConfig;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.container.ContainerInfo;
+import org.apache.hadoop.hdds.scm.container.ContainerReplica;
+import org.apache.hadoop.hdds.scm.container.ECContainerReplicaCount;
+
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Class to determine the health state of an EC Container. Given the container

Review Comment:
   it says "add and delete" which is correct I think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3512: HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3512:
URL: https://github.com/apache/ozone/pull/3512#discussion_r896775641


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ECContainerHealthCheck.java:
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.hdds.client.ECReplicationConfig;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.container.ContainerInfo;
+import org.apache.hadoop.hdds.scm.container.ContainerReplica;
+import org.apache.hadoop.hdds.scm.container.ECContainerReplicaCount;
+
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Class to determine the health state of an EC Container. Given the container

Review Comment:
   Nit: typo below: "and and delete"--> "and delete"
   



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerHealthResult.java:
##########
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import org.apache.hadoop.hdds.scm.container.ContainerInfo;
+import org.apache.hadoop.ozone.protocol.commands.SCMCommand;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Class used to represent the Health States of containers.
+ */
+public class ContainerHealthResult {
+
+  /**
+   * All possible container health states.
+   */
+  public enum HealthState {
+    HEALTHY,
+    UNDER_REPLICATED,
+    OVER_REPLICATED
+  }
+
+  private final ContainerInfo containerInfo;
+  private final HealthState healthState;
+  private final List<SCMCommand> commands = new ArrayList<>();
+
+  ContainerHealthResult(ContainerInfo containerInfo, HealthState healthState) {
+    this.containerInfo = containerInfo;
+    this.healthState = healthState;
+  }
+
+  public HealthState getHealthState() {
+    return healthState;
+  }
+
+  public void addCommand(SCMCommand command) {
+    commands.add(command);
+  }
+
+  public List<SCMCommand> getCommands() {
+    return commands;
+  }
+
+  public ContainerInfo getContainerInfo() {
+    return containerInfo;
+  }
+
+  /**
+   * Class for Healthy Container Check results.
+   */
+  public static class HealthyResult extends ContainerHealthResult {
+
+    HealthyResult(ContainerInfo containerInfo) {
+      super(containerInfo, HealthState.HEALTHY);
+    }
+  }
+
+  /**
+   * Class for UnderReplicated Container Check Results.
+   */
+  public static class UnderReplicatedHealthResult
+      extends ContainerHealthResult {
+
+    private final int remainingRedundancy;
+    private final boolean dueToDecommission;
+    private final boolean sufficientlyReplicatedAfterPending;
+    private final boolean unrecoverable;
+
+    UnderReplicatedHealthResult(ContainerInfo containerInfo,
+        int remainingRedundancy, boolean dueToDecommission,
+        boolean replicatedOkWithPending, boolean unrecoverable) {
+      super(containerInfo, HealthState.UNDER_REPLICATED);
+      this.remainingRedundancy = remainingRedundancy;
+      this.dueToDecommission = dueToDecommission;
+      this.sufficientlyReplicatedAfterPending = replicatedOkWithPending;
+      this.unrecoverable = unrecoverable;
+    }
+
+    /**
+     * How many more replicas can be lost before the the container is
+     * unreadable. For containers which are under-replicated due to decommission
+     * or maintenance only, the remaining redundancy will include those
+     * decommissioning or maintenance replicas, as they are technically still
+     * available until the datanode processes are stopped.
+     * @return Count of remaining redundant replicas.
+     */
+    public int getRemainingRedundancy() {
+      return remainingRedundancy;
+    }
+
+    /**
+     * Indicates whether the under-replication is caused only by replicas
+     * being decommissioned or entering maintenance. Ie, there are not replicas
+     * unavailable.
+     * @return True is the under-replication is caused by decommission.
+     */
+    public boolean underReplicatedDueToDecommission() {
+      return dueToDecommission;
+    }
+
+    /**
+     * Considering the pending replicas, which have been scheduled for copy or
+     * reconstruction, will the container still be under-replicated when they
+     * complete.
+     * @return True if the under-replication is corrected by the pending
+     *         replicas. False otherwise.
+     */
+    public boolean isSufficientlyReplicatedAfterPending() {
+      return sufficientlyReplicatedAfterPending;
+    }
+
+    /**
+     * Indicates whether a container has enough replicas to be read. For Ratis
+     * at least one replia must be available. For EC, at least dataNum replicas
+     * are needed.
+     * @return True if the container has insufficient replicas available to be
+     *         read, false otherwise
+     */
+    public boolean isUnrecoverable() {
+      return unrecoverable;
+    }
+  }
+
+  /**
+   * Class for Over Replicated Container Health Results.
+   */
+  public static class OverReplicatedHealthResult extends ContainerHealthResult {
+
+    private final int excessRedundancy;
+    private final boolean sufficientReplicatedAfterPending;
+
+
+    OverReplicatedHealthResult(ContainerInfo containerInfo,
+        int excessRedundancy, boolean replicatedOkWithPending) {
+      super(containerInfo, HealthState.OVER_REPLICATED);
+      this.excessRedundancy = excessRedundancy;
+      this.sufficientReplicatedAfterPending = replicatedOkWithPending;
+    }
+
+    /**
+     * How many extra replicas are present for this container and need to be
+     * removed. For EC, this number indicates how many indexes have more than
+     * one copy. Eg index 1 could have 3 copies. Index 2 could have 2 copies.
+     * The rest have 1 copy. The value returned here will be 2, indicating 2
+     * indexes have excess redundancy, even though we have to remove 3 replicas.
+     * For Ratis containers, the number return is simply the current replica
+     * count minus the expected replica count.
+     * @return The number of excess replicas.
+     */
+    public int getExcessRedundancy() {
+      return excessRedundancy;
+    }
+
+    /**
+     * Considering the pending replicas, which have been scheduled for delete,
+     * will the container still be over-replicated when they complete.
+     * @return True if the over-replication is corrected by the pending
+     *         deletes. False otherwise.
+     */

Review Comment:
   FOr consistency, you may want to rename this to isSufficientlyReplicatedAfterPending?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3512: HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3512:
URL: https://github.com/apache/ozone/pull/3512#discussion_r896932698


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ECContainerHealthCheck.java:
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.hdds.client.ECReplicationConfig;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.container.ContainerInfo;
+import org.apache.hadoop.hdds.scm.container.ContainerReplica;
+import org.apache.hadoop.hdds.scm.container.ECContainerReplicaCount;
+
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Class to determine the health state of an EC Container. Given the container

Review Comment:
   he he...closing eyes. ignore my comment. sorry.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerHealthResult.java:
##########
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import org.apache.hadoop.hdds.scm.container.ContainerInfo;
+import org.apache.hadoop.ozone.protocol.commands.SCMCommand;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Class used to represent the Health States of containers.
+ */
+public class ContainerHealthResult {
+
+  /**
+   * All possible container health states.
+   */
+  public enum HealthState {
+    HEALTHY,
+    UNDER_REPLICATED,
+    OVER_REPLICATED
+  }
+
+  private final ContainerInfo containerInfo;
+  private final HealthState healthState;
+  private final List<SCMCommand> commands = new ArrayList<>();
+
+  ContainerHealthResult(ContainerInfo containerInfo, HealthState healthState) {
+    this.containerInfo = containerInfo;
+    this.healthState = healthState;
+  }
+
+  public HealthState getHealthState() {
+    return healthState;
+  }
+
+  public void addCommand(SCMCommand command) {
+    commands.add(command);
+  }
+
+  public List<SCMCommand> getCommands() {
+    return commands;
+  }
+
+  public ContainerInfo getContainerInfo() {
+    return containerInfo;
+  }
+
+  /**
+   * Class for Healthy Container Check results.
+   */
+  public static class HealthyResult extends ContainerHealthResult {
+
+    HealthyResult(ContainerInfo containerInfo) {
+      super(containerInfo, HealthState.HEALTHY);
+    }
+  }
+
+  /**
+   * Class for UnderReplicated Container Check Results.
+   */
+  public static class UnderReplicatedHealthResult
+      extends ContainerHealthResult {
+
+    private final int remainingRedundancy;
+    private final boolean dueToDecommission;
+    private final boolean sufficientlyReplicatedAfterPending;
+    private final boolean unrecoverable;
+
+    UnderReplicatedHealthResult(ContainerInfo containerInfo,
+        int remainingRedundancy, boolean dueToDecommission,
+        boolean replicatedOkWithPending, boolean unrecoverable) {
+      super(containerInfo, HealthState.UNDER_REPLICATED);
+      this.remainingRedundancy = remainingRedundancy;
+      this.dueToDecommission = dueToDecommission;
+      this.sufficientlyReplicatedAfterPending = replicatedOkWithPending;
+      this.unrecoverable = unrecoverable;
+    }
+
+    /**
+     * How many more replicas can be lost before the the container is
+     * unreadable. For containers which are under-replicated due to decommission
+     * or maintenance only, the remaining redundancy will include those
+     * decommissioning or maintenance replicas, as they are technically still
+     * available until the datanode processes are stopped.
+     * @return Count of remaining redundant replicas.
+     */
+    public int getRemainingRedundancy() {
+      return remainingRedundancy;
+    }
+
+    /**
+     * Indicates whether the under-replication is caused only by replicas
+     * being decommissioned or entering maintenance. Ie, there are not replicas
+     * unavailable.
+     * @return True is the under-replication is caused by decommission.
+     */
+    public boolean underReplicatedDueToDecommission() {
+      return dueToDecommission;
+    }
+
+    /**
+     * Considering the pending replicas, which have been scheduled for copy or
+     * reconstruction, will the container still be under-replicated when they
+     * complete.
+     * @return True if the under-replication is corrected by the pending
+     *         replicas. False otherwise.
+     */
+    public boolean isSufficientlyReplicatedAfterPending() {
+      return sufficientlyReplicatedAfterPending;
+    }
+
+    /**
+     * Indicates whether a container has enough replicas to be read. For Ratis
+     * at least one replia must be available. For EC, at least dataNum replicas
+     * are needed.
+     * @return True if the container has insufficient replicas available to be
+     *         read, false otherwise
+     */
+    public boolean isUnrecoverable() {
+      return unrecoverable;
+    }
+  }
+
+  /**
+   * Class for Over Replicated Container Health Results.
+   */
+  public static class OverReplicatedHealthResult extends ContainerHealthResult {
+
+    private final int excessRedundancy;
+    private final boolean sufficientReplicatedAfterPending;
+
+
+    OverReplicatedHealthResult(ContainerInfo containerInfo,
+        int excessRedundancy, boolean replicatedOkWithPending) {
+      super(containerInfo, HealthState.OVER_REPLICATED);
+      this.excessRedundancy = excessRedundancy;
+      this.sufficientReplicatedAfterPending = replicatedOkWithPending;
+    }
+
+    /**
+     * How many extra replicas are present for this container and need to be
+     * removed. For EC, this number indicates how many indexes have more than
+     * one copy. Eg index 1 could have 3 copies. Index 2 could have 2 copies.
+     * The rest have 1 copy. The value returned here will be 2, indicating 2
+     * indexes have excess redundancy, even though we have to remove 3 replicas.
+     * For Ratis containers, the number return is simply the current replica
+     * count minus the expected replica count.
+     * @return The number of excess replicas.
+     */
+    public int getExcessRedundancy() {
+      return excessRedundancy;
+    }
+
+    /**
+     * Considering the pending replicas, which have been scheduled for delete,
+     * will the container still be over-replicated when they complete.
+     * @return True if the over-replication is corrected by the pending
+     *         deletes. False otherwise.
+     */

Review Comment:
   yes
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3512: HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3512:
URL: https://github.com/apache/ozone/pull/3512#discussion_r896841352


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerHealthResult.java:
##########
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import org.apache.hadoop.hdds.scm.container.ContainerInfo;
+import org.apache.hadoop.ozone.protocol.commands.SCMCommand;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Class used to represent the Health States of containers.
+ */
+public class ContainerHealthResult {
+
+  /**
+   * All possible container health states.
+   */
+  public enum HealthState {
+    HEALTHY,
+    UNDER_REPLICATED,
+    OVER_REPLICATED
+  }
+
+  private final ContainerInfo containerInfo;
+  private final HealthState healthState;
+  private final List<SCMCommand> commands = new ArrayList<>();
+
+  ContainerHealthResult(ContainerInfo containerInfo, HealthState healthState) {
+    this.containerInfo = containerInfo;
+    this.healthState = healthState;
+  }
+
+  public HealthState getHealthState() {
+    return healthState;
+  }
+
+  public void addCommand(SCMCommand command) {
+    commands.add(command);
+  }
+
+  public List<SCMCommand> getCommands() {
+    return commands;
+  }
+
+  public ContainerInfo getContainerInfo() {
+    return containerInfo;
+  }
+
+  /**
+   * Class for Healthy Container Check results.
+   */
+  public static class HealthyResult extends ContainerHealthResult {
+
+    HealthyResult(ContainerInfo containerInfo) {
+      super(containerInfo, HealthState.HEALTHY);
+    }
+  }
+
+  /**
+   * Class for UnderReplicated Container Check Results.
+   */
+  public static class UnderReplicatedHealthResult
+      extends ContainerHealthResult {
+
+    private final int remainingRedundancy;
+    private final boolean dueToDecommission;
+    private final boolean sufficientlyReplicatedAfterPending;
+    private final boolean unrecoverable;
+
+    UnderReplicatedHealthResult(ContainerInfo containerInfo,
+        int remainingRedundancy, boolean dueToDecommission,
+        boolean replicatedOkWithPending, boolean unrecoverable) {
+      super(containerInfo, HealthState.UNDER_REPLICATED);
+      this.remainingRedundancy = remainingRedundancy;
+      this.dueToDecommission = dueToDecommission;
+      this.sufficientlyReplicatedAfterPending = replicatedOkWithPending;
+      this.unrecoverable = unrecoverable;
+    }
+
+    /**
+     * How many more replicas can be lost before the the container is
+     * unreadable. For containers which are under-replicated due to decommission
+     * or maintenance only, the remaining redundancy will include those
+     * decommissioning or maintenance replicas, as they are technically still
+     * available until the datanode processes are stopped.
+     * @return Count of remaining redundant replicas.
+     */
+    public int getRemainingRedundancy() {
+      return remainingRedundancy;
+    }
+
+    /**
+     * Indicates whether the under-replication is caused only by replicas
+     * being decommissioned or entering maintenance. Ie, there are not replicas
+     * unavailable.
+     * @return True is the under-replication is caused by decommission.
+     */
+    public boolean underReplicatedDueToDecommission() {
+      return dueToDecommission;
+    }
+
+    /**
+     * Considering the pending replicas, which have been scheduled for copy or
+     * reconstruction, will the container still be under-replicated when they
+     * complete.
+     * @return True if the under-replication is corrected by the pending
+     *         replicas. False otherwise.
+     */
+    public boolean isSufficientlyReplicatedAfterPending() {
+      return sufficientlyReplicatedAfterPending;
+    }
+
+    /**
+     * Indicates whether a container has enough replicas to be read. For Ratis
+     * at least one replia must be available. For EC, at least dataNum replicas
+     * are needed.
+     * @return True if the container has insufficient replicas available to be
+     *         read, false otherwise
+     */
+    public boolean isUnrecoverable() {
+      return unrecoverable;
+    }
+  }
+
+  /**
+   * Class for Over Replicated Container Health Results.
+   */
+  public static class OverReplicatedHealthResult extends ContainerHealthResult {
+
+    private final int excessRedundancy;
+    private final boolean sufficientReplicatedAfterPending;
+
+
+    OverReplicatedHealthResult(ContainerInfo containerInfo,
+        int excessRedundancy, boolean replicatedOkWithPending) {
+      super(containerInfo, HealthState.OVER_REPLICATED);
+      this.excessRedundancy = excessRedundancy;
+      this.sufficientReplicatedAfterPending = replicatedOkWithPending;
+    }
+
+    /**
+     * How many extra replicas are present for this container and need to be
+     * removed. For EC, this number indicates how many indexes have more than
+     * one copy. Eg index 1 could have 3 copies. Index 2 could have 2 copies.
+     * The rest have 1 copy. The value returned here will be 2, indicating 2
+     * indexes have excess redundancy, even though we have to remove 3 replicas.
+     * For Ratis containers, the number return is simply the current replica
+     * count minus the expected replica count.
+     * @return The number of excess replicas.
+     */
+    public int getExcessRedundancy() {
+      return excessRedundancy;
+    }
+
+    /**
+     * Considering the pending replicas, which have been scheduled for delete,
+     * will the container still be over-replicated when they complete.
+     * @return True if the over-replication is corrected by the pending
+     *         deletes. False otherwise.
+     */

Review Comment:
   I see I have a typo in there - is that what you are referring to? sufficient -> sufficiently ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao merged pull request #3512: HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues

Posted by GitBox <gi...@apache.org>.
umamaheswararao merged PR #3512:
URL: https://github.com/apache/ozone/pull/3512


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3512: HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3512:
URL: https://github.com/apache/ozone/pull/3512#discussion_r896823379


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerHealthResult.java:
##########
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import org.apache.hadoop.hdds.scm.container.ContainerInfo;
+import org.apache.hadoop.ozone.protocol.commands.SCMCommand;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Class used to represent the Health States of containers.
+ */
+public class ContainerHealthResult {
+
+  /**
+   * All possible container health states.
+   */
+  public enum HealthState {
+    HEALTHY,
+    UNDER_REPLICATED,
+    OVER_REPLICATED
+  }
+
+  private final ContainerInfo containerInfo;
+  private final HealthState healthState;
+  private final List<SCMCommand> commands = new ArrayList<>();
+
+  ContainerHealthResult(ContainerInfo containerInfo, HealthState healthState) {
+    this.containerInfo = containerInfo;
+    this.healthState = healthState;
+  }
+
+  public HealthState getHealthState() {
+    return healthState;
+  }
+
+  public void addCommand(SCMCommand command) {
+    commands.add(command);
+  }
+
+  public List<SCMCommand> getCommands() {
+    return commands;
+  }
+
+  public ContainerInfo getContainerInfo() {
+    return containerInfo;
+  }
+
+  /**
+   * Class for Healthy Container Check results.
+   */
+  public static class HealthyResult extends ContainerHealthResult {
+
+    HealthyResult(ContainerInfo containerInfo) {
+      super(containerInfo, HealthState.HEALTHY);
+    }
+  }
+
+  /**
+   * Class for UnderReplicated Container Check Results.
+   */
+  public static class UnderReplicatedHealthResult
+      extends ContainerHealthResult {
+
+    private final int remainingRedundancy;
+    private final boolean dueToDecommission;
+    private final boolean sufficientlyReplicatedAfterPending;
+    private final boolean unrecoverable;
+
+    UnderReplicatedHealthResult(ContainerInfo containerInfo,
+        int remainingRedundancy, boolean dueToDecommission,
+        boolean replicatedOkWithPending, boolean unrecoverable) {
+      super(containerInfo, HealthState.UNDER_REPLICATED);
+      this.remainingRedundancy = remainingRedundancy;
+      this.dueToDecommission = dueToDecommission;
+      this.sufficientlyReplicatedAfterPending = replicatedOkWithPending;
+      this.unrecoverable = unrecoverable;
+    }
+
+    /**
+     * How many more replicas can be lost before the the container is
+     * unreadable. For containers which are under-replicated due to decommission
+     * or maintenance only, the remaining redundancy will include those
+     * decommissioning or maintenance replicas, as they are technically still
+     * available until the datanode processes are stopped.
+     * @return Count of remaining redundant replicas.
+     */
+    public int getRemainingRedundancy() {
+      return remainingRedundancy;
+    }
+
+    /**
+     * Indicates whether the under-replication is caused only by replicas
+     * being decommissioned or entering maintenance. Ie, there are not replicas
+     * unavailable.
+     * @return True is the under-replication is caused by decommission.
+     */
+    public boolean underReplicatedDueToDecommission() {
+      return dueToDecommission;
+    }
+
+    /**
+     * Considering the pending replicas, which have been scheduled for copy or
+     * reconstruction, will the container still be under-replicated when they
+     * complete.
+     * @return True if the under-replication is corrected by the pending
+     *         replicas. False otherwise.
+     */
+    public boolean isSufficientlyReplicatedAfterPending() {
+      return sufficientlyReplicatedAfterPending;
+    }
+
+    /**
+     * Indicates whether a container has enough replicas to be read. For Ratis
+     * at least one replia must be available. For EC, at least dataNum replicas
+     * are needed.
+     * @return True if the container has insufficient replicas available to be
+     *         read, false otherwise
+     */
+    public boolean isUnrecoverable() {
+      return unrecoverable;
+    }
+  }
+
+  /**
+   * Class for Over Replicated Container Health Results.
+   */
+  public static class OverReplicatedHealthResult extends ContainerHealthResult {
+
+    private final int excessRedundancy;
+    private final boolean sufficientReplicatedAfterPending;
+
+
+    OverReplicatedHealthResult(ContainerInfo containerInfo,
+        int excessRedundancy, boolean replicatedOkWithPending) {
+      super(containerInfo, HealthState.OVER_REPLICATED);
+      this.excessRedundancy = excessRedundancy;
+      this.sufficientReplicatedAfterPending = replicatedOkWithPending;
+    }
+
+    /**
+     * How many extra replicas are present for this container and need to be
+     * removed. For EC, this number indicates how many indexes have more than
+     * one copy. Eg index 1 could have 3 copies. Index 2 could have 2 copies.
+     * The rest have 1 copy. The value returned here will be 2, indicating 2
+     * indexes have excess redundancy, even though we have to remove 3 replicas.
+     * For Ratis containers, the number return is simply the current replica
+     * count minus the expected replica count.
+     * @return The number of excess replicas.
+     */
+    public int getExcessRedundancy() {
+      return excessRedundancy;
+    }
+
+    /**
+     * Considering the pending replicas, which have been scheduled for delete,
+     * will the container still be over-replicated when they complete.
+     * @return True if the over-replication is corrected by the pending
+     *         deletes. False otherwise.
+     */

Review Comment:
   For consistency, you may want to rename this to isSufficientlyReplicatedAfterPending?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #3512: HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on PR #3512:
URL: https://github.com/apache/ozone/pull/3512#issuecomment-1155307850

   @sodonnel overall patch looks good to me. After changing the method name type, I am +1 on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org