Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/08/26 10:54:59 UTC

[GitHub] [ozone] sodonnel opened a new pull request, #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

sodonnel opened a new pull request, #3723:
URL: https://github.com/apache/ozone/pull/3723

   ## What changes were proposed in this pull request?
   
   For Ratis, the number of replicas which must be available when a node goes into maintenance is a simple integer defaulting to 2 in hdds.scm.replication.maintenance.replica.minimum.
   
   This means that for a Ratis container, one out of the 3 nodes can be offline without any replication happening. The setting can be changed to 1, which lets two nodes go offline, or to 3, which ensures full redundancy and hence triggers replication whenever any node is taken offline.
   
   It could be argued that 1 would be a better default here. With the default placement of 2 replicas on one rack and 1 on another rack, that should allow for a full rack to be taken offline without replication.
   
   For EC, it's a little trickier. Aside from Ratis 1 containers, which are rarely used in practice, EC can tolerate 2 offline (for 3-2), 3 (for 6-3) or 4 (for 10-4).
   
   If we use the same default of 2, that means replication will always be required for 3-2 containers. Also the "number of replicas online" doesn't make as much sense for EC, as each replica is not identical.
   
   EC is also slightly trickier in another way: when any of the data copies are offline, online reconstruction must be used to read the data, causing a performance penalty, but that cannot be avoided.
   
   If we take the Ratis default of 2, then when there are two replicas out of 3 online we have a remaining redundancy of 1, i.e. we can afford to lose one more copy and still read data.
   
   If we change the Ratis setting to 1, there is a remaining redundancy of 0, because the loss of another replica renders the data unreadable.
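
   The relationship between online replicas and "remaining redundancy" for Ratis can be sketched as below. This is only an illustration of the arithmetic in the two paragraphs above; the class and method names are hypothetical and are not part of the Ozone code base.

```java
/** Illustrative sketch only; RatisRedundancy is a hypothetical name. */
final class RatisRedundancy {
  // A replicated (Ratis) container stays readable while one replica is online.
  static final int REPLICAS_NEEDED_TO_READ = 1;

  /** Number of further replicas that can be lost while data stays readable. */
  static int remainingRedundancy(int onlineReplicas) {
    return Math.max(0, onlineReplicas - REPLICAS_NEEDED_TO_READ);
  }

  public static void main(String[] args) {
    // Walk through 3, 2 and 1 online replicas of a 3-way replicated container.
    for (int online = 3; online >= 1; online--) {
      System.out.println(online + " online -> remaining redundancy "
          + remainingRedundancy(online));
    }
  }
}
```

   With this definition, a replica minimum of 2 corresponds to a remaining redundancy of 1, and a replica minimum of 1 corresponds to 0.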
   
   For EC, if we default the setting to a "remaining redundancy" of 1, this would mean we can tolerate the loss of 1 more replica and still read the data.
   
   This would allow 3-2 to have 1 replica offline, 6-3 to have 2 and 10-4 to have 3 without any replication. In all cases the remaining data redundancy is the same as with Ratis having 2 replicas online.
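
   As a rough check of these numbers, here is a minimal sketch that assumes the number of replicas allowed offline is simply parity minus the configured remaining redundancy. The helper names are hypothetical and do not reflect how ReplicationManager actually implements the check.

```java
/** Illustrative sketch only; EcMaintenanceBudget is a hypothetical name. */
final class EcMaintenanceBudget {
  /** Replicas that may be offline before replication is required. */
  static int maxOfflineWithoutReplication(int parity, int remainingRedundancy) {
    return Math.max(0, parity - remainingRedundancy);
  }

  /** Replicas that must stay online, capped at the size of the EC group. */
  static int minOnline(int data, int parity, int remainingRedundancy) {
    return Math.min(data + parity, data + remainingRedundancy);
  }

  public static void main(String[] args) {
    int[][] schemes = {{3, 2}, {6, 3}, {10, 4}};
    int remainingRedundancy = 1; // the default proposed in this PR
    for (int[] s : schemes) {
      System.out.println("EC " + s[0] + "-" + s[1] + ": up to "
          + maxOfflineWithoutReplication(s[1], remainingRedundancy)
          + " offline, at least "
          + minOnline(s[0], s[1], remainingRedundancy) + " online");
    }
  }
}
```

   With a remaining redundancy of 1 this gives 1, 2 and 3 replicas offline for 3-2, 6-3 and 10-4 respectively, matching the figures above.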
   
   Additionally, it's highly likely online recovery will be needed to read the data, e.g. if 1 container is offline in 10-4 there is a 10 in 14 (5 in 7) chance it's a data container, so trying to keep more containers online for larger EC groups is probably not going to help performance much.
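
   The "10 in 14" figure can be reproduced with the small sketch below. It assumes exactly one replica is offline and that each replica index is equally likely to be the one affected; the class and method names are hypothetical.

```java
/** Illustrative sketch only; DegradedReadOdds is a hypothetical name. */
final class DegradedReadOdds {
  static int gcd(int a, int b) {
    return b == 0 ? a : gcd(b, a % b);
  }

  /**
   * Chance that a single offline replica is a data replica, returned as a
   * reduced fraction {numerator, denominator} of data / (data + parity).
   */
  static int[] dataReplicaOdds(int data, int parity) {
    int total = data + parity;
    int g = gcd(data, total);
    return new int[] {data / g, total / g};
  }

  public static void main(String[] args) {
    int[][] schemes = {{3, 2}, {6, 3}, {10, 4}};
    for (int[] s : schemes) {
      int[] odds = dataReplicaOdds(s[0], s[1]);
      System.out.println("EC " + s[0] + "-" + s[1] + ": "
          + odds[0] + " in " + odds[1]);
    }
  }
}
```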
   
   In a large cluster, ideally EC containers will be spread across racks such that there is only 1 replica per rack, so taking a full rack offline would only reduce the redundancy by 1, meaning even 3-2 containers could tolerate a rack going into maintenance.
   
   In summary, I believe the simplest solution is to have an EC setting hdds.scm.replication.maintenance.ec.remaining.redundancy = 1, which we use for maintenance of EC containers and which is basically equivalent to the Ratis default of 2. It may make sense to call the new parameter hdds.scm.replication.maintenance.remaining.redundancy and use the same value for both Ratis and EC, deprecating the old value.
   
   For now in this change I have added "hdds.scm.replication.maintenance.remaining.redundancy" and noted in the comments / docs this is for EC only. We should consider how to deprecate the old parameter and bring the two together in another Jira. I am reluctant to call this one `hdds.scm.replication.ec.maintenance.remaining.redundancy` as then we will have to deprecate two parameters in the future.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6975
   
   ## How was this patch tested?
   
   Existing tests cover the maintenance counts in the new RM related classes. I also modified the decommission and maintenance tests to include some EC data, and hence fully test the decommission and maintenance flows with EC data in place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3723:
URL: https://github.com/apache/ozone/pull/3723#discussion_r959726400


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -686,6 +688,38 @@ public void setMaintenanceReplicaMinimum(int replicaCount) {
       this.maintenanceReplicaMinimum = replicaCount;
     }
 
+    /**
+     * Defines how many redundant replicas of a container must be online for a
+     * node to enter maintenance. Currently, only used for EC containers. We
+     * need to consider removing the "maintenance.replica.minimum" setting
+     * and having both Ratis and EC use this new one.
+     */
+    @Config(key = "maintenance.remaining.redundancy",
+        type = ConfigType.INT,
+        defaultValue = "1",
+        tags = {SCM, OZONE},
+        description = "The number of redundant containers in a group which" +
+            " must be available for a node to enter maintenance. If putting" +
+            " a node into maintenance reduces the redundancy below this value" +
+            " , the node will remain in the entering maintenance state until" +
+            " a new replica is created. For Ratis containers, the default" +
+            " value of 1 ensures at least two replicas are online, meaning 1" +
+            " more can be lost without data becoming unavailable. For any EC" +
+            " container it will have at least dataNum + 1 online, allowing" +
+            " the loss of 1 more replica before data becomes unavailable." +

Review Comment:
   > ozone-default.xml at build time, that's one of the main reasons for having it.
   
   Ok that is good then. 
   
   >The point of maintenance mode is to take nodes offline quickly for a short time and avoid the wait and overhead of replication. Of course in any user guide, it should be made clear that maintenance for EC can result in degraded reads. I'm not sure we have such a user guide anywhere to add to!
   
   For now, explicitly mentioning this in the config description may be OK, so that whoever explores these configs is made aware of the trade-offs. Thanks.
   





[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3723:
URL: https://github.com/apache/ozone/pull/3723#discussion_r959617540


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -686,6 +688,38 @@ public void setMaintenanceReplicaMinimum(int replicaCount) {
       this.maintenanceReplicaMinimum = replicaCount;
     }
 
+    /**
+     * Defines how many redundant replicas of a container must be online for a
+     * node to enter maintenance. Currently, only used for EC containers. We
+     * need to consider removing the "maintenance.replica.minimum" setting
+     * and having both Ratis and EC use this new one.
+     */
+    @Config(key = "maintenance.remaining.redundancy",
+        type = ConfigType.INT,
+        defaultValue = "1",
+        tags = {SCM, OZONE},
+        description = "The number of redundant containers in a group which" +
+            " must be available for a node to enter maintenance. If putting" +
+            " a node into maintenance reduces the redundancy below this value" +
+            " , the node will remain in the entering maintenance state until" +
+            " a new replica is created. For Ratis containers, the default" +
+            " value of 1 ensures at least two replicas are online, meaning 1" +
+            " more can be lost without data becoming unavailable. For any EC" +
+            " container it will have at least dataNum + 1 online, allowing" +
+            " the loss of 1 more replica before data becomes unavailable." +

Review Comment:
   But for EC, users should be aware that online reconstruction happens, and hence they may see a performance impact. For no performance impact, the best case would be [replicaIndex until datanum + 1]. Things will get complicated. I think we must make users aware of the impact when putting nodes into maintenance.
   
   Are there no plans to add this config to ozone-default.xml?
   Do you know whether these descriptions will be available as part of the admin guides or similar? If not, they will just be developer-facing descriptions.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -686,6 +688,38 @@ public void setMaintenanceReplicaMinimum(int replicaCount) {
       this.maintenanceReplicaMinimum = replicaCount;
     }
 
+    /**
+     * Defines how many redundant replicas of a container must be online for a
+     * node to enter maintenance. Currently, only used for EC containers. We
+     * need to consider removing the "maintenance.replica.minimum" setting
+     * and having both Ratis and EC use this new one.
+     */
+    @Config(key = "maintenance.remaining.redundancy",
+        type = ConfigType.INT,
+        defaultValue = "1",
+        tags = {SCM, OZONE},
+        description = "The number of redundant containers in a group which" +
+            " must be available for a node to enter maintenance. If putting" +
+            " a node into maintenance reduces the redundancy below this value" +

Review Comment:
   Nit: entering maintenance -> ENTERING_MAINTENANCE
   Just suggesting we use the exact state name rather than words.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -686,6 +688,38 @@ public void setMaintenanceReplicaMinimum(int replicaCount) {
       this.maintenanceReplicaMinimum = replicaCount;
     }
 
+    /**
+     * Defines how many redundant replicas of a container must be online for a
+     * node to enter maintenance. Currently, only used for EC containers. We
+     * need to consider removing the "maintenance.replica.minimum" setting

Review Comment:
   I would suggest filing a JIRA and referencing it here for the removal of maintenance.replica.minimum.





[GitHub] [ozone] sodonnel commented on a diff in pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3723:
URL: https://github.com/apache/ozone/pull/3723#discussion_r959757379


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -686,6 +688,38 @@ public void setMaintenanceReplicaMinimum(int replicaCount) {
       this.maintenanceReplicaMinimum = replicaCount;
     }
 
+    /**
+     * Defines how many redundant replicas of a container must be online for a
+     * node to enter maintenance. Currently, only used for EC containers. We
+     * need to consider removing the "maintenance.replica.minimum" setting
+     * and having both Ratis and EC use this new one.
+     */
+    @Config(key = "maintenance.remaining.redundancy",
+        type = ConfigType.INT,
+        defaultValue = "1",
+        tags = {SCM, OZONE},
+        description = "The number of redundant containers in a group which" +
+            " must be available for a node to enter maintenance. If putting" +
+            " a node into maintenance reduces the redundancy below this value" +
+            " , the node will remain in the entering maintenance state until" +
+            " a new replica is created. For Ratis containers, the default" +
+            " value of 1 ensures at least two replicas are online, meaning 1" +
+            " more can be lost without data becoming unavailable. For any EC" +
+            " container it will have at least dataNum + 1 online, allowing" +
+            " the loss of 1 more replica before data becomes unavailable." +

Review Comment:
   OK - I added a line to the documentation highlighting this.





[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
siddhantsangwan commented on code in PR #3723:
URL: https://github.com/apache/ozone/pull/3723#discussion_r957067384


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDecommissionAndMaintenance.java:
##########
@@ -385,8 +426,32 @@ public void testContainerIsReplicatedWhenAllNodesGotoMaintenance()
     for (DatanodeDetails dn : forMaintenance) {
       waitForDnToReachOpState(dn, IN_SERVICE);
     }
-
     waitForContainerReplicas(container, 3);
+
+    // Now write some EC data and put two nodes into maintenance. This should
+    // result in at least 1 extra replica getting created.
+    generateData(20, "eckey", ecRepConfig);
+    final ContainerInfo ecContainer =
+        waitForAndReturnContainer(ecRepConfig, 5);
+    List<DatanodeDetails> ecMaintenance = replicas.stream()
+        .map(ContainerReplica::getDatanodeDetails)
+        .limit(2)
+        .collect(Collectors.toList());
+    scmClient.startMaintenanceNodes(ecMaintenance.stream()
+        .map(this::getDNHostAndPort)
+        .collect(Collectors.toList()), 0);
+    for (DatanodeDetails dn : ecMaintenance) {
+      waitForDnToReachPersistedOpState(dn, IN_MAINTENANCE);
+    }
+    assertTrue(cm.getContainerReplicas(ecContainer.containerID()).size() >= 6);
+    scmClient.recommissionNodes(forMaintenance.stream()
+        .map(this::getDNHostAndPort)
+        .collect(Collectors.toList()));
+    // Ensure the 2 DNs go to maintenance

Review Comment:
   "maintenance" should be "in-service"





[GitHub] [ozone] kaijchen commented on pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #3723:
URL: https://github.com/apache/ozone/pull/3723#issuecomment-1235129163

   Hi @sodonnel, it seems `testContainerIsReplicatedWhenAllNodesGotoMaintenance` is constantly failing.
   Please take a look.
   
   https://github.com/apache/ozone/runs/8122514775?check_suite_focus=true#step:6:3092




[GitHub] [ozone] sodonnel merged pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
sodonnel merged PR #3723:
URL: https://github.com/apache/ozone/pull/3723




[GitHub] [ozone] sodonnel commented on a diff in pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3723:
URL: https://github.com/apache/ozone/pull/3723#discussion_r959647964


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -686,6 +688,38 @@ public void setMaintenanceReplicaMinimum(int replicaCount) {
       this.maintenanceReplicaMinimum = replicaCount;
     }
 
+    /**
+     * Defines how many redundant replicas of a container must be online for a
+     * node to enter maintenance. Currently, only used for EC containers. We
+     * need to consider removing the "maintenance.replica.minimum" setting
+     * and having both Ratis and EC use this new one.
+     */
+    @Config(key = "maintenance.remaining.redundancy",
+        type = ConfigType.INT,
+        defaultValue = "1",
+        tags = {SCM, OZONE},
+        description = "The number of redundant containers in a group which" +
+            " must be available for a node to enter maintenance. If putting" +
+            " a node into maintenance reduces the redundancy below this value" +
+            " , the node will remain in the entering maintenance state until" +
+            " a new replica is created. For Ratis containers, the default" +
+            " value of 1 ensures at least two replicas are online, meaning 1" +
+            " more can be lost without data becoming unavailable. For any EC" +
+            " container it will have at least dataNum + 1 online, allowing" +
+            " the loss of 1 more replica before data becomes unavailable." +

Review Comment:
   The config framework automatically puts these descriptions into the generated ozone-default.xml at build time, that's one of the main reasons for having it.
   
   The point of maintenance mode is to take nodes offline quickly for a short time and avoid the wait and overhead of replication. Of course in any user guide, it should be made clear that maintenance for EC can result in degraded reads. I'm not sure we have such a user guide anywhere to add to!





[GitHub] [ozone] sodonnel commented on a diff in pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3723:
URL: https://github.com/apache/ozone/pull/3723#discussion_r959655380


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -686,6 +688,38 @@ public void setMaintenanceReplicaMinimum(int replicaCount) {
       this.maintenanceReplicaMinimum = replicaCount;
     }
 
+    /**
+     * Defines how many redundant replicas of a container must be online for a
+     * node to enter maintenance. Currently, only used for EC containers. We
+     * need to consider removing the "maintenance.replica.minimum" setting

Review Comment:
   Raised HDDS-7190 to take this forward.





[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
siddhantsangwan commented on code in PR #3723:
URL: https://github.com/apache/ozone/pull/3723#discussion_r957067384


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDecommissionAndMaintenance.java:
##########
@@ -385,8 +426,32 @@ public void testContainerIsReplicatedWhenAllNodesGotoMaintenance()
     for (DatanodeDetails dn : forMaintenance) {
       waitForDnToReachOpState(dn, IN_SERVICE);
     }
-
     waitForContainerReplicas(container, 3);
+
+    // Now write some EC data and put two nodes into maintenance. This should
+    // result in at least 1 extra replica getting created.
+    generateData(20, "eckey", ecRepConfig);
+    final ContainerInfo ecContainer =
+        waitForAndReturnContainer(ecRepConfig, 5);
+    List<DatanodeDetails> ecMaintenance = replicas.stream()
+        .map(ContainerReplica::getDatanodeDetails)
+        .limit(2)
+        .collect(Collectors.toList());
+    scmClient.startMaintenanceNodes(ecMaintenance.stream()
+        .map(this::getDNHostAndPort)
+        .collect(Collectors.toList()), 0);
+    for (DatanodeDetails dn : ecMaintenance) {
+      waitForDnToReachPersistedOpState(dn, IN_MAINTENANCE);
+    }
+    assertTrue(cm.getContainerReplicas(ecContainer.containerID()).size() >= 6);
+    scmClient.recommissionNodes(forMaintenance.stream()
+        .map(this::getDNHostAndPort)
+        .collect(Collectors.toList()));
+    // Ensure the 2 DNs go to maintenance

Review Comment:
   "maintenance" should be "in-service" in the comment





[GitHub] [ozone] sodonnel commented on pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
sodonnel commented on PR #3723:
URL: https://github.com/apache/ozone/pull/3723#issuecomment-1233016225

   @umamaheswararao I made the suggested change and created the Jira. Please take another look and let me know if it looks good. Thanks!




[GitHub] [ozone] sodonnel commented on pull request #3723: HDDS-6975. EC: Define the value of Maintenance Redundancy for EC containers

Posted by GitBox <gi...@apache.org>.
sodonnel commented on PR #3723:
URL: https://github.com/apache/ozone/pull/3723#issuecomment-1237986792

   @kaijchen Thanks for pointing this out. The test seems to be flaky, passing on some runs and failing on others. I think I have found the problem so I will file a PR to fix it.

