Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/09/15 02:38:27 UTC

[GitHub] [hadoop-ozone] maobaolong opened a new pull request #1423: HDDS-4244. Container deleted wrong replica cause mis-replicated.

maobaolong opened a new pull request #1423:
URL: https://github.com/apache/hadoop-ozone/pull/1423


   ## What changes were proposed in this pull request?
   
   When handling an over-replicated container, ReplicationManager deleted the wrong replica, which left the container mis-replicated.
   
   ## What is the link to the Apache JIRA
   
   HDDS-4244
   
   ## How was this patch tested?
   
   
   - Related config file
     - ozone-site.xml
   ```xml
     <property>
       <name>ozone.scm.container.placement.impl</name>
       <value>org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware</value>
     </property>
     <property>
       <name>net.topology.node.switch.mapping.impl</name>
       <value>org.apache.hadoop.net.TableMapping</value>
     </property>
     <property>
       <name>net.topology.table.file.name</name>
       <value>/data/ozoneadmin/ozoneenv/ozone/etc/hadoop/network-config</value>
     </property>
   ```
   
     - network-config
   ```
   192.168.1.100 /racks1
   192.168.1.101 /racks1
   192.168.1.102 /racks1
   192.168.1.103 /racks2
   192.168.1.104 /racks2
   192.168.1.106 /racks2
   ```
   
   First, make sure all 3 replicas of the tested container are on the same rack, for example racks1.
   Second, change the config file to apply the required config.
   
   After 2 ReplicationManager intervals, you can use the following command to verify that the container replicas are spread across the 2 racks.
   ```bash
   ozone admin container info #xxx
   ```
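
One way to cross-check the replica hosts reported by the command against the network-config table is sketched below. This is only an illustration, not part of the patch: the class and method names are hypothetical, the replica host lists are made-up inputs, and falling back to `/default-rack` for unlisted hosts is an assumption about how unmapped hosts should be treated.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Ozone: counts the racks a set of hosts spans.
final class RackSpreadSketch {

  // Parses "host rack" pairs, one per line, as in the network-config file above.
  static Map<String, String> parseTable(String table) {
    Map<String, String> hostToRack = new HashMap<>();
    for (String line : table.split("\n")) {
      String[] cols = line.trim().split("\\s+");
      if (cols.length == 2) {
        hostToRack.put(cols[0], cols[1]);
      }
    }
    return hostToRack;
  }

  // Returns the number of distinct racks covered by the given hosts.
  // Assumption: hosts missing from the table are counted under "/default-rack".
  static int rackCount(Map<String, String> hostToRack, List<String> hosts) {
    Set<String> racks = new HashSet<>();
    for (String host : hosts) {
      racks.add(hostToRack.getOrDefault(host, "/default-rack"));
    }
    return racks.size();
  }

  public static void main(String[] args) {
    String table = String.join("\n",
        "192.168.1.100 /racks1",
        "192.168.1.101 /racks1",
        "192.168.1.102 /racks1",
        "192.168.1.103 /racks2",
        "192.168.1.104 /racks2",
        "192.168.1.106 /racks2");
    Map<String, String> hostToRack = parseTable(table);

    // Made-up replica locations: all on racks1 (the starting state of the test).
    List<String> sameRack =
        Arrays.asList("192.168.1.100", "192.168.1.101", "192.168.1.102");
    // Made-up replica locations after ReplicationManager has acted.
    List<String> spread =
        Arrays.asList("192.168.1.100", "192.168.1.101", "192.168.1.103");

    System.out.println(rackCount(hostToRack, sameRack)); // prints 1
    System.out.println(rackCount(hostToRack, spread));   // prints 2
  }
}
```

A count of 2 for the hosts listed by `ozone admin container info` would indicate that the replicas ended up on both racks.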
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] nandakumar131 commented on pull request #1423: HDDS-4244. Container deleted wrong replica cause mis-replicated.

Posted by GitBox <gi...@apache.org>.
nandakumar131 commented on pull request #1423:
URL: https://github.com/apache/hadoop-ozone/pull/1423#issuecomment-696007538


   How exactly do the new test cases verify that the ReplicationManager considers rack awareness and picks the correct replica for deletion? As far as I can understand, the new test cases only check whether the delete command is sent. They don't verify any kind of rack placement. Am I missing something here?




[GitHub] [hadoop-ozone] ChenSammi merged pull request #1423: HDDS-4244. Container deleted wrong replica cause mis-replicated.

Posted by GitBox <gi...@apache.org>.
ChenSammi merged pull request #1423:
URL: https://github.com/apache/hadoop-ozone/pull/1423


   




[GitHub] [hadoop-ozone] maobaolong commented on pull request #1423: HDDS-4244. Container deleted wrong replica cause mis-replicated.

Posted by GitBox <gi...@apache.org>.
maobaolong commented on pull request #1423:
URL: https://github.com/apache/hadoop-ozone/pull/1423#issuecomment-693153292


   @ChenSammi Thank you for reminding me. I added a `testOverReplicatedAndPolicyUnSatisfied` test case. PTAL.




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1423: HDDS-4244. Container deleted wrong replica cause mis-replicated.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1423:
URL: https://github.com/apache/hadoop-ozone/pull/1423#issuecomment-693136912


   @maobaolong, can we add a policy-unsatisfied UT too?




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1423: HDDS-4244. Container deleted wrong replica cause mis-replicated.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1423:
URL: https://github.com/apache/hadoop-ozone/pull/1423#issuecomment-693302200


   LGTM +1. 




[GitHub] [hadoop-ozone] nandakumar131 commented on a change in pull request #1423: HDDS-4244. Container deleted wrong replica cause mis-replicated.

Posted by GitBox <gi...@apache.org>.
nandakumar131 commented on a change in pull request #1423:
URL: https://github.com/apache/hadoop-ozone/pull/1423#discussion_r491900009



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -660,21 +660,23 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
       if (excess > 0) {
         eligibleReplicas.removeAll(unhealthyReplicas);
         Set<ContainerReplica> replicaSet = new HashSet<>(eligibleReplicas);
-        boolean misReplicated =
-            getPlacementStatus(replicaSet, replicationFactor)
-                .isPolicySatisfied();
+        ContainerPlacementStatus ps =
+            getPlacementStatus(replicaSet, replicationFactor);
         for (ContainerReplica r : eligibleReplicas) {
           if (excess <= 0) {
             break;
           }
           // First remove the replica we are working on from the set, and then
           // check if the set is now mis-replicated.
           replicaSet.remove(r);
-          boolean nowMisRep = getPlacementStatus(replicaSet, replicationFactor)
-              .isPolicySatisfied();
-          if (misReplicated || !nowMisRep) {
-            // Remove the replica if the container was already mis-replicated
-            // OR if losing this replica does not make it become mis-replicated
+          ContainerPlacementStatus nowPS =
+              getPlacementStatus(replicaSet, replicationFactor);
+          if ((!ps.isPolicySatisfied()
+                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
+              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {

Review comment:
   Why do we need such a complex condition check?
   Wouldn't fixing the previous check solve the problem, i.e. changing
   `if (misReplicated || !nowMisRep)` to `if (!misReplicated || nowMisRep)`?
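
To compare the three conditions discussed here, the following is a minimal sketch, not the actual ReplicationManager code. `Status` is a hypothetical stand-in for `ContainerPlacementStatus`, and the two scenarios are made-up inputs. In the removed lines, `misReplicated` and `nowMisRep` were assigned from `isPolicySatisfied()`, so the sketch names the values by what they actually held.

```java
// Hypothetical sketch: models only the boolean conditions from the diff above.
final class RemovalCheckSketch {

  // Stand-in for ContainerPlacementStatus (assumption: only these two values matter here).
  static final class Status {
    final boolean policySatisfied;
    final int actualPlacementCount;

    Status(boolean policySatisfied, int actualPlacementCount) {
      this.policySatisfied = policySatisfied;
      this.actualPlacementCount = actualPlacementCount;
    }
  }

  // Removed lines: if (misReplicated || !nowMisRep), where both held isPolicySatisfied().
  static boolean originalCheck(Status before, Status after) {
    return before.policySatisfied || !after.policySatisfied;
  }

  // Reviewer's suggestion: if (!misReplicated || nowMisRep), same variable meaning.
  static boolean invertedCheck(Status before, Status after) {
    return !before.policySatisfied || after.policySatisfied;
  }

  // Added lines in the PR.
  static boolean prCheck(Status before, Status after) {
    return (!before.policySatisfied
            && after.actualPlacementCount == before.actualPlacementCount)
        || (before.policySatisfied && after.policySatisfied);
  }

  static String describe(String label, Status before, Status after) {
    return label
        + ": original=" + originalCheck(before, after)
        + " inverted=" + invertedCheck(before, after)
        + " pr=" + prCheck(before, after);
  }

  public static void main(String[] args) {
    // Scenario A (made up): policy satisfied on 2 racks, removal leaves 1 rack.
    Status a0 = new Status(true, 2);
    Status a1 = new Status(false, 1);
    // Scenario B (made up): policy already unsatisfied on 2 racks, removal leaves 1 rack.
    Status b0 = new Status(false, 2);
    Status b1 = new Status(false, 1);

    System.out.println(describe("A", a0, a1)); // A: original=true inverted=false pr=false
    System.out.println(describe("B", b0, b1)); // B: original=true inverted=true pr=false
  }
}
```

A result of `true` means the replica would be removed. Under these assumed inputs, the original check removes the replica in scenario A even though doing so breaks a satisfied policy. The inverted check and the PR's check agree in scenario A and differ only in scenario B, where the policy is already unsatisfied and the removal would lower the placement count. Whether a state like scenario B can arise depends on the placement policy in use.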





