Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/12/15 04:25:02 UTC

[GitHub] [ozone] GlenGeng opened a new pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

GlenGeng opened a new pull request #1700:
URL: https://github.com/apache/ozone/pull/1700


   ## What changes were proposed in this pull request?
   
   **Problem:** 
   ReplicationManager maintains the in-flight replication and deletion lists in memory, and they are not replicated using Ratis. So, theoretically, we might run into issues if we start ReplicationManager immediately after a failover.
   Scenario: There are 6 replicas of the container C1, namely CR1, CR2, CR3, CR4, CR5 and CR6. The container is over-replicated, so the current SCM S1 decides to delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for deletion, records this in its in-flight deletion list, and sends deletion commands to the datanodes. If there is a failover at this point and SCM S2 becomes leader, it does not have the in-flight deletion list from SCM S1 and still finds the container C1 to be over-replicated. Theoretically, SCM S2 could pick CR4, CR5 and CR6 for deletion. If this happens, we end up with data loss.
   To address this issue we make the logic that selects replicas for deletion deterministic. This ensures that the new leader after a failover picks the same replicas for deletion that the old leader picked.
   
   **Approach:** 
   Sort the candidate replicas deterministically and delete the excess replicas from smallest to largest. As a result, no SCM will ever send a DeleteContainerCommand for the largest 3 replicas.
   
   **Example:**
   Assume there are 6 replicas of the container C1, the replication factor of C1 is 3, and the replicas are named CR1, CR2, CR3, CR4, CR5 and CR6 in sorted order. No SCM will ever send a DeleteContainerCommand for CR4, CR5 or CR6:
   If an SCM sees 3 or fewer replicas, it won't send any DeleteContainerCommand.
   If an SCM sees 4 replicas, it deletes the smallest replica, so it won't send a DeleteContainerCommand for CR4, CR5 or CR6; otherwise there would be a contradiction.
   If an SCM sees 5 replicas, it deletes the smallest 2 replicas, so it won't send a DeleteContainerCommand for CR4, CR5 or CR6; otherwise there would be a contradiction.
   If an SCM sees 6 replicas, it deletes the smallest 3 replicas, so it won't send a DeleteContainerCommand for CR4, CR5 or CR6; otherwise there would be a contradiction.
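   
   A minimal sketch of the selection logic (hypothetical class and method names; the actual patch only adds a deterministic sort of the existing `eligibleReplicas` list in `ReplicationManager`):
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   
   final class DeterministicExcessSelector {
   
     /**
      * Pick the replicas to delete: sort the candidates into a fixed, total
      * order and take the smallest (replicas.size() - replicationFactor).
      * The largest replicationFactor replicas are therefore never selected,
      * no matter which SCM (old or new leader) runs this logic.
      */
     static <R extends Comparable<R>> List<R> selectExcess(
         List<R> replicas, int replicationFactor) {
       int excess = replicas.size() - replicationFactor;
       if (excess <= 0) {
         return new ArrayList<>();               // not over-replicated
       }
       List<R> sorted = new ArrayList<>(replicas);
       sorted.sort(Comparator.naturalOrder());   // same order on every SCM
       return new ArrayList<>(sorted.subList(0, excess)); // smallest `excess` replicas
     }
   }
   ```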
   
   **P.S.:**
   Since this issue exists on master as well (e.g., after a quick restart of SCM), we decided to fix it on master.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4589
   
   ## How was this patch tested?
   
   (Please explain how this patch was tested. Ex: unit tests, manual tests)
   (If this patch involves UI changes, please attach a screen-shot; otherwise, remove this)
   



[GitHub] [ozone] nandakumar131 merged pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

Posted by GitBox <gi...@apache.org>.
nandakumar131 merged pull request #1700:
URL: https://github.com/apache/ozone/pull/1700


   



[GitHub] [ozone] amaliujia commented on a change in pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1700:
URL: https://github.com/apache/ozone/pull/1700#discussion_r543663504



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -720,6 +721,12 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
 
       final List<ContainerReplica> eligibleReplicas = new ArrayList<>(replicas);
 
+      // Iterate replicas in deterministic order to avoid potential data loss.
+      // See https://issues.apache.org/jira/browse/HDDS-4589.
+      // N.B., sort replicas by (containerID, datanodeDetails).
+      eligibleReplicas.sort(
+          Comparator.comparingLong(ContainerReplica::hashCode));

Review comment:
       So it seems it will be hard to write a test to verify the case in HDDS-4589 :)





[GitHub] [ozone] GlenGeng closed pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

Posted by GitBox <gi...@apache.org>.
GlenGeng closed pull request #1700:
URL: https://github.com/apache/ozone/pull/1700


   



[GitHub] [ozone] GlenGeng commented on a change in pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #1700:
URL: https://github.com/apache/ozone/pull/1700#discussion_r543886466



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -720,6 +721,12 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
 
       final List<ContainerReplica> eligibleReplicas = new ArrayList<>(replicas);
 
+      // Iterate replicas in deterministic order to avoid potential data loss.
+      // See https://issues.apache.org/jira/browse/HDDS-4589.
+      // N.B., sort replicas by (containerID, datanodeDetails).
+      eligibleReplicas.sort(
+          Comparator.comparingLong(ContainerReplica::hashCode));

Review comment:
       Yeah. Such a data loss issue exists on master as well, yet it has never been reported.






[GitHub] [ozone] GlenGeng commented on pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #1700:
URL: https://github.com/apache/ozone/pull/1700#issuecomment-745065156


   @sodonnel @nandakumar131 could you please take a look at this PR?
   
   This issue was found by @nandakumar131 during his design of SCM HA.



[GitHub] [ozone] sodonnel commented on pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1700:
URL: https://github.com/apache/ozone/pull/1700#issuecomment-745207917


   This is a clever solution to the problem; however, I worry it may not work well in practice. Sorting the ContainerReplica objects will use this:
   
   ```
     @Override
     public int compareTo(ContainerReplica that) {
       Preconditions.checkNotNull(that);
       return new CompareToBuilder()
           .append(this.containerID, that.containerID)
           .append(this.datanodeDetails, that.datanodeDetails)
           .build();
     }
   ```
   
   The containerID is fixed for the container, so you are effectively sorting by the datanode address. This means that, in general, all containers from the same pipeline could potentially have their excess replica removed from the same node every time.
   
   Say we decommission a host, then recommission it. We will have a lot of containers with 4 replicas. We sort the DN list each time, and there is a good chance that all the replicas would be removed from the same host (the decommissioned and recommissioned one, or one of the original hosts), rather than removing the replicas randomly across the cluster. This may result in some nodes having much more free space than others.
   
   This suggestion is obviously a much bigger change, but I wonder if it would be possible to have the DNs provide a list of pending_delete blocks in their container report / heartbeat, and then use that in SCM?
   
   Or, if the DNs detect a new master SCM or a restarted SCM (I am not up to speed on the SCM HA area), they could purge their pending delete list and wait for new instructions from the new/restarted SCM?



[GitHub] [ozone] GlenGeng commented on pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #1700:
URL: https://github.com/apache/ozone/pull/1700#issuecomment-745745415


   Thanks @sodonnel for the insights, it really helps!
   
   The first solution, `Collections.sort(eligibleReplicas);`, sorts the replicas by the UUIDs of the DNs they are located on, which will cause an imbalance of storage usage across DNs. Say one pipeline involves three DNs, DN1, DN2, DN3, whose UUIDs are in increasing order. For the containers created by this pipeline, replicas on DN1 will always be removed before those on DN3, causing an imbalance of storage usage.
   
   Our solution should ensure that, for containers of the same pipeline, different containers get different sorted orders of DN1, DN2, DN3. Let's just add salt into the sort: we sort according to `hash(containerID, UUID of DN)`. In this case, we effectively remove replicas at random across the 3 DNs.
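   
   A minimal sketch of that salted ordering, with a hypothetical `Replica` stand-in (the actual patch sorts `eligibleReplicas` with `Comparator.comparingLong(ContainerReplica::hashCode)`, whose hash already combines the containerID and the datanode details):
   
   ```java
   import java.util.Comparator;
   import java.util.List;
   import java.util.Objects;
   import java.util.UUID;
   
   final class SaltedReplicaOrder {
   
     /** Stand-in for a replica: the container ID plus the UUID of the hosting DN. */
     static final class Replica {
       final long containerId;
       final UUID datanodeUuid;
   
       Replica(long containerId, UUID datanodeUuid) {
         this.containerId = containerId;
         this.datanodeUuid = datanodeUuid;
       }
     }
   
     /**
      * Sort by hash(containerID, datanode UUID): still deterministic for a given
      * container, but different containers of the same pipeline order the same
      * three DNs differently, so deletions spread across the DNs.
      */
     static void sortSalted(List<Replica> replicas) {
       replicas.sort(Comparator.comparingInt(
           (Replica r) -> Objects.hash(r.containerId, r.datanodeUuid)));
     }
   }
   ```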
   
   However, it only handles in-flight deletes; it is not a thorough solution covering both in-flight adds and in-flight deletes. We (you, nanda and me) can schedule a talk this week or next week to discuss a proper solution.
   
   Ahead of that, we would like to merge this one-line fix into HDDS-2823 as protection against the potential data loss issue. After the proper solution is merged, we can decide whether or not to revert this quick fix.



[GitHub] [ozone] GlenGeng commented on pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #1700:
URL: https://github.com/apache/ozone/pull/1700#issuecomment-745807682


   Filed a jira for the long-term solution.
   https://issues.apache.org/jira/browse/HDDS-4599
   

