You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "Glen Geng (Jira)" <ji...@apache.org> on 2020/12/15 04:11:00 UTC
[jira] [Updated] (HDDS-4589) Tackle potential data loss during ReplicationManager.handleOverReplicatedContainer()

     [ https://issues.apache.org/jira/browse/HDDS-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated HDDS-4589:
----------------------------
    Description: 
 

ReplicationManager maintains the in-flight replication and deletion in-memory, which is not replicated using Ratis. So, theoretically it’s possible that we might run into issues if we immediately start ReplicationManager after a failover.

Scenario: There are 6 replicas of the container C1 namely CR1, CR2, CR3, CR4, CR5 and CR6. The container is over replicated, so the current SCM S1 decides to delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for deletion, this information is updated in the in-flight deletion list and deletion commands are sent to the datanodes. If there is a failover at this point and SCM S2 becomes leader, it doesn’t have the in-flight deletion list from SCM S1 and it finds the container C1 to be over replicated. Theoretically it’s possible that SCM S2 picks CR4, CR5 and CR6 for deletion. If this happens, we will end up in data loss.

To address this issue we will make the logic to select a replica for deletion deterministic. This will make sure that the new leader after failover will pick the same replica for deletion which was picked by the old leader.

 
h4. Approach: 

Sort the candidate replicas. Delete excess replicas from small to large. There will not be any DeleteContainerCommand for the largest 3 Replica sent by any SCM.
h4. Example:

Assume there are 6 replicas of the container C1, the factor of C1 is 3, the names of the replicas are CR1, CR2, CR3, CR4, CR5 and CR6. There will not be any DeleteContainerCommand for CR4, CR5, CR6 sent by any SCM:

If SCM sees less than or equal to 3 replicas, it won’t send any DeleteContainerCommand.

If SCM sees 4 replicas, It deletes the smallest replica, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.

If SCM sees 5 replicas, It deletes the smallest 2 replicas, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.

If SCM sees 6 replicas, It deletes the smallest 3 replicas, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.

 

  was:
 

ReplicationManager maintains the in-flight replication and deletion in-memory, which is not replicated using Ratis. So, theoretically it’s possible that we might run into issues if we immediately start ReplicationManager after a failover.

Scenario: There are 6 replicas of the container C1 namely CR1, CR2, CR3, CR4, CR5 and CR6. The container is over replicated, so the current SCM S1 decides to delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for deletion, this information is updated in the in-flight deletion list and deletion commands are sent to the datanodes. If there is a failover at this point and SCM S2 becomes leader, it doesn’t have the in-flight deletion list from SCM S1 and it finds the container C1 to be over replicated. Theoretically it’s possible that SCM S2 picks CR4, CR5 and CR6 for deletion. If this happens, we will end up in data loss.

To address this issue we will make the logic to select a replica for deletion deterministic. This will make sure that the new leader after failover will pick the same replica for deletion which was picked by the old leader.

 
h4. Approach: 

Sort the candidate replicas by some field, e.g., timestamp

Delete excess replicas from small to large.

There will not be any DeleteContainerCommand for the largest 3 Replica sent by any SCM.
h4. Example:

Assume there are 6 replicas of the container C1, the factor of C1 is 3, the names of the replicas are CR1, CR2, CR3, CR4, CR5 and CR6.

 

There will not be any DeleteContainerCommand for CR4, CR5, CR6 sent by any SCM:

 

If SCM sees less than or equal to 3 replicas, it won’t send any DeleteContainerCommand.

 

If SCM sees 4 replicas, It deletes the smallest replica, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.

 

If SCM sees 5 replicas, It deletes the smallest 2 replicas, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.

 

If SCM sees 6 replicas, It deletes the smallest 3 replicas, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.





 


> Tackle potential data loss during ReplicationManager.handleOverReplicatedContainer()
> ------------------------------------------------------------------------------------
>
>                 Key: HDDS-4589
>                 URL: https://issues.apache.org/jira/browse/HDDS-4589
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM HA
>    Affects Versions: 1.1.0
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>             Fix For: 1.1.0
>
>
>  
> ReplicationManager maintains the in-flight replication and deletion in-memory, which is not replicated using Ratis. So, theoretically it’s possible that we might run into issues if we immediately start ReplicationManager after a failover.
> Scenario: There are 6 replicas of the container C1 namely CR1, CR2, CR3, CR4, CR5 and CR6. The container is over replicated, so the current SCM S1 decides to delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for deletion, this information is updated in the in-flight deletion list and deletion commands are sent to the datanodes. If there is a failover at this point and SCM S2 becomes leader, it doesn’t have the in-flight deletion list from SCM S1 and it finds the container C1 to be over replicated. Theoretically it’s possible that SCM S2 picks CR4, CR5 and CR6 for deletion. If this happens, we will end up in data loss.
> To address this issue we will make the logic to select a replica for deletion deterministic. This will make sure that the new leader after failover will pick the same replica for deletion which was picked by the old leader.
>  
> h4. Approach: 
> Sort the candidate replicas. Delete excess replicas from small to large. There will not be any DeleteContainerCommand for the largest 3 Replica sent by any SCM.
> h4. Example:
> Assume there are 6 replicas of the container C1, the factor of C1 is 3, the names of the replicas are CR1, CR2, CR3, CR4, CR5 and CR6. There will not be any DeleteContainerCommand for CR4, CR5, CR6 sent by any SCM:
> If SCM sees less than or equal to 3 replicas, it won’t send any DeleteContainerCommand.
> If SCM sees 4 replicas, It deletes the smallest replica, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.
> If SCM sees 5 replicas, It deletes the smallest 2 replicas, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.
> If SCM sees 6 replicas, It deletes the smallest 3 replicas, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org