Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/12/15 10:45:50 UTC

[GitHub] [ozone] sodonnel commented on pull request #1700: HDDS-4589: Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

sodonnel commented on pull request #1700:
URL: https://github.com/apache/ozone/pull/1700#issuecomment-745207917


   This is a clever solution to the problem; however, I worry it may not work well in practice. Sorting the ContainerReplica instances will use this compareTo:
   
   ```
     @Override
     public int compareTo(ContainerReplica that) {
       Preconditions.checkNotNull(that);
       return new CompareToBuilder()
           .append(this.containerID, that.containerID)
           .append(this.datanodeDetails, that.datanodeDetails)
           .build();
     }
   ```
   
   The containerID is fixed for the container, so you are effectively sorting by the datanode address. This means that, in general, containers from the same pipeline will potentially always have their over-replicated replica removed from the same node.
   
   Say we decommission a host, then recommission it. We will have a lot of containers with 4 replicas. We sort the DN list each time, and there is a good chance that all the excess replicas will be removed from the same host (the decommissioned and recommissioned one, or one of the original hosts), rather than being removed randomly across the cluster. This may result in some nodes having much more free space than others.
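
   The skew described above can be illustrated with a minimal, self-contained sketch. It does not use any Ozone classes: `Replica`, `pickBySort`, `pickRandomly` and `removalsPerNode` are hypothetical names invented for this example, and it assumes the replica removed is the first one in sorted order (the PR may choose differently). Every container has one replica on each of four datanodes, and the sketch counts which node loses a replica under each policy.

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   import java.util.Map;
   import java.util.Random;
   import java.util.TreeMap;

   public class ReplicaRemovalDemo {

     /** Hypothetical stand-in for ContainerReplica: a container ID plus a datanode name. */
     public static final class Replica implements Comparable<Replica> {
       final long containerId;
       final String datanode;

       Replica(long containerId, String datanode) {
         this.containerId = containerId;
         this.datanode = datanode;
       }

       // Same ordering idea as the quoted compareTo: container ID first, then datanode.
       @Override
       public int compareTo(Replica that) {
         int byContainer = Long.compare(this.containerId, that.containerId);
         return byContainer != 0 ? byContainer : this.datanode.compareTo(that.datanode);
       }
     }

     /** Assumed policy: sort the replicas and remove the first one. */
     public static Replica pickBySort(List<Replica> replicas) {
       List<Replica> sorted = new ArrayList<>(replicas);
       Collections.sort(sorted);
       return sorted.get(0);
     }

     /** Alternative policy: remove a uniformly random replica. */
     public static Replica pickRandomly(List<Replica> replicas, Random rng) {
       return replicas.get(rng.nextInt(replicas.size()));
     }

     /** Counts, per datanode, how many over-replicated replicas would be removed from it. */
     public static Map<String, Integer> removalsPerNode(int containers, boolean sort, Random rng) {
       List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");
       Map<String, Integer> counts = new TreeMap<>();
       for (long id = 1; id <= containers; id++) {
         // Each container is over-replicated: 4 replicas, one per datanode.
         List<Replica> replicas = new ArrayList<>();
         for (String node : nodes) {
           replicas.add(new Replica(id, node));
         }
         Replica victim = sort ? pickBySort(replicas) : pickRandomly(replicas, rng);
         counts.merge(victim.datanode, 1, Integer::sum);
       }
       return counts;
     }

     public static void main(String[] args) {
       System.out.println("sorted: " + removalsPerNode(1000, true, new Random()));
       System.out.println("random: " + removalsPerNode(1000, false, new Random()));
     }
   }
   ```

   Because the container ID is identical within each replica set, the sorted policy always selects the replica on the lowest-ordered datanode, so every removal in this sketch lands on `dn1`, while the random policy spreads removals across the four nodes.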
   
   This suggestion is obviously a much bigger change, but I wonder if it would be possible to have the DNs provide a list of pending_delete blocks in their container report / heartbeat, and then we can use that in SCM?
   
   Or, if the DNs detect a new master SCM or a restarted SCM (I am not up-to-speed on the SCM HA area), could they purge their pending delete list and wait for new instructions from the new/restarted SCM?

