Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/08/19 11:11:12 UTC

[GitHub] [hadoop-ozone] ChenSammi opened a new pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

ChenSammi opened a new pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338


   https://issues.apache.org/jira/browse/HDDS-4023




[GitHub] [hadoop-ozone] sodonnel commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-700016623


   This change looks almost good now. I wonder about two final things:
   
   1. In `updateContainerStats(...)`, do you think we should return if the container is DELETING or DELETED, without making any updates? If this is a stale replica, it may not be empty and hence could update the container stats incorrectly. If the container is already in the DELETING or DELETED state, then we can ignore any changes to it, as we know the stale replica will get removed anyway (a sketch of such a guard follows after these two points).
   
   2. I am wondering if we could receive a stale replica while the container is DELETING. A replica would then get added in `updateContainerReplica(...)`. Later the container will go to DELETED and the stale replica will get reported again - at that point we will send a delete command, but I think the replica will now never get removed from memory. Would it make sense to send the delete command for any replicas received when the container is DELETING or DELETED?
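
   For point 1, a minimal sketch of such a guard, layered on the `updateContainerStats(...)` snippet quoted elsewhere in this thread (illustrative only, not the committed change):
   
   ```
     private void updateContainerStats(final ContainerID containerId,
                                       final ContainerReplicaProto replicaProto)
         throws ContainerNotFoundException {
       final ContainerInfo containerInfo = containerManager
           .getContainer(containerId);
   
       // A stale replica of a DELETING/DELETED container may still report
       // non-zero stats; ignore it, the replica will be removed anyway.
       if (containerInfo.getState() == HddsProtos.LifeCycleState.DELETING
           || containerInfo.getState() == HddsProtos.LifeCycleState.DELETED) {
         return;
       }
   
       if (isHealthy(replicaProto::getState)) {
         if (containerInfo.getSequenceId() <
             replicaProto.getBlockCommitSequenceId()) {
           containerInfo.updateSequenceId(
               replicaProto.getBlockCommitSequenceId());
         }
         if (containerInfo.getUsedBytes() < replicaProto.getUsed()) {
           containerInfo.setUsedBytes(replicaProto.getUsed());
         }
         if (containerInfo.getNumberOfKeys() < replicaProto.getKeyCount()) {
           containerInfo.setNumberOfKeys(replicaProto.getKeyCount());
         }
       }
     }
   ```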




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-687089550


   Hi @sodonnel, do you have time for another review?




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-678126933


   
   Hi @sodonnel, thanks for reviewing the code and raising the discussion.
   
   > I wonder if it would be simpler to remove empty containers as part of Container Report processing? In `AbstractContainerReportHandler#updateContainerState`, we could check the size and number of keys of the reported containers in the CLOSED branch of the switch statement, and then take action to delete an empty container there? I have a feeling it might be simpler, but I am not sure. The disadvantage of doing it in the Container Report Processing, is that we are dealing with only a single replica at that stage. However if the container is CLOSED in SCM, and a report says it is empty then we should be good to simply remove the container from SCM and issue the delete container command when processing the container report.
   > 
   > In looking at how this all works, I noticed there is a bug in the existing method AbstractContainerReportHandler#updateContainerStats, which might stop your change working:
   > 
   > ```
   >   private void updateContainerStats(final ContainerID containerId,
   >                                     final ContainerReplicaProto replicaProto)
   >       throws ContainerNotFoundException {
   >     if (isHealthy(replicaProto::getState)) {
   >       final ContainerInfo containerInfo = containerManager
   >           .getContainer(containerId);
   > 
   >       if (containerInfo.getSequenceId() <
   >           replicaProto.getBlockCommitSequenceId()) {
   >         containerInfo.updateSequenceId(
   >             replicaProto.getBlockCommitSequenceId());
   >       }
   >       if (containerInfo.getUsedBytes() < replicaProto.getUsed()) {
   >         containerInfo.setUsedBytes(replicaProto.getUsed());
   >       }
   >       if (containerInfo.getNumberOfKeys() < replicaProto.getKeyCount()) {
   >         containerInfo.setNumberOfKeys(replicaProto.getKeyCount());
   >       }
   >     }
   >   }
   > ```
   > 
   > This logic assumes the bytes used and keyCount is always increasing on the container. However in this case, where blocks are deleted the keyCount and bytesUsed can decrease. Therefore I don't think the stats will ever get updated in SCM via the container reports when blocks are removed.
   
   This logic works for OPEN containers, but not for CLOSED containers.
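
   For illustration, one possible shape of a fix (hypothetical; the real change is tracked in HDDS-4131) would be to treat the stats as only-increasing while the container is OPEN, and to accept decreases once it is CLOSED and blocks can be deleted:
   
   ```
   // Hypothetical sketch only -- not the committed HDDS-4131 patch.
   if (containerInfo.getState() == HddsProtos.LifeCycleState.OPEN) {
     // While OPEN, the stats only grow, so keep the existing max-style update.
     if (containerInfo.getUsedBytes() < replicaProto.getUsed()) {
       containerInfo.setUsedBytes(replicaProto.getUsed());
     }
     if (containerInfo.getNumberOfKeys() < replicaProto.getKeyCount()) {
       containerInfo.setNumberOfKeys(replicaProto.getKeyCount());
     }
   } else {
     // After the container is CLOSED, block deletion can shrink the stats,
     // so accept whatever the replica reports.
     containerInfo.setUsedBytes(replicaProto.getUsed());
     containerInfo.setNumberOfKeys(replicaProto.getKeyCount());
   }
   ```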
   
   > 
   > I then noticed you recently made a change in ReplicationManager to update these counts here:
   > 
   > https://github.com/apache/hadoop-ozone/pull/1295/files#diff-218132dfe72e6e39b9eaaf59737adcf2R780
   > 
   > I think the ReplicationManager part of that change should be reverted and it handled via the ContainerReports by fixing the bug I mentioned above. The only reason we need the Replication Manager change is to work around the bug above. If it was fixed, we don't need that logic in RM.

   You can find my comments in HDDS-4131.
   
   > 
   > If we decide to keep your changes inside replicationManager (for this change), then we would need a few more tests added to TestReplicationManager to cover the new scenarios.
   
   TestReplicationManager uses Mockito, so it is not a real end-to-end test. I added an end-to-end test in TestBlockDeletion.
   
   > 
   > I also have a question - when the replicas are all deleted, and we call:
   > 
   > ```
   > containerManager.updateContainerState(container.containerID(),
   >           HddsProtos.LifeCycleEvent.CLEANUP);
   > ```
   > 
   > Does this somehow cause the container to be removed from SCM memory and the persistent store?

   A container in DELETED state currently stays in SCM memory and the persistent store. Of course we could just delete the container from memory and the DB when its state changes to DELETED, but I feel the current way is safer - it acts like a container trash bin. We can add a purge-DELETED-containers action next, so users can manually delete these containers once they are sure the information is no longer needed. Overall, I'm open on this question and fine with either way.
   
   




[GitHub] [hadoop-ozone] sodonnel edited a comment on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel edited a comment on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-676472849


   Thanks for working on this @ChenSammi.
   
   I wonder if it would be simpler to remove empty containers as part of Container Report processing? In `AbstractContainerReportHandler#updateContainerState`, we could check the size and number of keys of the reported containers in the CLOSED branch of the switch statement, and then take action to delete an empty container there? I have a feeling it might be simpler, but I am not sure. The disadvantage of doing it in the Container Report Processing, is that we are dealing with only a single replica at that stage. However if the container is CLOSED in SCM, and a report says it is empty then we should be good to simply remove the container from SCM and issue the delete container command when processing the container report.
   
   In looking at how this all works, I noticed there is a bug in the existing method AbstractContainerReportHandler#updateContainerStats, which might stop your change working:
   
   ```
     private void updateContainerStats(final ContainerID containerId,
                                       final ContainerReplicaProto replicaProto)
         throws ContainerNotFoundException {
       if (isHealthy(replicaProto::getState)) {
         final ContainerInfo containerInfo = containerManager
             .getContainer(containerId);
   
         if (containerInfo.getSequenceId() <
             replicaProto.getBlockCommitSequenceId()) {
           containerInfo.updateSequenceId(
               replicaProto.getBlockCommitSequenceId());
         }
         if (containerInfo.getUsedBytes() < replicaProto.getUsed()) {
           containerInfo.setUsedBytes(replicaProto.getUsed());
         }
         if (containerInfo.getNumberOfKeys() < replicaProto.getKeyCount()) {
           containerInfo.setNumberOfKeys(replicaProto.getKeyCount());
         }
       }
     }
   ```
   
   This logic assumes the bytes used and keyCount is always increasing on the container. However in this case, where blocks are deleted the keyCount and bytesUsed can decrease. Therefore I don't think the stats will ever get updated in SCM via the container reports when blocks are removed.
   
   I then noticed you recently made a change in ReplicationManager to update these counts here:
   
   https://github.com/apache/hadoop-ozone/pull/1295/files#diff-218132dfe72e6e39b9eaaf59737adcf2R780
   
   I think the ReplicationManager part of that change should be reverted and it handled via the ContainerReports by fixing the bug I mentioned above. The only reason we need the Replication Manager change is to work around the bug above. If it was fixed, we don't need that logic in RM.
   
   If we decide to keep your changes inside replicationManager (for this change), then we would need a few more tests added to TestReplicationManager to cover the new scenarios.
   
   I also have a question - when the replicas are all deleted, and we call:
   
   ```
   containerManager.updateContainerState(container.containerID(),
             HddsProtos.LifeCycleEvent.CLEANUP);
   ```
   
   Does this somehow cause the container to be removed from SCM memory and the persistent store?




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-695883294


   @sodonnel  @elek  could you help to take another look? 




[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#discussion_r484808717



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -312,6 +313,16 @@ private void processContainer(ContainerID id) {
           action -> replicas.stream()
               .noneMatch(r -> r.getDatanodeDetails().equals(action.datanode)));
 
+      /*
+       * If container is under deleting and all it's replicas are deleted, then
+       * make the container as CLEANED, or resend the delete replica command if
+       * needed.
+       */
+      if (state == LifeCycleState.DELETING) {
+        handleContainerUnderDelete(container, replicas);
+        return;
+      }
+

Review comment:
       Should we add another check here:
   ```
   if (state == LifeCycleState.DELETED) {
     return;
   }
   ```
   
   This will avoid doing any further processing on a container which is expected to have zero replicas?






[GitHub] [hadoop-ozone] ChenSammi edited a comment on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi edited a comment on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-693378960


   Thanks @sodonnel for the very thoughtful suggestions.
   
   >     2. Consider if a datanode was dead for some time, and is then restarted. It will start reporting a replica for the DELETED container and with the current logic I don't think we will clean it up as `isContainerEmpty(...)` will always return false due to it checking for only CLOSED containers. I feel that this should be handled in the ContainerReportHandler code. If a replica is reported for a container which is DELETED in SCM, then we should instruct the datanode to delete the replica. What do you think? The parameter hdds.scm.unknown-container.action currently controls what to do if a DN reports a containers which SCM does not know about. The default is to log a warning, but for a DELETED container, I think we should just delete the replica.
   
   Agreed, we should just delete the replica in that case.
   
   > 
   >     3. There is another corner case, where the dead node may have a container which is out of sync with the others, as it was dead when the blocks should have been removed. Therefore we had 3 empty container replicas, plus one non-empty dead one. The 3 empty replicas are removed, and the container moved to "DELETED". Then the dead node starts reporting a non-empty replica for the DELETED container. This is actually a wider problem, in that I believe there are scenarios where container replicas can get out of sync with one another and there is nothing to reconcile them and fix it.
   
   The extra stale replica is a real problem, not just in the empty container deletion case, but also for the non-empty CLOSED container case. After a delete block command is confirmed by 3 datanodes, the command is finished and removed from the block deletion log. So if a fourth replica shows up after that, SCM will not send a new delete block command to this replica.
   But this fourth replica will have all remaining delete block commands executed if its datanode is still up. Maybe we can rely on the over-replicated container replica deletion logic to remove this extra fourth replica, but it's hard to choose which one to delete if their deleteTransactionId and blockCommitSequenceId are different. Currently handleOverReplicatedContainer doesn't check these two fields of each replica.
   
   > 
   >     4. In ContainerSafeModeRule, I think we need to consider DELETING and DELETED. If there are a lot of DELETED containers in the cluster, then they will never get replicas and the safemode rule will never get met. I think this could be as simple as excluding DELETING and DELETED when loading the containerMap (right now OPEN and CLOSING are excluded) so we don't wait for replicas on these containers.
   
   Good catch!
   
   > 
   >     5. In `ReplicationManager.isContainerUnderReplicated(...)`, should we immediately return true if the containerState is DELETING or DELETED?
   
   Right.
   
   I also agree we should purge these deleted containers after a while. We can have a timer task to periodically remove the DELETED containers from the SCM DB. What do you think?
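
   A rough sketch of what such a timer task could look like (the getContainers(LifeCycleState.DELETED) and deleteContainer(...) calls are assumptions about the ContainerManager API, not a final design):
   
   ```
   // Hypothetical sketch of a periodic purge of DELETED containers.
   ScheduledExecutorService purgeExecutor =
       Executors.newSingleThreadScheduledExecutor();
   purgeExecutor.scheduleWithFixedDelay(() -> {
     for (ContainerInfo container :
         containerManager.getContainers(HddsProtos.LifeCycleState.DELETED)) {
       try {
         // Remove the DELETED container record from the SCM DB and memory.
         containerManager.deleteContainer(container.containerID());
       } catch (IOException e) {
         LOG.warn("Failed to purge DELETED container {}", container, e);
       }
     }
   }, 1, 1, TimeUnit.HOURS);
   ```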
   
   
   
   
   




[GitHub] [hadoop-ozone] sodonnel commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-688978976


   Thinking some more about what to do with containers in the DELETED state in SCM. If we are not confident enough to delete them immediately and automatically, maybe we should provide a command "purge empty containers" which can be run by an admin, or via Recon. We should show in Recon how many empty containers are present in the cluster and then provide an option to remove them.
   
   Alternatively, we could have some threshold in SCM, so that when the number of DELETED containers crosses that threshold, they are all removed.
   
   We could do either of these in a separate jira if this idea makes sense.




[GitHub] [hadoop-ozone] sodonnel commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-698259315


   Sorry for the slow reply on this. I have been caught up on some other things.
   
   > After a second thought, deleting the container record in the SCM DB immediately while keeping it in memory may be a better and cleaner choice. So if there is a stale container replica, it can be deleted based on the in-memory information.
   
   I think this is a good enough idea for now. If SCM is up for a very long time, perhaps in the future we will want to add a thread to clear all the in-memory DELETED containers. One small concern: if a container goes DELETED and SCM is restarted soon after, and then a DN is restarted and reports a stale replica, that replica will just be seen as an unknown container. The default position there is to log a warning; the config hdds.scm.unknown-container.action controls this. This is all an edge case - most of the time, all DNs should be up anyway.
   
   I left just one comment on a suggested refactor in the container report handler, when dealing with replicas from a DELETED container.
   
   Could you also add a test in TestContainerReportHandler to check the logic around deleting a replica from a DELETED container?




[GitHub] [hadoop-ozone] linyiqun commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
linyiqun commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-700598266


   > @linyiqun I do agree that this could be handled more cleanly and efficiently in the container report handler. However, it's probably not much of an overhead for the replication manager. I am happy for us to commit the change as it is, and we can see how it performs in practice. Worst case, we have to refactor the change out of RM into the report handler. What do you think?
   
   +1 for this, @sodonnel .
   
   @ChenSammi, can you add a TODO comment like the one below while committing this PR? That will be helpful for us to revisit this in the future.
   ```
   // TODO: container report handling the empty containers.
   if (isContainerEmpty(container, replicas)) {
     deleteContainerReplicas(container, replicas);
   }
   ```
   
   +1 from me.
   




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-700642125


   Thanks @sodonnel  and @linyiqun for the review. 
   
   Basically, I think the report handler is not a good place to handle the whole empty-container deletion process. It can tell which container is empty, but it lacks the facilities of ReplicationManager, such as inflightDeletion, sending delete commands to extra replicas of a container in DELETING state, or resending commands. In the future, when container compaction is considered, we can move this container deletion logic together with it.
   




[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#discussion_r494180772



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/AbstractContainerReportHandler.java
##########
@@ -96,13 +101,24 @@ protected void processContainerReplica(final DatanodeDetails datanodeDetails,
    */
   private void updateContainerStats(final DatanodeDetails datanodeDetails,
                                     final ContainerID containerId,
-                                    final ContainerReplicaProto replicaProto)
+                                    final ContainerReplicaProto replicaProto,
+                                    final EventPublisher publisher)
       throws ContainerNotFoundException {
+    final ContainerInfo containerInfo = containerManager
+        .getContainer(containerId);
 
-    if (isHealthy(replicaProto::getState)) {
-      final ContainerInfo containerInfo = containerManager
-          .getContainer(containerId);
+    if (containerInfo.getState() == HddsProtos.LifeCycleState.DELETED) {

Review comment:
       It doesn't seem correct to put the logic to delete the replica for the DELETED container inside `updateContainerStats`.
   
   In `ContainerReportHandler#processContainerReplicas(..)` there is logic to delete an unknown container in the exception handler.
   
    Could we extract this into a new method, which is called from the exception handler? Then in `AbstractContainerReportHandler#updateContainerState(...)`, handle the containers which should be deleted in the "case DELETED" branch of the switch statement. It could call that same extracted method - that way the logic to form the DeleteContainer command will be the same for both. It also seems more logical to put the delete inside updateContainerState rather than updateContainerStats.
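
    Something along these lines, purely as a sketch (the command and event classes are assumptions based on similar SCM command-handling code, not the exact change being asked for):
    
    ```
      // Shared helper for the unknown-container exception handler and the
      // "case DELETED" branch of updateContainerState().
      private void deleteReplica(final ContainerID containerId,
                                 final DatanodeDetails datanode,
                                 final EventPublisher publisher,
                                 final String reason) {
        final DeleteContainerCommand command =
            new DeleteContainerCommand(containerId.getId(), true);
        publisher.fireEvent(SCMEvents.DATANODE_COMMAND,
            new CommandForDatanode<>(datanode.getUuid(), command));
        LOG.info("Sending delete container command for {} container {} to"
            + " datanode {}", reason, containerId, datanode);
      }
    ```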






[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-686864636


   The failed UT is unrelated; it is fixed by https://github.com/apache/hadoop-ozone/pull/1379.




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-698899914


   > Sorry for the slow reply on this. I have been caught up on some other things.
   > 
   > > After a second thought, deleting the container record in the SCM DB immediately while keeping it in memory may be a better and cleaner choice. So if there is a stale container replica, it can be deleted based on the in-memory information.
   > 
   > I think this is a good enough idea for now. If SCM is up for a very long time, perhaps in the future we will want to add a thread to clear all the in-memory DELETED containers. One small concern: if a container goes DELETED and SCM is restarted soon after, and then a DN is restarted and reports a stale replica, that replica will just be seen as an unknown container. The default position there is to log a warning; the config hdds.scm.unknown-container.action controls this. This is all an edge case - most of the time, all DNs should be up anyway.
   > 
   > I left just one comment on a suggested refactor in the container report handler, when dealing with replicas from a DELETED container.
   > 
   > Could you also add a test in TestContainerReportHandler to check the logic around deleting a replica from a DELETED container?
   
   Thanks @sodonnel, I pushed a new commit to address the concerns.




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-700409940


   > 
   > 
   > This change looks almost good now. I wonder about two final things:
   > 
   >     1. In `updateContainerStats(...)`, do you think we should return if the container is DELETING or DELETED, without making any updates? If this is a stale replica, it may not be empty and hence could update the container stats incorrectly. If the container is already in the DELETING or DELETED state, then we can ignore any changes to it, as we know the stale replica will get removed anyway.
   > 
   >     2. I am wondering if we could receive a stale replica while the container is DELETING. A replica would then get added in `updateContainerReplica(...)`. Later the container will go to DELETED and the stale replica will get reported again - at that point we will send a delete command, but I think the replica will now never get removed from memory. Would it make sense to send the delete command for any replicas received when the container is DELETING or DELETED?
   
   There is the following logic in ReplicationManager, which handles replicas reported while the container state is DELETING.
   So we only need to send the delete replica command for DELETED containers during container report processing, and let ReplicationManager handle the replicas of DELETING containers: when a container is in DELETING state, we know SCM has already sent out some replica deletion commands, but from a container report alone we cannot tell whether a replica is stale or not.
   
   
   ```
   if (state == LifeCycleState.DELETING) {
     handleContainerUnderDelete(container, replicas);
     return;
   }
   ```
   
   @sodonnel 




[GitHub] [hadoop-ozone] sodonnel commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-700582338


   > There is the following logic in ReplicationManager, which handles replicas reported while the container state is DELETING.
   
   Sorry I missed that. You are correct. I am +1 on this change as it is now, so feel free to commit it.
   
   @linyiqun I do agree that this could be handled more cleanly and efficiently in the container report handler. However, it's probably not much of an overhead for the replication manager. I am happy for us to commit the change as it is, and we can see how it performs in practice. Worst case, we have to refactor the change out of RM into the report handler. What do you think?




[GitHub] [hadoop-ozone] sodonnel commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-676523668


   I raised HDDS-4131 / https://github.com/apache/hadoop-ozone/pull/1339 to discuss and address the bytesUsed and KeyCount metric problem I mentioned above.




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-694037951


   After a second thought, deleting the container record in the SCM DB immediately while keeping it in memory may be a better and cleaner choice. So if there is a stale container replica, it can be deleted based on the in-memory information. And the next time SCM starts, it doesn't need to handle DELETED containers anymore.




[GitHub] [hadoop-ozone] sodonnel commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-688774840


   As I said before, I have a feeling that it *might* be better to have a separate service for deleting empty containers, and it would perhaps naturally be part of a "merge small containers" service, which we will need to create eventually. However, the changes to the replication manager here are not too big, and you could argue the current solution fits here OK. Therefore I am OK with keeping this logic in the replication manager. However when it comes to merging small containers in the future, I believe we should create a separate service and not clutter RM with that new feature too.
   
   The patch looks mostly good to me, there are just a couple of areas we need to consider.
   
   1. What do we do with the containers in DELETED state, and with no replicas? I don't think we should keep them in SCM forever, but we are also worried about removing them. However that said, if the replicas have all been deleted, what use is there in keeping the container in SCM?
   
   2. Consider if a datanode was dead for some time, and is then restarted. It will start reporting a replica for the DELETED container, and with the current logic I don't think we will clean it up, as `isContainerEmpty(...)` will always return false due to it checking for only CLOSED containers. I feel that this should be handled in the ContainerReportHandler code. If a replica is reported for a container which is DELETED in SCM, then we should instruct the datanode to delete the replica. What do you think? The parameter hdds.scm.unknown-container.action currently controls what to do if a DN reports a container which SCM does not know about. The default is to log a warning, but for a DELETED container, I think we should just delete the replica.
   
   3. There is another corner case, where the dead node may have a container which is out of sync with the others, as it was dead when the blocks should have been removed. Therefore we had 3 empty container replicas, plus one non-empty dead one. The 3 empty replicas are removed, and the container moved to "DELETED". Then the dead node starts reporting a non-empty replica for the DELETED container. This is actually a wider problem, in that I believe there are scenarios where container replicas can get out of sync with one another and there is nothing to reconcile them and fix it.
   
   4. In ContainerSafeModeRule, I think we need to consider DELETING and DELETED. If there are a lot of DELETED containers in the cluster, then they will never get replicas and the safemode rule will never be met. I think this could be as simple as excluding DELETING and DELETED when loading the containerMap (right now OPEN and CLOSING are excluded) so we don't wait for replicas on these containers - see the sketch after this list.
   
   5. In `ReplicationManager.isContainerUnderReplicated(...)`, should we immediately return true if the containerState is DELETING or DELETED?
   
   6. Are there any other places outside of ReplicationManager and ContainerSafeModeRule where we monitor under / over replicated containers I may have missed?
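
   For point 4, a minimal sketch of that safemode filter (the containerMap field and the surrounding ContainerSafeModeRule code are assumptions about where this would live, not the actual rule implementation):
   
   ```
   // Skip states that will never get replica reports, so safemode does not
   // wait on them.
   containers.forEach(container -> {
     final HddsProtos.LifeCycleState state = container.getState();
     if (state == HddsProtos.LifeCycleState.OPEN
         || state == HddsProtos.LifeCycleState.CLOSING
         || state == HddsProtos.LifeCycleState.DELETING
         || state == HddsProtos.LifeCycleState.DELETED) {
       return;
     }
     containerMap.put(container.containerID(), container);
   });
   ```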




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-698899914


   > Sorry for the slow reply on this. I have been caught up on some other things.
   > 
   > > After a second thought, deleting the container record in SCM DB immediately while keep it in memory maybe a better and clean choice. So if there is stale container replica, it can be deleted based on in memory information.
   > 
   > I think this is a good enough idea for now. If SCM is up for a very long time, perhaps in the future we will want to add a thread to clear all the in memory DELETED containers. One small concern is that if a container goes DELETED and then SCM is restarted soon after. Then a DN is restarted and reports a stale replica, it will just be seen as an unknown container. The default position there, is to log a warning. The config hdds.scm.unknown-container.action controls this. This is all an edge case - most of the time, all DNs should be up anyway.
   > 
   > I left just one comment on a suggested refactor in the container report handler, when dealing with replicas from a DELETED container.
   > 
   > Could you also add a test in TestContainerReportHandler to check the logic around deleting a replica from a DELETED container?
   
   Thanks @sodonnel, a new commit has been pushed to address the concerns.




[GitHub] [hadoop-ozone] linyiqun commented on a change in pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
linyiqun commented on a change in pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#discussion_r496544696



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -320,6 +331,12 @@ private void processContainer(ContainerID id) {
        * exact number of replicas in the same state.
        */
       if (isContainerHealthy(container, replicas)) {
+        /*
+         *  If container is empty, schedule task to delete the container.
+         */
+        if (isContainerEmpty(container, replicas)) {
+          deleteContainerReplicas(container, replicas);
+        }

Review comment:
       @ChenSammi, is there any specific reason that we let ReplicationManager help clean up empty containers? After this change, ReplicationManager will additionally do a container-empty check for every healthy container. I am not sure this is an efficient place to put the logic.
   >I wonder if it would be simpler to remove empty containers as part of Container Report processing? In AbstractContainerReportHandler#updateContainerState, we could check the size and number of keys of the reported containers in the CLOSED branch of the switch statement, and then take action to delete an empty container there? I have a feeling it might be simpler, but I am not sure. The disadvantage of doing it in the Container Report Processing, is that we are dealing with only a single replica at that stage. However if the container is CLOSED in SCM, and a report says it is empty then we should be good to simply remove the container from SCM and issue the delete container command when processing the container report.
   
   Actually I prefer the approach @sodonnel mentioned above.
   >but I am not sure. The disadvantage of doing it in the Container Report Processing, is that we are dealing with only a single replica at that stage
   
   
   We could also get all the replica info and check the state in ContainerReportHandler, then send the delete container command.
   
   I'm okay with the current way, but just wanted to share my thoughts on this.
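   
   For reference, my reading of the emptiness check in the diff above is roughly the following (a sketch based only on the names in the diff, assuming a per-replica key count getter; not the actual patch):
   
   ```java
   // Sketch based on the names in the diff above, not the actual patch.
   // A container is only treated as empty once it is CLOSED, SCM records
   // zero keys for it, and every known replica also reports zero keys.
   private boolean isContainerEmpty(final ContainerInfo container,
       final Set<ContainerReplica> replicas) {
     return container.getState() == HddsProtos.LifeCycleState.CLOSED
         && container.getNumberOfKeys() == 0
         // getKeyCount() on the replica is assumed here.
         && replicas.stream().allMatch(r -> r.getKeyCount() == 0);
   }
   ```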






[GitHub] [hadoop-ozone] ChenSammi merged pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi merged pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338


   




[GitHub] [hadoop-ozone] sodonnel commented on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-698259315


   Sorry for the slow reply on this. I have been caught up on some other things.
   
   > After a second thought, deleting the container record in the SCM DB immediately while keeping it in memory may be a better and cleaner choice. So if there is a stale container replica, it can be deleted based on the in-memory information.
   
   I think this is a good enough idea for now. If SCM is up for a very long time, perhaps in the future we will want to add a thread to clear out all the in-memory DELETED containers. One small concern: if a container goes DELETED and SCM is then restarted soon after, a DN that restarts and reports a stale replica will just be seen as reporting an unknown container. The default action there is to log a warning; the config hdds.scm.unknown-container.action controls this. This is all an edge case - most of the time, all DNs should be up anyway.
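   
   If we do add that cleanup thread later, I imagine something along these lines (purely illustrative; purgeDeletedContainer(...) is a hypothetical method, not an existing ContainerManager API):
   
   ```java
   // Purely illustrative. Periodically drop DELETED containers from SCM
   // memory; in practice we would want a grace period so that a restarted
   // DN still gets a chance to report and clean up stale replicas first.
   ScheduledExecutorService purger =
       Executors.newSingleThreadScheduledExecutor();
   purger.scheduleAtFixedRate(() -> {
     for (ContainerInfo container :
         containerManager.getContainers(HddsProtos.LifeCycleState.DELETED)) {
       // Hypothetical purge call -- removes the in-memory record only.
       containerManager.purgeDeletedContainer(container.containerID());
     }
   }, 1, 1, TimeUnit.HOURS);
   ```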
   
   I left just one comment on a suggested refactor in the container report handler, when dealing with replicas from a DELETED container.
   
   Could you also add a test in TestContainerReportHandler to check the logic around deleting a replica from a DELETED container?




[GitHub] [hadoop-ozone] ChenSammi edited a comment on pull request #1338: HDDS-4023. Delete closed container after all blocks have been deleted.

Posted by GitBox <gi...@apache.org>.
ChenSammi edited a comment on pull request #1338:
URL: https://github.com/apache/hadoop-ozone/pull/1338#issuecomment-678126933


   Hi @sodonnel, thanks for reviewing the code and initiating the discussion.
   
   > I wonder if it would be simpler to remove empty containers as part of Container Report processing? In `AbstractContainerReportHandler#updateContainerState`, we could check the size and number of keys of the reported containers in the CLOSED branch of the switch statement, and then take action to delete an empty container there? I have a feeling it might be simpler, but I am not sure. The disadvantage of doing it in the Container Report Processing, is that we are dealing with only a single replica at that stage. However if the container is CLOSED in SCM, and a report says it is empty then we should be good to simply remove the container from SCM and issue the delete container command when processing the container report.
   > 
   > In looking at how this all works, I noticed there is a bug in the existing method AbstractContainerReportHandler#updateContainerStats, which might stop your change working:
   > 
   > ```
   >   private void updateContainerStats(final ContainerID containerId,
   >                                     final ContainerReplicaProto replicaProto)
   >       throws ContainerNotFoundException {
   >     if (isHealthy(replicaProto::getState)) {
   >       final ContainerInfo containerInfo = containerManager
   >           .getContainer(containerId);
   > 
   >       if (containerInfo.getSequenceId() <
   >           replicaProto.getBlockCommitSequenceId()) {
   >         containerInfo.updateSequenceId(
   >             replicaProto.getBlockCommitSequenceId());
   >       }
   >       if (containerInfo.getUsedBytes() < replicaProto.getUsed()) {
   >         containerInfo.setUsedBytes(replicaProto.getUsed());
   >       }
   >       if (containerInfo.getNumberOfKeys() < replicaProto.getKeyCount()) {
   >         containerInfo.setNumberOfKeys(replicaProto.getKeyCount());
   >       }
   >     }
   >   }
   > ```
   > 
   > This logic assumes the bytes used and keyCount are always increasing on the container. However in this case, where blocks are deleted, the keyCount and bytesUsed can decrease. Therefore I don't think the stats will ever get updated in SCM via the container reports when blocks are removed.
   
   This logic works for OPEN containers, but not for CLOSED containers.
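   
   For illustration, this is roughly the kind of distinction I mean (a sketch only, reusing the method from the quote above plus the existing getState() accessor; not claiming this is how it was actually fixed):
   
   ```java
   // Sketch only -- not the actual fix. Keeps the existing "take the max"
   // behaviour while the container is OPEN (replicas only grow), but once
   // the container is no longer OPEN, accepts decreases as well, since
   // block deletion can shrink a closed replica.
   private void updateContainerStats(final ContainerID containerId,
                                     final ContainerReplicaProto replicaProto)
       throws ContainerNotFoundException {
     if (!isHealthy(replicaProto::getState)) {
       return;
     }
     final ContainerInfo containerInfo = containerManager
         .getContainer(containerId);
   
     if (containerInfo.getSequenceId() <
         replicaProto.getBlockCommitSequenceId()) {
       containerInfo.updateSequenceId(
           replicaProto.getBlockCommitSequenceId());
     }
   
     if (containerInfo.getState() == HddsProtos.LifeCycleState.OPEN) {
       if (containerInfo.getUsedBytes() < replicaProto.getUsed()) {
         containerInfo.setUsedBytes(replicaProto.getUsed());
       }
       if (containerInfo.getNumberOfKeys() < replicaProto.getKeyCount()) {
         containerInfo.setNumberOfKeys(replicaProto.getKeyCount());
       }
     } else {
       containerInfo.setUsedBytes(replicaProto.getUsed());
       containerInfo.setNumberOfKeys(replicaProto.getKeyCount());
     }
   }
   ```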
   
   > 
   > I then noticed you recently made a change in ReplicationManager to update these counts here:
   > 
   > https://github.com/apache/hadoop-ozone/pull/1295/files#diff-218132dfe72e6e39b9eaaf59737adcf2R780
   > 
   > I think the ReplicationManager part of that change should be reverted and it handled via the ContainerReports by fixing the bug I mentioned above. The only reason we need the Replication Manager change is to work around the bug above. If it was fixed, we don't need that logic in RM.
   You can find my comments in HDDS-4131. 
   
   > 
   > If we decide to keep your changes inside replicationManager (for this change), then we would need a few more tests added to TestReplicationManager to cover the new scenarios.
   
   TestReplicationManager uses Mockito, so it is not a real end-to-end test. I added an end-to-end test in TestBlockDeletion.
   
   > 
   > I also have a question - when the replicas are all deleted, and we call:
   > 
   > ```
   > containerManager.updateContainerState(container.containerID(),
   >           HddsProtos.LifeCycleEvent.CLEANUP);
   > ```
   > 
   > Does this somehow cause the container to be removed from SCM memory and the persistent store?
   
   A container in DELETED state currently stays in SCM memory and the persistent store. Of course we can just delete the container from memory and the DB when its state changes to DELETED, but I feel the current way is safer, like a container trash bin. We can add a new purge-DELETED-container action next, so users can manually delete these containers once they are sure the information is no longer needed. Overall, I'm open to this question and fine with either way.
   
   

