Posted to issues@ozone.apache.org by "Mark Gui (Jira)" <ji...@apache.org> on 2021/08/26 11:20:00 UTC

[jira] [Created] (HDDS-5679) Use more defensive sizeRequired for replication manager for container replication.

Mark Gui created HDDS-5679:
------------------------------

             Summary: Use more defensive sizeRequired for replication manager for container replication.
                 Key: HDDS-5679
                 URL: https://issues.apache.org/jira/browse/HDDS-5679
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Mark Gui


We hit a bug when replicating a container of about 2GB, which is smaller than the configured container size (5GB):
{code:java}
2021-08-25 19:12:31,945 [ContainerReplicationThread-4] ERROR org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Container 73446 replication was unsuccessful.
org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=2580881408 B) is less than the container size (=5368709120 B).
        at org.apache.hadoop.ozone.container.common.volume.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:77)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.populateContainerPathFields(KeyValueHandler.java:290)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.importContainer(KeyValueHandler.java:907)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.importContainer(ContainerController.java:139)
        at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:90)
        at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:135)
        at org.apache.hadoop.ozone.container.replication.MeasuredReplicator.replicate(MeasuredReplicator.java:69)
        at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:139)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2021-08-25 19:12:31,946 [ContainerReplicationThread-4] ERROR org.apache.hadoop.ozone.container.replication.ReplicationSupervisor: Container 73446 can't be downloaded from any of the datanodes.
{code}
ReplicationManager places the container replica on a datanode that has enough space for the replica. However, when the datanode creates the replica, it checks whether at least 5GB (the container size) is available. So even though there is enough space for a 2GB container, the import fails with an out-of-space exception.

In this case, ReplicationManager (RM) should not schedule the replica on this datanode.
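
To illustrate the mismatch described above, here is a minimal sketch of what a more defensive sizeRequired could look like: the scheduler asks for the larger of the replica's used bytes and the maximum container size, so it never picks a datanode that would later fail the datanode-side check. The class SizeRequiredSketch and the methods sizeRequired and eligible are hypothetical names made up for this example; they are not Ozone APIs, and this is not necessarily how the actual fix is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are Ozone APIs.
final class SizeRequiredSketch {

  // Container size reported in the exception above (5368709120 B).
  static final long CONTAINER_MAX_SIZE = 5368709120L;

  // Defensive size: never request less than the datanode will demand
  // when it imports the container.
  static long sizeRequired(long usedBytes, long containerMaxSize) {
    return Math.max(usedBytes, containerMaxSize);
  }

  // Keep only the candidates whose available space covers the required size.
  static List<Long> eligible(List<Long> availableBytes, long required) {
    List<Long> result = new ArrayList<>();
    for (long available : availableBytes) {
      if (available >= required) {
        result.add(available);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Roughly 2GB of actual data, as in the report.
    long usedBytes = 2L * 1024 * 1024 * 1024;
    long required = sizeRequired(usedBytes, CONTAINER_MAX_SIZE);

    // First candidate has the available space from the exception message;
    // the second is an assumed datanode with 8 GiB free.
    List<Long> candidates = Arrays.asList(2580881408L, 8L * 1024 * 1024 * 1024);

    System.out.println("required = " + required);
    System.out.println("eligible = " + eligible(candidates, required));
  }
}
```

With the used bytes alone (about 2GB), the datanode with 2580881408 B available would pass the scheduler's check and then fail on import, which matches the log above. With the defensive value it is filtered out before the replication command is sent.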



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
