Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2023/08/16 21:05:00 UTC

[jira] [Resolved] (HDDS-7533) Intermittent failure in Decommissioning Ozone Datanode

     [ https://issues.apache.org/jira/browse/HDDS-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell resolved HDDS-7533.
-------------------------------------
    Resolution: Invalid

Closing this for now, as it does not appear to be a valid test. If the issue is reproducible and occurs again, please reopen with more details.

> Intermittent failure in Decommissioning Ozone Datanode
> ------------------------------------------------------
>
>                 Key: HDDS-7533
>                 URL: https://issues.apache.org/jira/browse/HDDS-7533
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Varsha Ravi
>            Priority: Major
>
> Decommissioning of Ozone datanodes gets stuck and does not complete, even after hours.
> STEPS TO REPRODUCE:
> ---------------------------
>  # Start only 3 DNs.
>  # Create a non-EC directory and write a significant amount of data to it.
>  # Shut down these 3 DNs.
>  # Start a second set of DNs for writing EC data.
>  # Create an EC directory and write a significant amount of data to it.
>  # Start 1 DN from the first set of 3 DNs.
>  # Decommission 2 DNs from the second (EC) set.
> SCM logs while decommissioning is stuck:
> {noformat}
> 4:58:30.828 PM    ERROR    UnderReplicatedProcessor    
> Error processing under replicated container ContainerInfo{id=#4, state=CLOSED, pipelineID=PipelineID=e0019753-3738-473b-96b5-2338ce586a18, stateEnterTime=2022-11-18T10:33:33.353Z, owner=om2}
> org.apache.hadoop.hdds.scm.exceptions.SCMException: Not enough healthy nodes to allocate container. 2  datanodes required. Found 1
>     at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:218)
>     at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:81)
>     at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:175)
>     at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:117)
>     at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:303)
>     at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:186)
>     at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:471)
>     at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:99)
>     at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:83)
>     at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.run(UnderReplicatedProcessor.java:138)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> 4:58:30.829 PM    ERROR    SCMCommonPlacementPolicy    
> Not enough healthy nodes to allocate container. 2  datanodes required. Found 1{noformat}
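
The stack trace shows the exception coming from ECUnderReplicationHandler.getTargetDatanodes, i.e. SCM could not find enough target datanodes for the replicas it needs to re-create. The Java sketch below is a hypothetical, heavily simplified model of that check. It is not Ozone's SCMCommonPlacementPolicy code. It assumes that a target must be in service and must not already hold a replica of the container. It also assumes two things the report does not state: the EC data uses an rs-3-2 layout, and the second set contained exactly 5 datanodes. Under those assumptions, decommissioning 2 of the EC datanodes leaves only the one restarted non-EC datanode as a candidate, which is one possible way to arrive at the counts in the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the "Not enough healthy nodes" check.
// This is NOT Ozone's SCMCommonPlacementPolicy; states and rules are assumptions.
public class PlacementSketch {

  // Assumed node states; real Ozone tracks health and operational state separately.
  public enum State { IN_SERVICE, DECOMMISSIONING, DEAD }

  // Returns `required` targets that are IN_SERVICE and hold no replica of the
  // container, or throws if there are not enough such nodes.
  public static List<String> chooseTargets(
      Map<String, State> cluster, Set<String> holdsReplica, int required) {
    List<String> candidates = new ArrayList<>();
    for (Map.Entry<String, State> e : cluster.entrySet()) {
      if (e.getValue() == State.IN_SERVICE && !holdsReplica.contains(e.getKey())) {
        candidates.add(e.getKey());
      }
    }
    if (candidates.size() < required) {
      throw new IllegalStateException("Not enough healthy nodes to allocate container. "
          + required + " datanodes required. Found " + candidates.size());
    }
    return new ArrayList<>(candidates.subList(0, required));
  }

  public static void main(String[] args) {
    Map<String, State> cluster = new LinkedHashMap<>();
    // First set (non-EC data): only dn1 was restarted in step 6.
    cluster.put("dn1", State.IN_SERVICE);
    cluster.put("dn2", State.DEAD);
    cluster.put("dn3", State.DEAD);
    // Second set: assumed to be exactly 5 EC DNs; ec1 and ec2 are decommissioning.
    cluster.put("ec1", State.DECOMMISSIONING);
    cluster.put("ec2", State.DECOMMISSIONING);
    cluster.put("ec3", State.IN_SERVICE);
    cluster.put("ec4", State.IN_SERVICE);
    cluster.put("ec5", State.IN_SERVICE);

    // Assumed rs-3-2 container: one replica on each of the 5 EC DNs.
    Set<String> holdsReplica = Set.of("ec1", "ec2", "ec3", "ec4", "ec5");

    try {
      // The two replicas on decommissioning nodes need two new homes.
      System.out.println(chooseTargets(cluster, holdsReplica, 2));
    } catch (IllegalStateException e) {
      // Prints: Not enough healthy nodes to allocate container. 2 datanodes required. Found 1
      System.out.println(e.getMessage());
    }
  }
}
```

In this model, adding in-service datanodes that hold no replica of the container would let the same call succeed. The real placement policy may apply further constraints, such as rack awareness, that the sketch ignores.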



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
