Posted to hdfs-dev@hadoop.apache.org by "Ajay Kumar (JIRA)" <ji...@apache.org> on 2019/03/22 23:22:00 UTC

[jira] [Reopened] (HDDS-1310) In datanode once a container becomes unhealthy, datanode restart fails.

     [ https://issues.apache.org/jira/browse/HDDS-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajay Kumar reopened HDDS-1310:
------------------------------

The test failures seem to be related. Let's fix them before commit.

> In datanode once a container becomes unhealthy, datanode restart fails.
> -----------------------------------------------------------------------
>
>                 Key: HDDS-1310
>                 URL: https://issues.apache.org/jira/browse/HDDS-1310
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.3.0
>            Reporter: Sandeep Nemuri
>            Assignee: Sandeep Nemuri
>            Priority: Blocker
>         Attachments: HDDS-1310.001.patch, HDDS-1310.002.patch
>
>
> When a container is marked as {{UNHEALTHY}} in a datanode, a subsequent restart of that datanode fails because it can no longer generate ContainerReports. The unhealthy state of a container is not handled in ContainerReport generation inside the datanode.
> We get the exception below when a datanode tries to generate a ContainerReport that contains unhealthy container(s):
> {noformat}
> 2019-03-19 13:51:13,646 [Datanode State Machine Thread - 0] ERROR      - Unable to communicate to SCM server at xxxxx.xxxxx.xxx:9861 for past 3300 seconds.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Invalid Container state found: 86
>         at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.getHddsState(KeyValueContainer.java:623)
>         at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.getContainerReport(KeyValueContainer.java:593)
>         at org.apache.hadoop.ozone.container.common.impl.ContainerSet.getContainerReport(ContainerSet.java:204)
>         at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.getContainerReport(ContainerController.java:82)
>         at org.apache.hadoop.ozone.container.common.states.endpoint.RegisterEndpointTask.call(RegisterEndpointTask.java:114)
>         at org.apache.hadoop.ozone.container.common.states.endpoint.RegisterEndpointTask.call(RegisterEndpointTask.java:47)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
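
For illustration only, the failure mode described above can be sketched as a state-mapping switch that has no branch for the unhealthy state and therefore falls through to an error. This is not the code from KeyValueContainer or from the attached patches. The class ContainerStateMappingSketch, the enums LocalState and ReportState, and the method toReportState are hypothetical stand-ins, and the set of states is assumed.

```java
// Hypothetical sketch; these are NOT the real Ozone classes or enums.
class ContainerStateMappingSketch {

  /** State recorded for a container on the datanode (assumed values). */
  enum LocalState { OPEN, CLOSING, CLOSED, UNHEALTHY, INVALID }

  /** State placed into a container report (assumed values). */
  enum ReportState { OPEN, CLOSING, CLOSED, UNHEALTHY }

  /** Maps a local state to a report state, rejecting unknown states. */
  static ReportState toReportState(LocalState local) {
    switch (local) {
      case OPEN:
        return ReportState.OPEN;
      case CLOSING:
        return ReportState.CLOSING;
      case CLOSED:
        return ReportState.CLOSED;
      case UNHEALTHY:
        // Without this branch an unhealthy container would hit the
        // default case and abort report generation.
        return ReportState.UNHEALTHY;
      default:
        throw new IllegalStateException(
            "Invalid Container state found: " + local);
    }
  }

  public static void main(String[] args) {
    // An unhealthy container is reported instead of raising an error.
    System.out.println(toReportState(LocalState.UNHEALTHY));
  }
}
```

Under these assumptions, reporting an unhealthy container as unhealthy, rather than treating it as an invalid state, lets report generation complete, while states that are truly unknown are still rejected.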



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
