Posted to issues@ozone.apache.org by "Marton Elek (Jira)" <ji...@apache.org> on 2020/04/06 12:05:00 UTC

[jira] [Updated] (HDDS-3241) Invalid container reported to SCM should be deleted

     [ https://issues.apache.org/jira/browse/HDDS-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek updated HDDS-3241:
------------------------------
    Fix Version/s: 0.5.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Invalid container reported to SCM should be deleted
> ---------------------------------------------------
>
>                 Key: HDDS-3241
>                 URL: https://issues.apache.org/jira/browse/HDDS-3241
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>    Affects Versions: 0.4.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> For an invalid or outdated container reported by a Datanode, ContainerReportHandler in SCM only logs an error and
>  takes no further action.
> {noformat}
> 2020-03-15 05:19:41,072 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 37 from datanode 0d98dfab-9d34-46c3-93fd-6b64b65ff543{ip: xx.xx.xx.xx, host: lyq-xx.xx.xx.xx, networkLocation: /dc2/rack1, certSerialId: null}.
> org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: Container with id #37 not found.
>         at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:542)
>         at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.getContainerInfo(ContainerStateMap.java:188)
>         at org.apache.hadoop.hdds.scm.container.ContainerStateManager.getContainer(ContainerStateManager.java:484)
>         at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainer(SCMContainerManager.java:204)
>         at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:126)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46)
>         at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2020-03-15 05:19:41,073 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 38 from datanode 0d98dfab-9d34-46c3-93fd-6b64b65ff543{ip: xx.xx.xx.xx, host: lyq-xx.xx.xx.xx, networkLocation: /dc2/rack1, certSerialId: null}.
> org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: Container with id #38 not found.
>         at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:542)
>         at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.getContainerInfo(ContainerStateMap.java:188)
>         at org.apache.hadoop.hdds.scm.container.ContainerStateManager.getContainer(ContainerStateManager.java:484)
>         at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainer(SCMContainerManager.java:204)
>         at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:126)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46)
>         at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {noformat}
> SCM should instead instruct the Datanode to delete the outdated container. Otherwise, the Datanode will keep reporting this invalid container, and its stale data will remain on the Datanode indefinitely. For example, a node that is brought back after being repaired may still hold stale data, and we should have a way to clean it up automatically.
> Since this approach carries some risk, we could add a setting to control the auto-deletion behavior.
>  
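A minimal sketch of the behavior proposed in the description, assuming a much simplified model of SCM in which a container report is just a Datanode ID plus a list of container IDs. When a reported container is unknown to SCM, the error is always logged, as it is today. In addition, if a configuration switch is enabled, a delete command is queued for that Datanode. All names here (UnknownContainerReportSketch, DeleteContainerCommand, deleteUnknownContainers) are hypothetical. They are not the real Ozone classes or configuration keys, and this is not the change that was committed for HDDS-3241.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of SCM handling container reports that
 * mention containers SCM has no metadata for. Not the real Ozone API.
 */
class UnknownContainerReportSketch {

  /** Hypothetical stand-in for a "delete container" command sent to a Datanode. */
  static final class DeleteContainerCommand {
    final String datanodeId;
    final long containerId;

    DeleteContainerCommand(String datanodeId, long containerId) {
      this.datanodeId = datanodeId;
      this.containerId = containerId;
    }

    @Override
    public String toString() {
      return "DeleteContainerCommand{dn=" + datanodeId + ", container=#" + containerId + "}";
    }
  }

  private final Set<Long> knownContainerIds;      // containers SCM knows about
  private final boolean deleteUnknownContainers;  // hypothetical opt-in setting
  private final List<DeleteContainerCommand> pending = new ArrayList<>();
  private final List<String> errors = new ArrayList<>();

  UnknownContainerReportSketch(Set<Long> knownContainerIds, boolean deleteUnknownContainers) {
    this.knownContainerIds = knownContainerIds;
    this.deleteUnknownContainers = deleteUnknownContainers;
  }

  /** Processes one report: the IDs of all container replicas a Datanode holds. */
  void onContainerReport(String datanodeId, List<Long> reportedContainerIds) {
    for (long id : reportedContainerIds) {
      if (knownContainerIds.contains(id)) {
        continue; // normal replica processing would happen here
      }
      // Unknown container: always record the error, as SCM does today.
      errors.add("Received container report for an unknown container " + id
          + " from datanode " + datanodeId);
      if (deleteUnknownContainers) {
        // Proposed behavior: ask the Datanode to drop the stale replica.
        pending.add(new DeleteContainerCommand(datanodeId, id));
      }
    }
  }

  List<DeleteContainerCommand> pendingCommands() {
    return Collections.unmodifiableList(pending);
  }

  List<String> errors() {
    return Collections.unmodifiableList(errors);
  }

  public static void main(String[] args) {
    UnknownContainerReportSketch scm =
        new UnknownContainerReportSketch(Set.of(1L, 2L), true);
    scm.onContainerReport("dn-0d98dfab", List.of(1L, 37L, 38L));
    scm.errors().forEach(System.out::println);
    scm.pendingCommands().forEach(System.out::println);
  }
}
```

Keeping the error log unconditional and gating only the delete command on the switch reflects the description's suggestion that auto-deletion be configurable because it is somewhat risky. With the switch off, the sketch behaves like the current handler and only logs.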



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org