Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2022/01/28 20:17:00 UTC

[jira] [Updated] (HDDS-6236) SCM receives reports of unknown containers

     [ https://issues.apache.org/jira/browse/HDDS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose updated HDDS-6236:
-----------------------------
    Description: 
We have noticed the following log messages in SCM leader and followers for multiple containers:
{code:java}
2022-01-19 12:53:24,021 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 1368 from datanode { ... }

org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: ID #1368
        at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
        at java.base/java.util.Optional.orElseThrow(Optional.java:408)
        at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
        at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:94)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:165)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:133)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:48)
        at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

{code}
The cluster is currently running SCM HA, but the issue was observed when it was a non-HA cluster as well. This seems to only affect empty containers, since no data appears to be missing. Containers are supposed to exist in SCM DB even after they have been deleted from the datanode, so there seems to be some kind of bug in the container persistence logic.

  was:
In two different Ozone clusters running SCM HA (may or may not be related to HA), we have noticed the following log messages in SCM leader and followers:

{code}

2022-01-19 12:53:24,021 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 1368 from datanode { ... }

org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: ID #1368
        at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
        at java.base/java.util.Optional.orElseThrow(Optional.java:408)
        at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
        at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:94)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:165)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:133)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:48)
        at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

{code}

This seems to only affect empty containers, since no data appears to be missing. Containers are supposed to exist in SCM DB even after they have been deleted from the datanode, so there is some kind of bug in the container persistence logic.


> SCM receives reports of unknown containers 
> -------------------------------------------
>
>                 Key: HDDS-6236
>                 URL: https://issues.apache.org/jira/browse/HDDS-6236
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Ethan Rose
>            Priority: Major
>
> We have noticed the following log messages in SCM leader and followers for multiple containers:
> {code:java}
> 2022-01-19 12:53:24,021 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 1368 from datanode { ... }
> org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: ID #1368
>         at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
>         at java.base/java.util.Optional.orElseThrow(Optional.java:408)
>         at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
>         at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:94)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:165)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:133)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:48)
>         at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> The cluster is currently running SCM HA, but the issue was observed when it was a non-HA cluster as well. This seems to only affect empty containers, since no data appears to be missing. Containers are supposed to exist in SCM DB even after they have been deleted from the datanode, so there seems to be some kind of bug in the container persistence logic.
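
To make the expected invariant in the description concrete, here is a minimal sketch of the lookup path suggested by the stack trace (an Optional lookup that throws when the ID is absent). Everything in it is an assumption for illustration: the class UnknownContainerSketch, the in-memory map standing in for the SCM DB, the State enum, and the methods markDeleted and dropRecord are hypothetical and are not the real Ozone classes or the confirmed cause of the bug. It only shows why a report for a deleted container should still resolve as long as the record is retained, and why a missing record produces the logged exception.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of the lookup path seen in the stack trace.
// None of these types are the real Ozone classes.
class UnknownContainerSketch {

  /** Stand-in for the exception type named in the log message. */
  static class ContainerNotFoundException extends Exception {
    ContainerNotFoundException(long id) {
      super("ID #" + id);
    }
  }

  /** Assumed lifecycle states; only the two needed for this sketch. */
  enum State { OPEN, DELETED }

  // Stand-in for the SCM container DB: container ID -> recorded state.
  private final Map<Long, State> containerDb = new HashMap<>();

  void addContainer(long id) {
    containerDb.put(id, State.OPEN);
  }

  // Expected behaviour per the description: deletion keeps the record.
  void markDeleted(long id) {
    containerDb.replace(id, State.DELETED);
  }

  // Models the suspected symptom: the record disappears entirely.
  void dropRecord(long id) {
    containerDb.remove(id);
  }

  // Lookup that throws when the ID is absent, as the trace suggests.
  State getContainer(long id) throws ContainerNotFoundException {
    return Optional.ofNullable(containerDb.get(id))
        .orElseThrow(() -> new ContainerNotFoundException(id));
  }

  /** Returns true if the reported container is known, false otherwise. */
  boolean processContainerReplica(long id) {
    try {
      getContainer(id);
      return true;
    } catch (ContainerNotFoundException e) {
      System.err.println(
          "Received container report for an unknown container " + id
              + ": " + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) {
    UnknownContainerSketch scm = new UnknownContainerSketch();
    scm.addContainer(1368L);

    // A deleted container whose record is retained is still known.
    scm.markDeleted(1368L);
    System.out.println("known after delete: "
        + scm.processContainerReplica(1368L));

    // Once the record itself is gone, the report is "unknown".
    scm.dropRecord(1368L);
    System.out.println("known after dropped record: "
        + scm.processContainerReplica(1368L));
  }
}
```

In this model the error can only appear if the record was never persisted or was removed, which is the kind of persistence problem the description suspects.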


