Posted to issues@ozone.apache.org by "Sammi Chen (Jira)" <ji...@apache.org> on 2021/04/06 04:03:00 UTC

[jira] [Comment Edited] (HDDS-5032) DN stopped to load containers on volume after a container load exception

    [ https://issues.apache.org/jira/browse/HDDS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310485#comment-17310485 ] 

Sammi Chen edited comment on HDDS-5032 at 4/6/21, 4:02 AM:
-----------------------------------------------------------

Yes... I forgot to mention that the outcome of HDDS-4722 is a lot of missing containers, which triggers a lot of container re-replication.


was (Author: jojochuang):
Yes... I forgot to mention that the outcome of HDFS-4722 is a lot of missing containers which triggers a lot of container re-replication.

> DN stopped to load containers on volume after a container load exception
> ------------------------------------------------------------------------
>
>                 Key: HDDS-5032
>                 URL: https://issues.apache.org/jira/browse/HDDS-5032
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Critical
>
> We have hit two cases of container loading exceptions. One case, which throws a runtime exception, is fixed by HDDS-4722. In the other case, I backed up a container directory under the name ContainerID-Backup, which triggers a badly formatted container directory name exception.
> The consequence in both cases is that the large number of containers remaining on the same volume are not loaded. While the DN is started and running healthily, SCM treats all these container replicas as missing and starts to schedule many replica replication tasks.
> This task is to fix the issue. If a specific container fails to load with an exception, log it and go on to load the next container.
> Case 1:
> 2021-03-12 20:46:16,420 [Thread-8] ERROR org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader: Caught a Run time exception during reading container files from Volume /data3/hdds/hdds {}
> java.lang.NumberFormatException: For input string: "1823-raw"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Long.parseLong(Long.java:589)
>         at java.lang.Long.parseLong(Long.java:631)
>         at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getContainerID(ContainerUtils.java:242)
>         at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getContainerFile(ContainerUtils.java:234)
>         at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:132)
>         at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
>         at java.lang.Thread.run(Thread.java:748)
> Case 2:
> 2021-03-25 10:15:47,502 [Thread-15] ERROR org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader: Caught a Run time exception during reading container files from Volume /data5/hdds/hdds {}
> org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics already exists!
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>         at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>         at org.apache.hadoop.hdds.utils.db.RDBMetrics.create(RDBMetrics.java:47)
>         at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:152)
>         at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:191)
>         at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:128)
>         at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:103)
>         at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaOneImpl.<init>(DatanodeStoreSchemaOneImpl.java:40)
>         at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getUncachedDatanodeStore(BlockUtils.java:68)
>         at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getUncachedDatanodeStore(BlockUtils.java:93)
>         at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.parseKVContainerData(KeyValueContainerUtil.java:195)
>         at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyAndFixupContainerData(ContainerReader.java:181)
>         at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyContainerFile(ContainerReader.java:158)
>         at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:136)
>         at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
>         at java.lang.Thread.run(Thread.java:748)
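
The fix proposed in the description (log the exception for one container and go on to the next) can be sketched as below. This is only an illustration of the idea, not the actual ContainerReader change: the class ContainerLoadSketch and the methods loadContainer and loadVolume are hypothetical names, and loading is reduced to parsing the container ID from the directory name, which is the step that failed in Case 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; not the actual ContainerReader code.
class ContainerLoadSketch {
  private static final Logger LOG =
      Logger.getLogger(ContainerLoadSketch.class.getName());

  // Stand-in for per-container loading. It only parses the container ID
  // from the directory name and throws NumberFormatException on a name
  // such as "1823-raw".
  static long loadContainer(String dirName) {
    return Long.parseLong(dirName);
  }

  // Loads every container directory of one volume. A failure on one
  // directory is logged and skipped so the remaining ones still load.
  static List<Long> loadVolume(List<String> containerDirNames) {
    List<Long> loaded = new ArrayList<>();
    for (String dirName : containerDirNames) {
      try {
        loaded.add(loadContainer(dirName));
      } catch (RuntimeException e) {
        LOG.log(Level.SEVERE,
            "Failed to load container directory " + dirName + ", skipping it", e);
      }
    }
    return loaded;
  }

  public static void main(String[] args) {
    List<Long> loaded = loadVolume(Arrays.asList("1822", "1823-raw", "1824"));
    System.out.println(loaded); // prints [1822, 1824]
  }
}
```

The try/catch sits inside the loop rather than around it, so one bad directory costs one container instead of the rest of the volume.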


