Posted to issues@ozone.apache.org by "Wei-Chiu Chuang (Jira)" <ji...@apache.org> on 2021/01/18 20:54:00 UTC

[jira] [Created] (HDDS-4722) Creating RDBStore fails due to RDBMetrics instance race

Wei-Chiu Chuang created HDDS-4722:
-------------------------------------

             Summary:  Creating RDBStore fails due to RDBMetrics instance race
                 Key: HDDS-4722
                 URL: https://issues.apache.org/jira/browse/HDDS-4722
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Datanode
    Affects Versions: 1.0.0
            Reporter: Wei-Chiu Chuang


I am using Ozone APIs to create containers, and the run occasionally aborts due to a data race in accessing the RDBMetrics instance:
{noformat}
2021-01-09 02:39:36,944 [pool-1-thread-4] INFO keyvalue.KeyValueContainer: Container 318054 is closed with bcsId 0.
2021-01-09 02:39:36,988 [pool-1-thread-17] ERROR freon.BaseFreonGenerator: Error on executing task 318048
com.google.common.util.concurrent.UncheckedExecutionException: org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics already exists!
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
        at org.apache.hadoop.ozone.freon.ContainerGenerator.lambda$writeContainer$1(ContainerGenerator.java:489)
        at com.codahale.metrics.Timer.time(Timer.java:101)
        at org.apache.hadoop.ozone.freon.ContainerGenerator.writeContainer(ContainerGenerator.java:485)
        at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:189)
        at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:169)
        at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$0(BaseFreonGenerator.java:152)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics already exists!
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
        at org.apache.hadoop.hdds.utils.db.RDBMetrics.create(RDBMetrics.java:47)
        at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:152)
        at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:191)
        at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:128)
        at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:103)
        at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaTwoImpl.<init>(DatanodeStoreSchemaTwoImpl.java:48)
        at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.createContainerMetaData(KeyValueContainerUtil.java:112)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:133)
        at org.apache.hadoop.ozone.freon.ContainerGenerator.createContainer(ContainerGenerator.java:463)
        at org.apache.hadoop.ozone.freon.ContainerGenerator.access$100(ContainerGenerator.java:109)
        at org.apache.hadoop.ozone.freon.ContainerGenerator$ContainerCreator.load(ContainerGenerator.java:357)
        at org.apache.hadoop.ozone.freon.ContainerGenerator$ContainerCreator.load(ContainerGenerator.java:353)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
{noformat}

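Looking at the code, I believe RDBMetrics#unRegister() should be made synchronized. Otherwise, concurrently creating and closing RDBStore objects can race on the shared RDBMetrics instance: as the trace shows, RDBMetrics.create() ends up registering a metrics source name that is still registered.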

After making RDBMetrics#unRegister() synchronized, the tool no longer aborts due to the race.
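
For illustration, here is a minimal, self-contained sketch of the pattern described above. It is not the actual Ozone source: the class MetricsSingletonSketch, the REGISTERED set (a stand-in for the metrics system's table of source names), and the stress() helper are all hypothetical. The sketch assumes that create() is already synchronized and that unRegister() both clears the singleton and removes the source name. If unRegister() were not synchronized, another thread could call create() between those two steps, see a null instance, and try to register a name that is still present.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a metrics singleton; not the real RDBMetrics. */
class MetricsSingletonSketch {
  // Hypothetical registry standing in for the metrics system's source names.
  // It is only touched while holding the class lock.
  private static final Set<String> REGISTERED = new HashSet<>();
  private static final String SOURCE_NAME = "RDBMetrics";
  private static MetricsSingletonSketch instance;

  private MetricsSingletonSketch() { }

  /** Returns the singleton, registering its source name on first use. */
  static synchronized MetricsSingletonSketch create() {
    if (instance != null) {
      return instance;
    }
    if (!REGISTERED.add(SOURCE_NAME)) {
      // Mirrors the failure in the report: the name is still registered.
      throw new IllegalStateException(
          "Metrics source " + SOURCE_NAME + " already exists!");
    }
    instance = new MetricsSingletonSketch();
    return instance;
  }

  /**
   * Clears the singleton and removes the source name. Holding the same
   * class lock as create() makes the two steps atomic with respect to it.
   */
  static synchronized void unRegister() {
    instance = null;
    REGISTERED.remove(SOURCE_NAME);
  }

  /** Runs create()/unRegister() from several threads; returns failure count. */
  static int stress(int threads, int iterations) throws InterruptedException {
    AtomicInteger failures = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < iterations; i++) {
          try {
            create();
            unRegister();
          } catch (IllegalStateException e) {
            failures.incrementAndGet();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    return failures.get();
  }
}
```

With both methods synchronized on the class, create() can never observe a null instance while the source name is still in the registry, so stress() should report no failures. Dropping synchronized from unRegister() in this sketch reopens the window described above (and also leaves the HashSet unprotected).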


