Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2023/05/05 11:53:00 UTC

[jira] [Assigned] (HDDS-8539) Container DB open, but not found in DatanodeStoreCache

     [ https://issues.apache.org/jira/browse/HDDS-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai reassigned HDDS-8539:
--------------------------------------

    Assignee: Attila Doroszlai

> Container DB open, but not found in DatanodeStoreCache
> ------------------------------------------------------
>
>                 Key: HDDS-8539
>                 URL: https://issues.apache.org/jira/browse/HDDS-8539
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Attila Doroszlai
>            Assignee: Attila Doroszlai
>            Priority: Critical
>
> The Surefire fork intermittently times out in {{TestDecommissionAndMaintenance}}.
> The container DB is added to the cache:
> {code}
> 2023-05-03 08:18:26,909 [EndpointStateMachine task thread for /0.0.0.0:43723 - 0 ] INFO  utils.DatanodeStoreCache (DatanodeStoreCache.java:addDB(58)) - Added db /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db to cache
> {code}
> but it is later not found in the cache, and the datanode tries to open it again:
> {code}
> 2023-05-03 08:18:57,086 [Command processor thread] ERROR utils.DatanodeStoreCache (DatanodeStoreCache.java:getDB(74)) - Failed to get DB store /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db
> java.io.IOException: Failed init RocksDB, db path : /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db, exception :org.rocksdb.RocksDBException lock hold by current process, acquire time 1683101936 acquiring thread 139985634854656: /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db/LOCK: No locks available
> 	at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:182)
> 	at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:212)
> 	at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:147)
> 	at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:99)
> 	at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaThreeImpl.<init>(DatanodeStoreSchemaThreeImpl.java:66)
> 	at org.apache.hadoop.ozone.container.common.utils.DatanodeStoreCache.getDB(DatanodeStoreCache.java:69)
> 	at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:132)
> 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.flushAndSyncDB(KeyValueContainer.java:444)
> 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.closeAndFlushIfNeeded(KeyValueContainer.java:385)
> 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.quasiClose(KeyValueContainer.java:355)
> 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.quasiCloseContainer(KeyValueHandler.java:1121)
> 	at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.quasiCloseContainer(ContainerController.java:142)
> 	at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.notifyGroupRemove(ContainerStateMachine.java:1052)
> 	at org.apache.ratis.server.impl.RaftServerImpl.groupRemove(RaftServerImpl.java:423)
> 	at org.apache.ratis.server.impl.RaftServerProxy.lambda$groupRemoveAsync$12(RaftServerProxy.java:530)
> 	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> 	at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
> 	at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
> 	at org.apache.ratis.server.impl.RaftServerProxy.groupRemoveAsync(RaftServerProxy.java:529)
> 	at org.apache.ratis.server.impl.RaftServerProxy.groupManagementAsync(RaftServerProxy.java:479)
> 	at org.apache.ratis.server.impl.RaftServerProxy.groupManagement(RaftServerProxy.java:459)
> 	at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.removeGroup(XceiverServerRatis.java:822)
> 	at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler.handle(ClosePipelineCommandHandler.java:77)
> 	at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
> 	at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$3(DatanodeStateMachine.java:644)
> 	at java.lang.Thread.run(Thread.java:750)
> {code}
> This continues until the fork is killed:
> {code}
> 2023-05-03 08:33:24,505 [Command processor thread] ERROR utils.DatanodeStoreCache (DatanodeStoreCache.java:getDB(74)) - Failed to get DB store /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db
> java.io.IOException: Failed init RocksDB, db path : /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db, exception :org.rocksdb.RocksDBException lock hold by current process, acquire time 1683101936 acquiring thread 139985634854656: /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db/LOCK: No locks available
> 	at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:182)
> 	at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:212)
> 	at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:147)
> 	at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:99)
> 	at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaThreeImpl.<init>(DatanodeStoreSchemaThreeImpl.java:66)
> 	at org.apache.hadoop.ozone.container.common.utils.DatanodeStoreCache.getDB(DatanodeStoreCache.java:69)
> 	at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:132)
> 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.flushAndSyncDB(KeyValueContainer.java:444)
> 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.closeAndFlushIfNeeded(KeyValueContainer.java:385)
> 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.quasiClose(KeyValueContainer.java:355)
> {code}
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/21/21757/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/24/21800/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/24/21805/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/27/21885/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/27/21895/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/28/21927/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/05/03/21994/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/05/03/21995/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
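
The symptom in the logs above (a DB that is still open in this process, but missing from the cache, so the next lookup tries to open it a second time and hits the RocksDB lock) can be reproduced in miniature. The following is only a sketch under stated assumptions, not the actual DatanodeStoreCache code and not a diagnosis of the root cause, which the report does not identify. FakeStore, Cache, and dropEntryWithoutClosing are hypothetical names; FakeStore stands in for a RocksDB-backed store and throws an unchecked exception for brevity, where the real code throws IOException.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class StoreCacheSketch {

  /** Hypothetical stand-in for a RocksDB-backed store; not an Ozone class. */
  public static final class FakeStore {
    // Paths currently held open by this process, mimicking RocksDB's LOCK file.
    private static final Set<String> OPEN_PATHS = ConcurrentHashMap.newKeySet();
    private final String path;

    FakeStore(String path) {
      // A second open of the same path in the same process is rejected.
      if (!OPEN_PATHS.add(path)) {
        throw new IllegalStateException("lock held by current process: " + path);
      }
      this.path = path;
    }

    void close() {
      OPEN_PATHS.remove(path);
    }
  }

  /** Hypothetical cache keyed by DB path. */
  public static final class Cache {
    private final Map<String, FakeStore> stores = new ConcurrentHashMap<>();

    /** Returns the cached store, opening it at most once per path. */
    public FakeStore getDB(String path) {
      // computeIfAbsent runs the open atomically per key; if the open
      // throws, no mapping is recorded and the exception propagates.
      return stores.computeIfAbsent(path, FakeStore::new);
    }

    /** Removes the entry and closes the store, releasing the lock. */
    public void removeDB(String path) {
      FakeStore store = stores.remove(path);
      if (store != null) {
        store.close();
      }
    }

    /** Simulates the inconsistent state: entry gone, store still open. */
    public void dropEntryWithoutClosing(String path) {
      stores.remove(path);
    }
  }

  public static void main(String[] args) {
    String path = "/tmp/sketch/container.db"; // hypothetical path
    Cache cache = new Cache();

    FakeStore first = cache.getDB(path);
    System.out.println("same instance: " + (first == cache.getDB(path)));

    // Once the cache forgets an open store, every later lookup fails,
    // because the process itself still holds the lock.
    cache.dropEntryWithoutClosing(path);
    try {
      cache.getDB(path);
    } catch (IllegalStateException e) {
      System.out.println("reopen failed: " + e.getMessage());
    }
  }
}
```

In this sketch the failure is permanent for the lifetime of the process: retrying getDB cannot succeed until the original store is closed, which matches the repeated errors seen until the fork is killed.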


