Posted to issues@ozone.apache.org by "Devesh Kumar Singh (Jira)" <ji...@apache.org> on 2024/01/02 12:54:00 UTC

[jira] [Commented] (HDDS-10017) [Timer for 'Recon' metrics system] Rocks Database is closed

    [ https://issues.apache.org/jira/browse/HDDS-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801772#comment-17801772 ] 

Devesh Kumar Singh commented on HDDS-10017:
-------------------------------------------

[~conway] Based on the exception trace above, the "scm.snapshot.db*" snapshot file is created when Recon has been down while SCM and all DNs were up, and Recon is then restarted after a while. This can leave the container counts in SCM and Recon out of sync. If the difference between the SCM container count and the Recon container count is greater than 100, Recon pulls the SCM metadata DB and keeps a copy on the Recon node. That file is later renamed to "recon-scm.db", so with the normal happy-path flow I am not able to reproduce the issue. The exception in the logs above comes from PipelineSyncTask when it tries to update the pipeline state in the pipeline RDB table while referencing the new RocksDB store snapshot file "scm.snapshot.db*".
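
To make the threshold rule above concrete, here is a minimal sketch. It is not Recon's actual code: the class, method, and constant names are hypothetical, and comparing the absolute difference is an assumption on my part. Only the value 100 comes from the description above.

```java
// Hypothetical sketch of the "pull SCM DB snapshot" decision; not the real Recon implementation.
class SnapshotSyncSketch {
    // Threshold value taken from the comment above; the constant name is made up.
    static final long CONTAINER_COUNT_DIFF_THRESHOLD = 100;

    // Returns true when the container counts have drifted far enough apart
    // that a full SCM DB snapshot would be pulled instead of an incremental sync.
    // Assumption: the absolute difference is what gets compared.
    static boolean shouldPullScmSnapshot(long scmContainerCount, long reconContainerCount) {
        long diff = Math.abs(scmContainerCount - reconContainerCount);
        return diff > CONTAINER_COUNT_DIFF_THRESHOLD;
    }

    public static void main(String[] args) {
        // 128 is the SCM container count seen in the log; the Recon counts are invented.
        System.out.println(shouldPullScmSnapshot(128, 0));   // difference 128, above threshold
        System.out.println(shouldPullScmSnapshot(128, 100)); // difference 28, below threshold
    }
}
```

With these inputs the first call prints true and the second prints false, since only a difference strictly greater than 100 triggers the snapshot path.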

Can you check whether "/ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log" exists in your cluster? If it does not, try restarting your cluster; ideally, this file will not be created by Recon on its own.
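
If it is easier than checking by hand, the existence check could be scripted roughly like this. This is only a sketch using the standard java.nio.file API; the class and method names are hypothetical, and the path is the one from the exception trace, so adjust it for your cluster.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; it only checks the local filesystem of the node it runs on.
class SnapshotFileCheck {
    // Returns true if the named file exists as a regular file inside the snapshot directory.
    static boolean logFileExists(String snapshotDir, String logFileName) {
        Path logFile = Paths.get(snapshotDir, logFileName);
        return Files.isRegularFile(logFile);
    }

    public static void main(String[] args) {
        // Directory and file name copied from the exception trace.
        String snapshotDir = "/ozonedata/recon/metadata/scm.snapshot.db_1703562071086";
        String logFileName = "001042.log";
        System.out.println(logFileName + " present: " + logFileExists(snapshotDir, logFileName));
    }
}
```

Run it on the Recon node, since that is where the snapshot directory from the trace lives.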

 

> [Timer for 'Recon' metrics system] Rocks Database is closed
> -----------------------------------------------------------
>
>                 Key: HDDS-10017
>                 URL: https://issues.apache.org/jira/browse/HDDS-10017
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Recon
>    Affects Versions: 1.4.0
>            Reporter: Conway Zhang
>            Assignee: Devesh Kumar Singh
>            Priority: Major
>
> There is an error when I use Recon, reporting that the Rocks Database is closed:
> {code:java}
> 2023-12-26 17:36:53,385 [PipelineSyncTask] INFO org.apache.hadoop.ozone.recon.scm.ReconPipelineManager: Adding new pipeline PipelineID=40552754-9300-..-3aadeaf41348 from SCM.
> 2023-12-26 17:36:54,556 [PipelineSyncTask] ERROR org.apache.hadoop.hdds.scm.pipeline.PipelineStateManagerImpl: Pipeline PipelineID=fc776cf8-43d1-494a-..-c930456905eb state update failed
> 2023-12-26 17:36:54,568 [PipelineSyncTask] ERROR org.apache.hadoop.ozone.recon.scm.PipelineSyncTask: Exception in Pipeline sync Thread.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: org.apache.ratis.protocol.exceptions.StateMachineException: java.io.IOException from Server peer@group-075CE2E08D2E: RocksDatabase[/ozonedata/recon/metadata/scm.snapshot.db_1703562071086]: Failed to put �wl�C�IJ�4�0Ei^E�; status : IOError(Undefined); message : While open a file for appending: /ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log: No such file or directory
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.translateException(SCMHAInvocationHandler.java:165)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:115)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:74)
>     at com.sun.proxy.$Proxy48.updatePipelineState(Unknown Source)
>     at org.apache.hadoop.ozone.recon.scm.ReconPipelineManager.initializePipelines(ReconPipelineManager.java:114)
>     at org.apache.hadoop.ozone.recon.scm.PipelineSyncTask.triggerPipelineSyncTask(PipelineSyncTask.java:92)
>     at org.apache.hadoop.ozone.recon.scm.PipelineSyncTask.run(PipelineSyncTask.java:75)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: java.io.IOException from Server peer@group-075CE2E08D2E: RocksDatabase[/ozonedata/recon/metadata/scm.snapshot.db_1703562071086]: Failed to put �wl�C�IJ�4�0Ei^E�; status : IOError(Undefined); message : While open a file for appending: /ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log: No such file or directory
>     at org.apache.hadoop.hdds.scm.ha.SCMHAManagerStub$RatisServerStub.submitRequest(SCMHAManagerStub.java:199)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatisServer(SCMHAInvocationHandler.java:123)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:112)
>     ... 6 more
> Caused by: java.io.IOException: RocksDatabase[/ozonedata/recon/metadata/scm.snapshot.db_1703562071086]: Failed to put �wl�C�IJ�4�0Ei^E�; status : IOError(Undefined); message : While open a file for appending: /ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log: No such file or directory
>     at org.apache.hadoop.hdds.utils.HddsServerUtil.toIOException(HddsServerUtil.java:667)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.toIOException(RocksDatabase.java:98)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.put(RocksDatabase.java:501)
>     at org.apache.hadoop.hdds.utils.db.RDBTable.put(RDBTable.java:70)
>     at org.apache.hadoop.hdds.utils.db.TypedTable.put(TypedTable.java:156)
>     at org.apache.hadoop.hdds.scm.metadata.SCMDBTransactionBufferImpl.addToBuffer(SCMDBTransactionBufferImpl.java:36)
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManagerImpl.updatePipelineState(PipelineStateManagerImpl.java:296)
>     at sun.reflect.GeneratedMethodAccessor328.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAManagerStub$RatisServerStub.process(SCMHAManagerStub.java:229)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAManagerStub$RatisServerStub.submitRequest(SCMHAManagerStub.java:191)
>     ... 8 more
> Caused by: org.rocksdb.RocksDBException: While open a file for appending: /ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log: No such file or directory
>     at org.rocksdb.RocksDB.putDirect(Native Method)
>     at org.rocksdb.RocksDB.put(RocksDB.java:981)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.put(RocksDatabase.java:498)
>     ... 17 more
> 2023-12-26 17:36:57,768 [pool-51-thread-1] INFO org.apache.hadoop.ozone.recon.scm.ReconStorageContainerManagerFacade: Got list of containers from SCM : 128
> 2023-12-26 17:37:01,032 [Timer for 'Recon' metrics system] ERROR org.apache.hadoop.hdds.utils.RocksDBStoreMetrics: Failed to get property mem-table-flush-pending from rocksdb
> java.io.IOException: Rocks Database is closed
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:458)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.getProperty(RocksDatabase.java:822)
>     at org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getDBPropertyData(RocksDBStoreMetrics.java:214)
>     at org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getMetrics(RocksDBStoreMetrics.java:151)
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:423)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:410)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:385)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:372)
>     at java.util.TimerThread.mainLoop(Timer.java:555)
>     at java.util.TimerThread.run(Timer.java:505)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
