You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "Janus Chow (Jira)" <ji...@apache.org> on 2023/12/16 23:44:06 UTC

[jira] [Updated] (HDDS-8271) Ozone Manager crashed caused by rocksdb closed

     [ https://issues.apache.org/jira/browse/HDDS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Janus Chow updated HDDS-8271:
-----------------------------
    Target Version/s: 1.5.0  (was: 1.4.0)

I am managing the 1.4.0 release and we currently have more than 500 issues targeted for 1.4.0. I am moving the target field to 1.5.0.
       
If you are actively working on this jira and believe this should be targeted for the 1.4.0 release, Please reach out to me via Apache email or Slack.

> Ozone Manager crashed caused by rocksdb closed
> ----------------------------------------------
>
>                 Key: HDDS-8271
>                 URL: https://issues.apache.org/jira/browse/HDDS-8271
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM
>    Affects Versions: 1.3.0
>            Reporter: Hongbing Wang
>            Assignee: Sadanand Shenoy
>            Priority: Critical
>
> An OM crashed, its log has:
> {noformat}
> 2023-03-23 20:32:06,079 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61913753 to /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61913753-61920134
> 2023-03-23 20:32:06,087 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61920135
> 2023-03-23 20:32:13,820 [om3-client-thread180] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment log-61920135_61926451 to index:61926451
> 2023-03-23 20:32:13,821 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61920135 to /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61920135-61926451
> 2023-03-23 20:32:13,827 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61926452
> 2023-03-23 20:32:15,699 [Finalizer] WARN org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils: RocksIterator is not closed properly
> 2023-03-23 20:32:15,699 [Finalizer] WARN org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils: RocksIterator is not closed properly
> 2023-03-23 20:32:21,281 [om3-client-thread183] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment log-61926452_61932758 to index:61932758
> 2023-03-23 20:32:21,281 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61926452 to /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61926452-61932758
> 2023-03-23 20:32:21,288 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61932759
> 2023-03-23 20:32:28,387 [om3-client-thread160] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment log-61932759_61939170 to index:61939170
> 2023-03-23 20:32:28,388 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61932759 to /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61932759-61939170
> 2023-03-23 20:32:28,395 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61939171
> 2023-03-23 20:32:29,003 [OM StateMachine ApplyTransaction Thread - 0] ERROR org.apache.hadoop.ozone.om.KeyManagerImpl: CheckAccess operation failed for key:vol-XXX/prod-XXX/FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE
> java.io.IOException: Rocks Database is closed
> 	at org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:395)
> 	at org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:517)
> 	at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
> 	at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
> 	at org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
> 	at org.apache.hadoop.hdds.utils.db.TypedTable.get(TypedTable.java:181)
> 	at org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1037)
> 	at org.apache.hadoop.ozone.security.acl.OzoneNativeAuthorizer.checkAccess(OzoneNativeAuthorizer.java:164)
> 	at org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2584)
> 	at org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2570)
> 	at org.apache.hadoop.ozone.om.OzoneAclUtils.checkAllAcls(OzoneAclUtils.java:128)
> 	at org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:360)
> 	at org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:215)
> 	at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.checkKeyAcls(OMKeyRequest.java:339)
> 	at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:138)
> 	at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> 	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:533)
> 	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:324)
> 	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 2023-03-23 20:32:29,004 [OM StateMachine ApplyTransaction Thread - 0] ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: MultipartUpload Commit is failed for Key:FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE in Volume/Bucket vol-XXX/prod-XXX
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Check access operation failed for key:FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE
> 	at org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1083)
> 	at org.apache.hadoop.ozone.security.acl.OzoneNativeAuthorizer.checkAccess(OzoneNativeAuthorizer.java:164)
> 	at org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2584)
> 	at org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2570)
> 	at org.apache.hadoop.ozone.om.OzoneAclUtils.checkAllAcls(OzoneAclUtils.java:128)
> 	at org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:360)
> 	at org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:215)
> 	at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.checkKeyAcls(OMKeyRequest.java:339)
> 	at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:138)
> 	at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> 	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:533)
> 	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:324)
> 	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Rocks Database is closed
> 	at org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:395)
> 	at org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:517)
> 	at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
> 	at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
> 	at org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
> 	at org.apache.hadoop.hdds.utils.db.TypedTable.get(TypedTable.java:181)
> 	at org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1037)
> 	... 15 more
> 2023-03-23 20:32:29,012 [OMDoubleBufferFlushThread] ERROR org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with exit status 1: During flush to DB encountered error in OMDoubleBuffer flush thread OMDoubleBufferFlushThread
> java.io.IOException: Rocks Database is closed
> 	at org.apache.hadoop.hdds.utils.db.RocksDatabase$ColumnFamily.assertClosed(RocksDatabase.java:310)
> 	at org.apache.hadoop.hdds.utils.db.RocksDatabase$ColumnFamily.batchPut(RocksDatabase.java:293)
> 	at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.put(RDBBatchOperation.java:62)
> 	at org.apache.hadoop.hdds.utils.db.RDBTable.putWithBatch(RDBTable.java:77)
> 	at org.apache.hadoop.hdds.utils.db.RDBTable.putWithBatch(RDBTable.java:40)
> 	at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:132)
> 	at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:149)
> 	at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.checkAndUpdateDB(S3MultipartUploadCommitPartResponse.java:117)
> 	at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$2(OzoneManagerDoubleBuffer.java:283)
> 	at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:228)
> 	at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$1(OzoneManagerDoubleBuffer.java:281)
> 	at java.util.Iterator.forEachRemaining(Iterator.java:116)
> 	at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:277)
> 	at java.lang.Thread.run(Thread.java:745)
> 2023-03-23 20:32:29,016 [shutdown-hook-0] INFO org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG: 
> /************************************************************
> SHUTDOWN_MSG: Shutting down OzoneManager at bigdata-hostxxx.com/10.xx.xx.xx
> ************************************************************/
> {noformat}
> more OM log see: [OM crashed detail log |https://github.com/whbing/issue_logs/blob/main/ozone/om/om_crash.20230323.log]
> rocksdb log: LOG.old.1679574802031612
> {noformat}
> 2023/03/23-20:32:26.278546 140423117969152 [/db_impl/db_impl_files.cc:445] [JOB 4050] Try to delete WAL files size 42545044203, prev total WAL file size 42958842785, number of live WAL files 326.
> ...
> 2023/03/23-20:32:28.853876 140423224928000 [/version_set.cc:3800] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
> ...
> 2023/03/23-20:32:28.854153 140423224928000 (Original Log Time 2023/03/23-20:32:28.854112) [/db_impl/db_impl_compaction_flush.cc:284] [multipartInfoTable] Level summary: base level 6 level multiplier 10.00 max bytes base 268435456 files[2 0 0 0 0 0 3] max score 0.50
> ...
> 2023/03/23-20:32:28.998765 140423117969152 [le/delete_scheduler.cc:77] Deleted file /data/ozonedata/om/db/om.db/098324.log immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> 2023/03/23-20:32:29.009957 140422757758720 [/db_impl/db_impl.cc:480] Shutdown: canceling all background work
> 2023/03/23-20:32:29.020213 140423117969152 [le/delete_scheduler.cc:77] Deleted file /data/ozonedata/om/db/om.db/098322.log immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> ...
> 2023/03/23-20:32:32.829353 140423117969152 [le/delete_scheduler.cc:77] Deleted file /data/ozonedata/om/db/om.db/097921.log immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> 2023/03/23-20:32:32.883735 140422757758720 [/db_impl/db_impl.cc:710] Shutdown complete
> {noformat}
> more see : [detail rocks log|https://github.com/whbing/issue_logs/blob/main/ozone/om/LOG.old.1679574802031612]
> core file: hs_err_pid1017701.log :
> {noformat}
> Stack: [0x00007fb6b8883000,0x00007fb6b8984000],  sp=0x00007fb6b89817f8,  free space=1017k
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  org.rocksdb.RocksDB.getLatestSequenceNumber(J)J+0
> j  org.rocksdb.RocksDB.getLatestSequenceNumber()J+5
> j  org.apache.hadoop.hdds.utils.db.RocksDatabase.getLatestSequenceNumber()J+18
> j  org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(JJ)Lorg/apache/hadoop/hdds/utils/db/DBUpdatesWrapper;+395
> j  org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/om/helpers/DBUpdates;+30
> j  org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/p
> rotocol/proto/OzoneManagerProtocolProtos$DBUpdatesResponse;+9
> J 19759 C2 org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozo
> ne/protocol/proto/OzoneManagerProtocolProtos$OMResponse; (840 bytes) @ 0x00007fc99793523c [0x00007fc997933ec0+0x137c]
> J 19799 C2 org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerP
> rotocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse; (81 bytes) @ 0x00007fc997dec564 [0x00007fc997deb460+0x1104]
> J 17937 C2 org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(Lcom/google/protobuf/Descriptors$MethodDescriptor;Lcom/google/protobuf/Rp
> cController;Lcom/google/protobuf/Message;)Lcom/google/protobuf/Message; (69 bytes) @ 0x00007fc995ec62a0 [0x00007fc995ec6140+0x160]
> J 17503 C2 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Lorg/apache/hadoop/ipc/RPC$Server;Ljava/lang/String;Lorg/apache/hadoop/io/Writable;J)Lorg/apache/hadoop/io/Wr
> itable; (830 bytes) @ 0x00007fc99714df78 [0x00007fc99714c9c0+0x15b8]
> J 17497 C2 org.apache.hadoop.ipc.Server$RpcCall.run()Ljava/lang/Object; (5 bytes) @ 0x00007fc997237f70 [0x00007fc997237c60+0x310]
> v  ~StubRoutines::call_stub
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org