You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/09/25 09:32:00 UTC

[jira] [Commented] (QPID-8470) Deadlock / latch timeout in BDB database

    [ https://issues.apache.org/jira/browse/QPID-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202025#comment-17202025 ] 

ASF GitHub Bot commented on QPID-8470:
--------------------------------------

dakirily opened a new pull request #61:
URL: https://github.com/apache/qpid-broker-j/pull/61


   Added retrying logic for storing message metadata.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Deadlock  / latch timeout in BDB database
> -----------------------------------------
>
>                 Key: QPID-8470
>                 URL: https://issues.apache.org/jira/browse/QPID-8470
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-8.0.1
>            Reporter: Daniil Kirilyuk
>            Priority: Minor
>         Attachments: JE-thread-dump.txt
>
>
> There was deadlock (or [latch timeout|[https://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentConfig.html#ENV_LATCH_TIMEOUT]) in BDB database or latch was hold for more then timeout. Then was database environment closed, database transactions cancelled and broker initiated shutdown.
> {noformat}
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55550] (c.s.j.d.EnvironmentImpl) - default Thread dump follows for latch timeout: BIN2605191 currentThread: Thread[IO-/127.0.0.1:55550,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main]
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55580] (c.s.j.d.EnvironmentImpl) - default Thread dump follows for latch timeout: BIN2605060 currentThread: Thread[IO-/127.0.0.1:55580,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55584,5,main]
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55544] (c.s.j.d.EnvironmentImpl) - default Thread dump follows for latch timeout: BIN2605128 currentThread: Thread[IO-/127.0.0.1:55544,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55548,5,main]
> ....
> 2020-08-19 00:42:42,174 ERROR [IO-/127.0.0.1:55538] (o.a.q.s.u.ServerScopedRuntimeException) - Error reading message metadata for message with id 52671357: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
> org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
>         at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:443)
>         at com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:211)
> ....
>                 at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
>                 at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
>                 at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
>                 at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
>                 at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
>                 at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
>                 ... 5 more
>                 Suppressed: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
>                         at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
>                         at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
>                         at com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:3869)
>                         at com.sleepycat.je.Cursor.checkEnv(Cursor.java:5394)
>                         at com.sleepycat.je.Cursor.close(Cursor.java:540)
>                         at com.sleepycat.je.Database.get(Database.java:1344)
>                         ... 41 more
>         [CIRCULAR REFERENCE:com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.]
> End of stringified Stacktrace
> ...
> 2020-08-19 00:42:42,201 ERROR [IO-/127.0.0.1:55498] (o.a.q.s.Main) - Uncaught exception, shutting down.
> org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException: org.apache.qpid.server.util.ServerScopedRuntimeException: org.apache.qpid.server.store.StoreException: Error reading message metadata for message with id 52671021: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605177 currentThread: Thread[IO-/127.0.0.1:55498,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55554,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
>         at org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:582)
>         at java.security.AccessController.doPrivileged(Native Method)
> {noformat}
>  BDB thread dump attached.
> Altogether there were 16 threads handling BDB latch timeout (in stacktrace there was a call of com.sleepycat.je.latch.LatchSupport.handleTimeout() method).
> 2 of them – when sending messages: 
> {noformat}
> java.lang.Thread.dumpThreads(Native Method)
> java.lang.Thread.getAllStackTraces(Thread.java:1610)
> com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:566)
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
> com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
> com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
> com.sleepycat.je.tree.IN.latchShared(IN.java:560)
> com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1817)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1748)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1666)
> com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:1637)
> com.sleepycat.je.dbi.CursorImpl.insertRecordInternal(CursorImpl.java:1337)
> com.sleepycat.je.dbi.CursorImpl.insertOrUpdateRecord(CursorImpl.java:1221)
> com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2962)
> com.sleepycat.je.Cursor.putNotify(Cursor.java:2800)
> com.sleepycat.je.Cursor.putNoDups(Cursor.java:2647)
> com.sleepycat.je.Cursor.putInternal(Cursor.java:2478)
> com.sleepycat.je.Cursor.putInternal(Cursor.java:830)
> com.sleepycat.je.Database.put(Database.java:1574)
> com.sleepycat.je.Database.put(Database.java:1627)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.storeMetaData(AbstractBDBMessageStore.java:591)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$600(AbstractBDBMessageStore.java:77)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.store(AbstractBDBMessageStore.java:1165)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction$1.run(AbstractBDBMessageStore.java:1344)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.doPreCommitActions(AbstractBDBMessageStore.java:1378)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1400)
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
> org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)
> org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117)
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveDelivery(StandardReceivingLinkEndpoint.java:280)
> org.apache.qpid.server.protocol.v1_0.AbstractReceivingLinkEndpoint.receiveTransfer(AbstractReceivingLinkEndpoint.java:165)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receiveTransfer(Session_1_0.java:637)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receiveTransfer(AMQPConnection_1_0Impl.java:792)
> org.apache.qpid.server.protocol.v1_0.type.transport.Transfer.invoke(Transfer.java:295)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.received(AMQPConnection_1_0Impl.java:515)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receive$0(AMQPConnection_1_0Impl.java:467)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$146/229440539.run(Unknown Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receive(AMQPConnection_1_0Impl.java:461)
> org.apache.qpid.server.protocol.v1_0.framing.FrameHandler.parse(FrameHandler.java:224)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1312)
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown Source)
> java.lang.Thread.run(Thread.java:748)
> {noformat}
> 14 of them – when receiving messages:
> {noformat}
> com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:557)
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
> com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
> com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
> com.sleepycat.je.tree.IN.latchShared(IN.java:560)
> com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
> com.sleepycat.je.tree.Tree.search(Tree.java:2275)
> com.sleepycat.je.tree.Tree.search(Tree.java:2172)
> com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)
> com.sleepycat.je.Cursor.searchExact(Cursor.java:4195)
> com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056)
> com.sleepycat.je.Cursor.search(Cursor.java:3858)
> com.sleepycat.je.Cursor.getInternal(Cursor.java:1282)
> com.sleepycat.je.Database.get(Database.java:1342)
> com.sleepycat.je.Database.get(Database.java:1401)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.getMessageMetaData(AbstractBDBMessageStore.java:299)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1077)
> org.apache.qpid.server.message.AbstractServerMessageImpl.isPersistent(AbstractServerMessageImpl.java:178)
> org.apache.qpid.server.queue.QueueEntryImpl.populateInstanceProperties(QueueEntryImpl.java:155)
> org.apache.qpid.server.queue.QueueEntryImpl.<init>(QueueEntryImpl.java:147)
> org.apache.qpid.server.queue.OrderedQueueEntry.<init>(OrderedQueueEntry.java:46)
> org.apache.qpid.server.queue.StandardQueueEntry.<init>(StandardQueueEntry.java:37)
> org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:48)
> org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:26)
> org.apache.qpid.server.queue.OrderedQueueEntryList.add(OrderedQueueEntryList.java:92)
> org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1356)
> org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1289)
> org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
> org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1358)
> org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$147/1374598514.accept(Unknown Source)
> java.lang.Iterable.forEach(Iterable.java:75)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1358)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1347)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$152/950450513.run(Unknown Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1345)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1326)
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown Source)
> {noformat}
> Before the thread dump in log were present massive “QUE-1005 : Dropped” messages (ca. 73.000) and many error messages “Connection closed with error : amqp:resource-limit-exceeded - local-idle-timeout expired” (191).
> I’m not sure, that AbstractServerMessageImpl.isPersistent() call can cause a deadlock, because get() operation is invoked with LockMode.READ_UNCOMMITTED parameter. According to BDB documentation READ_UNCOMMITTED mode should be non-blocking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org