You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by "Daniil Kirilyuk (Jira)" <ji...@apache.org> on 2020/09/25 06:45:00 UTC

[jira] [Created] (QPID-8470) Deadlock / latch timeout in BDB database

Daniil Kirilyuk created QPID-8470:
-------------------------------------

             Summary: Deadlock  / latch timeout in BDB database
                 Key: QPID-8470
                 URL: https://issues.apache.org/jira/browse/QPID-8470
             Project: Qpid
          Issue Type: Bug
          Components: Broker-J
    Affects Versions: qpid-java-broker-8.0.1
            Reporter: Daniil Kirilyuk
         Attachments: JE-thread-dump.txt

There was deadlock (or [latch timeout|[https://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentConfig.html#ENV_LATCH_TIMEOUT]) in BDB database or latch was hold for more then timeout. Then was database environment closed, database transactions cancelled and broker initiated shutdown.
{noformat}
2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55550] (c.s.j.d.EnvironmentImpl) - default Thread dump follows for latch timeout: BIN2605191 currentThread: Thread[IO-/127.0.0.1:55550,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main]
2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55580] (c.s.j.d.EnvironmentImpl) - default Thread dump follows for latch timeout: BIN2605060 currentThread: Thread[IO-/127.0.0.1:55580,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55584,5,main]
2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55544] (c.s.j.d.EnvironmentImpl) - default Thread dump follows for latch timeout: BIN2605128 currentThread: Thread[IO-/127.0.0.1:55544,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55548,5,main]
....
2020-08-19 00:42:42,174 ERROR [IO-/127.0.0.1:55538] (o.a.q.s.u.ServerScopedRuntimeException) - Error reading message metadata for message with id 52671357: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
        at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:443)
        at com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:211)
....
                at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
                at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
                at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
                at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
                at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
                at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
                ... 5 more
                Suppressed: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
                        at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
                        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
                        at com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:3869)
                        at com.sleepycat.je.Cursor.checkEnv(Cursor.java:5394)
                        at com.sleepycat.je.Cursor.close(Cursor.java:540)
                        at com.sleepycat.je.Database.get(Database.java:1344)
                        ... 41 more
        [CIRCULAR REFERENCE:com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.]
End of stringified Stacktrace
...
2020-08-19 00:42:42,201 ERROR [IO-/127.0.0.1:55498] (o.a.q.s.Main) - Uncaught exception, shutting down.
org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException: org.apache.qpid.server.util.ServerScopedRuntimeException: org.apache.qpid.server.store.StoreException: Error reading message metadata for message with id 52671021: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605177 currentThread: Thread[IO-/127.0.0.1:55498,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55554,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
        at org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:582)
        at java.security.AccessController.doPrivileged(Native Method)
{noformat}
 BDB thread dump attached.

Altogether there were 16 threads handling BDB latch timeout (in stacktrace there was a call of com.sleepycat.je.latch.LatchSupport.handleTimeout() method).

2 of them – when sending messages: 
{noformat}
java.lang.Thread.dumpThreads(Native Method)
java.lang.Thread.getAllStackTraces(Thread.java:1610)
com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:566)
com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
com.sleepycat.je.tree.IN.latchShared(IN.java:560)
com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1817)
com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1748)
com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1666)
com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:1637)
com.sleepycat.je.dbi.CursorImpl.insertRecordInternal(CursorImpl.java:1337)
com.sleepycat.je.dbi.CursorImpl.insertOrUpdateRecord(CursorImpl.java:1221)
com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2962)
com.sleepycat.je.Cursor.putNotify(Cursor.java:2800)
com.sleepycat.je.Cursor.putNoDups(Cursor.java:2647)
com.sleepycat.je.Cursor.putInternal(Cursor.java:2478)
com.sleepycat.je.Cursor.putInternal(Cursor.java:830)
com.sleepycat.je.Database.put(Database.java:1574)
com.sleepycat.je.Database.put(Database.java:1627)
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.storeMetaData(AbstractBDBMessageStore.java:591)
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$600(AbstractBDBMessageStore.java:77)
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.store(AbstractBDBMessageStore.java:1165)
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction$1.run(AbstractBDBMessageStore.java:1344)
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.doPreCommitActions(AbstractBDBMessageStore.java:1378)
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1400)
org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)
org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117)
org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveDelivery(StandardReceivingLinkEndpoint.java:280)
org.apache.qpid.server.protocol.v1_0.AbstractReceivingLinkEndpoint.receiveTransfer(AbstractReceivingLinkEndpoint.java:165)
org.apache.qpid.server.protocol.v1_0.Session_1_0.receiveTransfer(Session_1_0.java:637)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receiveTransfer(AMQPConnection_1_0Impl.java:792)
org.apache.qpid.server.protocol.v1_0.type.transport.Transfer.invoke(Transfer.java:295)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.received(AMQPConnection_1_0Impl.java:515)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receive$0(AMQPConnection_1_0Impl.java:467)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$146/229440539.run(Unknown Source)
java.security.AccessController.doPrivileged(Native Method)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receive(AMQPConnection_1_0Impl.java:461)
org.apache.qpid.server.protocol.v1_0.framing.FrameHandler.parse(FrameHandler.java:224)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1312)
org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown Source)
java.security.AccessController.doPrivileged(Native Method)
org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown Source)
java.lang.Thread.run(Thread.java:748)

{noformat}
14 of them – when receiving messages:
{noformat}
com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:557)
com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
com.sleepycat.je.tree.IN.latchShared(IN.java:560)
com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
com.sleepycat.je.tree.Tree.search(Tree.java:2275)
com.sleepycat.je.tree.Tree.search(Tree.java:2172)
com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)
com.sleepycat.je.Cursor.searchExact(Cursor.java:4195)
com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056)
com.sleepycat.je.Cursor.search(Cursor.java:3858)
com.sleepycat.je.Cursor.getInternal(Cursor.java:1282)
com.sleepycat.je.Database.get(Database.java:1342)
com.sleepycat.je.Database.get(Database.java:1401)
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.getMessageMetaData(AbstractBDBMessageStore.java:299)
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1077)
org.apache.qpid.server.message.AbstractServerMessageImpl.isPersistent(AbstractServerMessageImpl.java:178)
org.apache.qpid.server.queue.QueueEntryImpl.populateInstanceProperties(QueueEntryImpl.java:155)
org.apache.qpid.server.queue.QueueEntryImpl.<init>(QueueEntryImpl.java:147)
org.apache.qpid.server.queue.OrderedQueueEntry.<init>(OrderedQueueEntry.java:46)
org.apache.qpid.server.queue.StandardQueueEntry.<init>(StandardQueueEntry.java:37)
org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:48)
org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:26)
org.apache.qpid.server.queue.OrderedQueueEntryList.add(OrderedQueueEntryList.java:92)
org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1356)
org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1289)
org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1358)
org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$147/1374598514.accept(Unknown Source)
java.lang.Iterable.forEach(Iterable.java:75)
org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1358)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1347)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$152/950450513.run(Unknown Source)
java.security.AccessController.doPrivileged(Native Method)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1345)
org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1326)
org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown Source)
java.security.AccessController.doPrivileged(Native Method)
org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown Source)
{noformat}
Before the thread dump in log were present massive “QUE-1005 : Dropped” messages (ca. 73.000) and many error messages “Connection closed with error : amqp:resource-limit-exceeded - local-idle-timeout expired” (191).

I’m not sure, that AbstractServerMessageImpl.isPersistent() call can cause a deadlock, because get() operation is invoked with LockMode.READ_UNCOMMITTED parameter. According to BDB documentation READ_UNCOMMITTED mode should be non-blocking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org