Posted to issues@activemq.apache.org by "Clebert Suconic (Jira)" <ji...@apache.org> on 2021/02/09 21:25:07 UTC

[jira] [Closed] (ARTEMIS-3037) JournalImpl#checkKnownRecordID() implementation can leave a thread hanging in WAITING state

     [ https://issues.apache.org/jira/browse/ARTEMIS-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clebert Suconic closed ARTEMIS-3037.
------------------------------------

> JournalImpl#checkKnownRecordID() implementation can leave a thread hanging in WAITING state
> -------------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-3037
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3037
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.9.0, 2.16.0
>            Reporter: Tomas Hofman
>            Priority: Major
>             Fix For: 2.17.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The {{JournalImpl#checkKnownRecordID()}} implementation contains the following code:
> {code}
>       final SimpleFuture<Boolean> known = new SimpleFutureImpl<>();
>       // retry on the append thread. maybe the appender thread is not keeping up.
>       appendExecutor.execute(new Runnable() {
>          @Override
>          public void run() {
>             journalLock.readLock().lock();
>             try {
>                known.set(records.containsKey(id)
>                   || pendingRecords.contains(id)
>                   || (compactor != null && compactor.containsRecord(id)));
>             } finally {
>                journalLock.readLock().unlock();
>             }
>          }
>       });
>       if (!known.get()) {
>           ...
>       }
> {code}
> If the code in the Runnable throws an exception before the {{known}} future value is set, the calling thread blocks in {{known.get()}} and is left in the WAITING state forever. Exception handling should be added that cancels the future if an exception occurs.
> We've observed cases where the following threads were left hanging while no other threads were operating inside JournalImpl. I believe that the {{JournalImpl#checkKnownRecordID()}} implementation may be responsible for that:
> {code}
> "Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@423fe5c3)" #1078 prio=5 os_prio=64 tid=0x000000011c34a000 nid=0x4eb waiting on condition [0xfffffffabe9ad000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0xfffffffbe73c29e8> (a java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>         at org.apache.activemq.artemis.utils.SimpleFutureImpl.get(SimpleFutureImpl.java:62)
>         at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1080)
>         at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:950)
>         at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.confirmPendingLargeMessage(AbstractJournalStorageManager.java:361)
>         at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.confirmLargeMessageSend(PostOfficeImpl.java:1390)
>         - locked <0xfffffffbe73aa1b0> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl)
>         at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1336)
>         at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:980)
>         at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:871)
>         at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:2045)
>         - locked <0xfffffffb19447fb8> (a org.apache.activemq.artemis.core.server.impl.ServerSessionImpl)
>         at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:1989)
>         - locked <0xfffffffb19447fb8> (a org.apache.activemq.artemis.core.server.impl.ServerSessionImpl)
>         at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.sendContinuations(ServerSessionPacketHandler.java:1034)
>         - locked <0xfffffffb1962b900> (a java.lang.Object)
>         at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:312)
>         at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:285)
>         at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler$$Lambda$651/2097400985.onMessage(Unknown Source)
>         at org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33)
>         at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
>         at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$413/494003142.run(Unknown Source)
>         at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
>         at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
>         at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
>         at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$413/494003142.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java)
>         at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
>    Locked ownable synchronizers:
>         - <0xfffffffba1800ca0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> {code}
> "Thread-82 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$7@3bde9e44)" #2130 prio=5 os_prio=64 tid=0x000000017b6df800 nid=0x907 waiting for monitor entry [0xffffffff045de000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getEncodeSize(LargeServerMessageImpl.java:178)
>         - waiting to lock <0xfffffffbe73aa1b0> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl)
>         at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:59)
>         at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:25)
>         at org.apache.activemq.artemis.core.journal.impl.dataformat.JournalAddRecord.getEncodeSize(JournalAddRecord.java:79)
>         at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2792)
>         at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$100(JournalImpl.java:91)
>         at org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:850)
> {code}
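
The exception handling suggested in the description could look roughly like the sketch below. It is not the change that was committed for this issue. To stay self-contained it uses the JDK's CompletableFuture as a stand-in for the SimpleFuture in the quoted code, and the class name KnownRecordCheck, the method isKnown and the lookup parameter are hypothetical names that only illustrate the pattern: whatever happens inside the Runnable, the future is always completed, so the caller cannot park forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.function.LongPredicate;

public class KnownRecordCheck {

   // Hypothetical helper: "lookup" stands in for the records / pendingRecords /
   // compactor checks performed on the append thread in the quoted code.
   static boolean isKnown(Executor appendExecutor, LongPredicate lookup, long id) {
      final CompletableFuture<Boolean> known = new CompletableFuture<>();

      appendExecutor.execute(() -> {
         try {
            known.complete(lookup.test(id));
         } catch (Throwable t) {
            // Without this branch a failing lookup would leave the caller
            // waiting on a future that is never completed.
            known.completeExceptionally(t);
         }
      });

      try {
         return known.get();
      } catch (ExecutionException e) {
         // The lookup failed on the executor thread; surface it to the caller.
         throw new IllegalStateException("record lookup failed for id " + id, e.getCause());
      } catch (InterruptedException e) {
         // Preserve the interrupt status before reporting the failure.
         Thread.currentThread().interrupt();
         throw new IllegalStateException("interrupted while checking id " + id, e);
      }
   }
}
```

With this shape a failure on the append thread reaches the caller as an exception instead of an indefinite wait. A bounded wait (get with a timeout) would be a further safeguard, but that goes beyond what the description proposes.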



--
This message was sent by Atlassian Jira
(v8.3.4#803005)