Posted to issues@activemq.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/01/13 14:24:00 UTC

[jira] [Commented] (ARTEMIS-4114) Broker deadlock occurs when restarting another broker in the cluster

    [ https://issues.apache.org/jira/browse/ARTEMIS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676642#comment-17676642 ] 

ASF subversion and git services commented on ARTEMIS-4114:
----------------------------------------------------------

Commit b565a8a7b9f7415aa158e7cdc47a665187e8dd79 in activemq-artemis's branch refs/heads/main from Clebert Suconic
[ https://gitbox.apache.org/repos/asf?p=activemq-artemis.git;h=b565a8a7b9 ]

ARTEMIS-4114 Avoiding deadlock during scale down

We will rely on existing tests for this change


> Broker deadlock occurs when restarting another broker in the cluster
> --------------------------------------------------------------------
>
>                 Key: ARTEMIS-4114
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4114
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.19.1
>            Reporter: Alexander
>            Priority: Critical
>         Attachments: fallen_broker_logs.txt, restarted_broker_logs.txt
>
>
> Broker deadlock occurs when restarting another broker in the cluster.
> When one broker in a cluster of 4 brokers is restarted, another broker in the cluster restarts as well.
> The brokers are connected via staticConnectors, and a scaleDown policy is also configured:
> {code:xml}
>     <ha-policy>
>        <live-only>
>           <scale-down>
>              <connectors>
>                 <connector-ref>ART.EL.CLS1-connector</connector-ref>
>                 <connector-ref>ART.EL.CLS2-connector</connector-ref>
>                 <connector-ref>ART.EL.CLS3-connector</connector-ref>
>              </connectors>
>           </scale-down>
>       </live-only>
>     </ha-policy>{code}
> Logs of the fallen broker:
> {noformat}
> Deadlock detected!
> "Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@46cc127b)" Id=82 BLOCKED on org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge@62661d03 owned by "Thread-142 (ActiveMQ-client-global-threads)" Id=10066
>     at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.handle(BridgeImpl.java:620)
>     -  blocked on org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge@62661d03
>     at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:3897)
>     -  locked org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573
>     at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:3061)
>     -  locked org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573
>     at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:4205)
>     at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
>     at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
>     at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
>     at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$134/0x00000008002b5840.run(Unknown Source)
>     at java.base@11.0.9/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base@11.0.9/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)    
> Number of locked synchronizers = 2
>     - java.util.concurrent.ThreadPoolExecutor$Worker@ffceecd
>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@561fd6c1
> "Thread-142 (ActiveMQ-client-global-threads)" Id=10066 BLOCKED on org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573 owned by "Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@46cc127b)" Id=82
>     at org.apache.activemq.artemis.core.server.impl.QueueImpl.iterQueue(QueueImpl.java:2158)
>     -  blocked on org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573
>     at org.apache.activemq.artemis.core.server.impl.QueueImpl.moveReferencesBetweenSnFQueues(QueueImpl.java:2649)
>     at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.scaleDown(BridgeImpl.java:746)
>     -  locked org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge@62661d03
>     at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.connectionFailed(BridgeImpl.java:728)
>     at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.callSessionFailureListeners(ClientSessionFactoryImpl.java:774)
>     at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:709)
>     at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:544)
>     at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$600(ClientSessionFactoryImpl.java:75)
>     at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1317)
>     at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.callFailureListeners(AbstractRemotingConnection.java:78)
>     at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:222)
>     at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:1091)
>     at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
>     at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
>     at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
>     at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$134/0x00000008002b5840.run(Unknown Source)
>     at java.base@11.0.9/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base@11.0.9/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)    
> Number of locked synchronizers = 3
>     - java.util.concurrent.ThreadPoolExecutor$Worker@21768e7
>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@32848485
>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@6efeeadb{noformat}
> Logs of the restarted broker and of the fallen broker are attached.
> The broker fell two minutes after the restart.
>  
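
The two stack traces in the report show a lock-order inversion: Thread-16 holds the QueueImpl monitor (taken in QueueImpl.deliver) and waits for the ClusterConnectionBridge monitor, while Thread-142 holds the ClusterConnectionBridge monitor (taken in BridgeImpl.scaleDown) and waits for the QueueImpl monitor in QueueImpl.iterQueue. The sketch below is not the change made in the commit above. It only illustrates one general remedy for this kind of inversion, which is to have every code path take the two monitors in the same order. The class LockOrderSketch, its fields, and its methods are hypothetical stand-ins and do not exist in Artemis.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only: none of these names come from Artemis.
final class LockOrderSketch {
    // Stand-ins for the QueueImpl and ClusterConnectionBridge monitors in the trace.
    private final Object queueLock = new Object();
    private final Object bridgeLock = new Object();
    private int delivered;
    private int moved;

    // Delivery path: queue monitor first, then bridge monitor
    // (the same order Thread-16 uses in the reported trace).
    void deliver() {
        synchronized (queueLock) {
            synchronized (bridgeLock) {
                delivered++;
            }
        }
    }

    // Scale-down path. In the reported trace this path took the bridge monitor
    // first; here it is written to take the queue monitor first, so both paths
    // agree on one order and cannot block each other in a cycle.
    void scaleDown() {
        synchronized (queueLock) {
            synchronized (bridgeLock) {
                moved++;
            }
        }
    }

    // Builds a daemon thread that waits for the start signal, then runs the step repeatedly.
    private static Thread worker(CountDownLatch start, int iterations, Runnable step) {
        Thread t = new Thread(() -> {
            try {
                start.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            for (int i = 0; i < iterations; i++) {
                step.run();
            }
        });
        t.setDaemon(true);
        return t;
    }

    // Runs both paths concurrently; returns true if both threads finished
    // within the timeout and every iteration was counted.
    static boolean runBoth(int iterations) {
        LockOrderSketch sketch = new LockOrderSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread a = worker(start, iterations, sketch::deliver);
        Thread b = worker(start, iterations, sketch::scaleDown);
        a.start();
        b.start();
        start.countDown();
        try {
            a.join(5000);
            b.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive()
                && sketch.delivered == iterations && sketch.moved == iterations;
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + runBoth(100_000));
    }
}
```

Swapping the two synchronized blocks inside scaleDown() reproduces the acquisition order seen in the thread dump. With that order the two threads can block each other, and runBoth would then return false after the join timeout.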



--
This message was sent by Atlassian Jira
(v8.20.10#820010)