You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@activemq.apache.org by "Justin Bertram (Jira)" <ji...@apache.org> on 2022/05/11 19:07:00 UTC

[jira] [Commented] (ARTEMIS-3622) MQTT can deadlock on client connection / disconnection

    [ https://issues.apache.org/jira/browse/ARTEMIS-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535071#comment-17535071 ] 

Justin Bertram commented on ARTEMIS-3622:
-----------------------------------------

Are you able to reproduce this with the [latest snapshot build|https://repository.apache.org/content/repositories/snapshots/org/apache/activemq/apache-artemis/2.23.0-SNAPSHOT/]? There have been a lot of changes in the MQTT implementation since 2.19.0 (i.e. MQTT 5 was implemented and a lot of code was refactored along the way) and it would be great to get a thread dump for this dead-lock with the latest code.

As far as the solution goes I'm all for refactoring all the locking around the MQTT session state so the "most correct" option is preferred even if it means changing a bunch of classes. The current strategy obviously has gaps and is likely more complicated than it needs to be.

> MQTT can deadlock on client connection / disconnection
> ------------------------------------------------------
>
>                 Key: ARTEMIS-3622
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3622
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: MQTT
>    Affects Versions: 2.19.0
>         Environment: Using the latest java 17 and artemis 2.19 but looking at the code, it should affect 2.20 as well.
>            Reporter: Marcelo Takeshi Fukushima
>            Priority: Major
>
> It seems that the {{MQTTProtocolHandler}} and {{MQTTConnectionManager}} are on a racing condition and can deadlock themselves on misbehaving clients. I'm including the relevant stack trace (ignore thread 11 that is just waiting for the lock).
> Looking at the relevant code, it seems that the clean-up thread (88 on the {{{}MQTTFailureListener{}}}) starts cleaning up the session state and then the session, but when {{MQTTSession.stop}} calls {{{}MQTTSessionState.clear{}}}, the session state is no longer the same (a racy connection has replaced the session state with a brand new under the same client-id).
> I think the methods connect and disconnect on the {{MQTTConnectionManager}} could be marked as synchronized as a whole, to prevent racy connects / disconnects (but since I don't know all the ins and outs of the code, you guys might have a better fix).
> {noformat}
> Found one Java-level deadlock:
> =============================
> "Thread-11 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@640f11a1)":
>   waiting to lock monitor 0x00007f6d003368c0 (object 0x000000045f29f240, a org.apache.activemq.artemis.core.protocol.mqtt.MQTTSessionState),
>   which is held by "Thread-24 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@640f11a1)"
> "Thread-24 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@640f11a1)":
>   waiting to lock monitor 0x00007f6d00336a80 (object 0x000000045f2a1068, a org.apache.activemq.artemis.core.protocol.mqtt.MQTTSession),
>   which is held by "Thread-88 (ActiveMQ-remoting-threads-ActiveMQServerImpl::name=0.0.0.0-212232499)"
> "Thread-88 (ActiveMQ-remoting-threads-ActiveMQServerImpl::name=0.0.0.0-212232499)":
>   waiting to lock monitor 0x00007f6d003368c0 (object 0x000000045f29f240, a org.apache.activemq.artemis.core.protocol.mqtt.MQTTSessionState),
>   which is held by "Thread-24 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@640f11a1)"
> Java stack information for the threads listed above:
> ===================================================
> "Thread-11 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@640f11a1)":
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTConnectionManager.disconnect(MQTTConnectionManager.java:150)
>   - waiting to lock <0x000000045f29f240> (a org.apache.activemq.artemis.core.protocol.mqtt.MQTTSessionState)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTFailureListener.connectionFailed(MQTTFailureListener.java:37)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTConnection.fail(MQTTConnection.java:150)
>   at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread$2.run(RemotingServiceImpl.java:780)
>   at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
>   at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
>   at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
>   at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$137/0x0000000800e01dc8.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.1/ThreadPoolExecutor.java:1136)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.1/ThreadPoolExecutor.java:635)
>   at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> "Thread-24 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@640f11a1)":
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSession.start(MQTTSession.java:87)
>   - waiting to lock <0x000000045f2a1068> (a org.apache.activemq.artemis.core.protocol.mqtt.MQTTSession)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTConnectionManager.connect(MQTTConnectionManager.java:111)
>   - locked <0x000000045f29f240> (a org.apache.activemq.artemis.core.protocol.mqtt.MQTTSessionState)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.handleConnect(MQTTProtocolHandler.java:185)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.act(MQTTProtocolHandler.java:133)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler$$Lambda$382/0x0000000801034000.onMessage(Unknown Source)
>   at org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33)
>   at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
>   at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$137/0x0000000800e01dc8.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.1/ThreadPoolExecutor.java:1136)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.1/ThreadPoolExecutor.java:635)
>   at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> "Thread-88 (ActiveMQ-remoting-threads-ActiveMQServerImpl::name=0.0.0.0-212232499)":
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSessionState.clear(MQTTSessionState.java:59)
>   - waiting to lock <0x000000045f29f240> (a org.apache.activemq.artemis.core.protocol.mqtt.MQTTSessionState)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSession.clean(MQTTSession.java:200)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSession.stop(MQTTSession.java:115)
>   - locked <0x000000045f2a1068> (a org.apache.activemq.artemis.core.protocol.mqtt.MQTTSession)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTConnectionManager.disconnect(MQTTConnectionManager.java:155)
>   - locked <0x00000002223bc2d0> (a org.apache.activemq.artemis.core.protocol.mqtt.MQTTSessionState)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTFailureListener.connectionFailed(MQTTFailureListener.java:37)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTConnection.fail(MQTTConnection.java:150)
>   at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.issueFailure(RemotingServiceImpl.java:606)
>   at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.connectionDestroyed(RemotingServiceImpl.java:587)
>   at org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$Listener.connectionDestroyed(NettyAcceptor.java:951)
>   at org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.lambda$channelInactive$0(ActiveMQChannelHandler.java:89)
>   at org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler$$Lambda$419/0x0000000801103b48.run(Unknown Source)
>   at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
>   at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
>   at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
>   at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$137/0x0000000800e01dc8.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.1/ThreadPoolExecutor.java:1136)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.1/ThreadPoolExecutor.java:635)
>   at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> Found 1 deadlock.{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)