You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@activemq.apache.org by "Clebert Suconic (Jira)" <ji...@apache.org> on 2021/08/09 17:58:06 UTC

[jira] [Closed] (ARTEMIS-3174) ServerSessionImpl doesn't set new connection when transferring session

     [ https://issues.apache.org/jira/browse/ARTEMIS-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clebert Suconic closed ARTEMIS-3174.
------------------------------------

> ServerSessionImpl doesn't set new connection when transferring session
> ----------------------------------------------------------------------
>
>                 Key: ARTEMIS-3174
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3174
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.16.0, 2.17.0
>         Environment: We are using CORE connections between the broker and our clients.
> Our clients are using the JMS abstraction layer.
> The clients are using the following options:
>  * confirmationWindowSize of 1024*1024 (=1MiB)
>  * connectionTTL of 60 seconds
>  * reconnectAttempts and initialConnectAttempts set to -1 (infinite retry)
>  
> We found the issue with Artemis version 2.16.0, but I checked latest master and the bug still exists there.
>            Reporter: Markus Meierhofer
>            Priority: Major
>             Fix For: 2.18.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> h3. How the issue was discovered
> Because of a different bug in the broker where sometimes, closing a connection didn't clean up the connected sessions (https://issues.apache.org/jira/browse/ARTEMIS-2870), we implemented a "monitor" process running alongside the broker which, via JMX remote management, checks the existing connection IDs and compares them with the connection IDs it retrieved from existing consumers/producers. When a connectionID of a consumer/producer isn't contained in the list of active connection IDs, the monitor process closed the session attached to the consumer/producer.
> At a customer site, we had the issue that our monitor process seemingly closed sessions of alive connections and not closed ones. This was due to the fact that, after a connection failover by the client, the consumer/producer JSON object retrieved via JMX still reported to be using the old connection ID although the consumer/producer was already using the new connection (when checking the logs of the artemis client after failover).
> The scenario is described down below. It shows the logs of the artemis client on the one hand and of our "monitor" process connected to the broker via JMX on the other hand, which every 10 seconds prints all connection IDs and the consumer/producerInfo retrieved.
> At 10:36:45, the artemis client recognized a connection failure:
> {noformat}
> [WARN 2021-03-09 10:36:51,522 l-threads)  org.apache.activemq.artemis.core.client]: AMQ212037: Connection failure to fms/10.221.5.49:61616 has been detected: AMQ219014: Timed out after waiting 4,000 ms for response when sending packet 71 [code=CONNECTION_TIMEDOUT]{noformat}
> On the monitor process, you can see that there are 3 connection IDs (b20b08e0 is the old connection of the client which failed):
>  {noformat}
> [DEBUG 2021-03-09 10:36:45,858 cheduler-1 fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [b20b08e0, 8080f955, 5dc2dff9]
> Found consumers: ...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
> {noformat}
> At 10:37:09 (25 seconds later), the client is able to successfully fail over. It seems that he is doing session reattachment instead of session recreate:
> {noformat}
> [ERROR 2021-03-09 10:37:09,307 ead-289224 nt.control.RestartAgentExceptionListener]: JMS exception listener retrieved exception!
> {noformat}
> The monitor process now shows 4 connections (4cc50064 is the new connection of the client). Also note that there are still consumers (and their sessions) which report the old connection ID. But these sessions are actually used by the client (he has successfully transferred the sessions). We conclude this from the fact that there are NO consumers/producers with the new connection ID, but the client is able to communicate fully with the broker after successful failover. Also, the session IDs do not change (which indicates a session transfer):
> {noformat}
> [DEBUG 2021-03-09 10:37:15,900 cheduler-9 fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [4cc50064, b20b08e0, 8080f955, 5dc2dff9]
>  [DEBUG 2021-03-09 10:37:15,901 cheduler-9 fms.jmx.ClearDeadBrokerSessionsScheduler]: Found consumers:...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
> {noformat}
> ~60 seconds after the failure of the old connection, the broker closes the connection. Afterwards, the monitor process only shows 3 connection (b20b08e0 was closed). But you can see that there are still consumers reporting the old connection ID (these are consumers of sessions that have actually been transferred to the new connection):
> {noformat}
> [DEBUG 2021-03-09 10:37:55,973 heduler-10 fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [4cc50064, 8080f955, 5dc2dff9]
> [DEBUG 2021-03-09 10:37:55,973 heduler-10 fms.jmx.ClearDeadBrokerSessionsScheduler]: Found consumers: ...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
> {noformat}
> Afterwards, our monitor process goes on closing the sessions of the consumers with the old connection ID, although they were actually used by the new connection. This effectively breaks the connection of the artemis client.
> h3. The bug
> After looking through the code, we found a problem during session reattachment: In "internaltransferConnection" of "ServerSessionPacketHandler", the session is transferred from the old to the new connection. It seems mostly correct as the new connection is set both on the channel and on the ServerSessionPacketHandler itself. But on the "session" member variable (an instance of ServerSessionImpl), the new connection is not set, although ServerSessionImpl has a field "remotingConnection". In fact, the remotingConnectionField is only set once in the constructor, which cannot be correct if the ServerSessionImpl object is reused from the old to the new connection.
> Among other usages, the remotingConnection field of ServerSessionImpl is used to retrieve the connectionID of the session's connection, which in the end is used by the JMX server control (ActiveMQServerControlImpl) in order to build a JSON representation of a consumer.
> h3. A simple solution
> The most direct solution would be to set the new connection on the remotingConnection field of ServerSessionImpl. I will soon post a pull request here containing this change.
> There are some considerations though:
> - Do we need to care about locking when setting the field?
> - The "remotingConnection" field is currently final in ServerSessionImpl, and always was since the initial HornetQ donation, I'm not sure what the intention behind it was. But on the other hand, the session reattachment also worked the same way, so the bug of keeping the old connection on the ServerSessionImpl also existed back then.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)