Posted to issues@activemq.apache.org by "Markus Meierhofer (Jira)" <ji...@apache.org> on 2020/08/11 06:58:00 UTC
[jira] [Comment Edited] (ARTEMIS-2870) CORE connection failure sometimes doesn't cleanup sessions

    [ https://issues.apache.org/jira/browse/ARTEMIS-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175292#comment-17175292 ] 

Markus Meierhofer edited comment on ARTEMIS-2870 at 8/11/20, 6:57 AM:
----------------------------------------------------------------------

[~jbertram], to reproduce the issue I wrote a small script that blocks communication between client and broker for longer than the connection TTL (60s) and then reopens the connection for a short time window (0.1-35s). With that I could reproduce the issue, although it takes quite a long time to show up.

This bug also seems to be linked to the client not reconnecting properly. I opened a separate ticket for that: ARTEMIS-2875
{code:java}
#!/bin/bash
# Remove DROP rules possibly left over from earlier runs.
# Each -D deletes one matching rule; errors for missing rules can be ignored.
sudo iptables -D OUTPUT -p tcp --destination-port 61616 -j DROP
sudo iptables -D INPUT -p tcp --destination-port 61616 -j DROP
sudo iptables -D OUTPUT -p tcp --destination-port 61616 -j DROP
sudo iptables -D INPUT -p tcp --destination-port 61616 -j DROP
sudo iptables -D OUTPUT -p tcp --destination-port 61616 -j DROP
sudo iptables -D INPUT -p tcp --destination-port 61616 -j DROP
sudo iptables -D OUTPUT -p tcp --destination-port 61616 -j DROP
sudo iptables -D INPUT -p tcp --destination-port 61616 -j DROP
while true; do
        date
        echo "start dropping messages for broker"
        sudo iptables -A OUTPUT -p tcp --destination-port 61616 -j DROP
        sudo iptables -A INPUT -p tcp --destination-port 61616 -j DROP
        # Block for slightly longer than the 60s connection TTL.
        # print(...) with one argument works on both Python 2 and Python 3.
        sleeptime=$(python -c "import random; print(random.uniform(60.0, 65.0))")
        echo "sleep for $sleeptime"
        sleep "$sleeptime"
        echo "stopping dropping messages for broker"
        sudo iptables -D OUTPUT -p tcp --destination-port 61616 -j DROP
        sudo iptables -D INPUT -p tcp --destination-port 61616 -j DROP
        # Keep the connection open for a short random window (0.1-35s).
        sleeptime2=$(python -c "import random; print(random.uniform(0.1, 35.0))")
        echo "sleep for $sleeptime2"
        sleep "$sleeptime2"
done

{code}


> CORE connection failure sometimes doesn't cleanup sessions
> ----------------------------------------------------------
>
>                 Key: ARTEMIS-2870
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2870
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.10.1, 2.14.0
>            Reporter: Markus Meierhofer
>            Priority: Critical
>         Attachments: artemis.log, broker.xml, duplicated consumers.png
>
>
> h3. Summary
> Since upgrading our deployed Artemis instances from version 2.6.4 to 2.10.1, we have noticed that a connection failure sometimes doesn't clean up the sessions attached to that connection, leaving "zombie" consumers and producers on queues.
>  
> h3. The issue
> Our Artemis clients are connected to the broker via the provided JMS abstraction, using the default connection TTL of 60 seconds. We are using both JMS Topics and JMS Queues.
> As most of our clients are mobile and on WiFi, connection losses may occur frequently, depending on the quality of the network. When a client is disconnected for 60 seconds, the broker usually closes the connection and cleans up all the sessions connected to it. The mobile clients then reconnect when they are online again. What we have noticed is that after many connection failures, messages may be sent twice to the mobile clients. When analyzing the problem on the broker console, we found that there were two consumers connected to each of the queues one mobile client usually consumes from. One of them belonged to the new connection of the mobile client, which is fine.
> The other consumer belonged to a session whose connection had already failed and been closed by that time. When analyzing the logs, we saw that for these connections there was a "Connection failure to ... has been detected" line, but no subsequent "clearing up resources for session ..." log lines.
>  
> h3. Instance of the issue
>  
> The broken session is "7a9292cb-xxx" in the picture. In the logs you can see that the connection failure was detected, but the session was never cleared up by the broker (mind the timestamp).
> !duplicated consumers.png!
> {code:java}
> [WARN 2020-07-27 14:33:29,794  Thread-13  org.apache.activemq.artemis.core.client]: AMQ212037: Connection failure to /10.255.0.2:54812 has been detected: syscall:read(..) failed: Connection reset by peer [code=GENERIC_EXCEPTION]
> [WARN 2020-07-29 09:31:30,828 Thread-20   org.apache.activemq.artemis.core.client]: AMQ212037: Connection failure to /10.255.0.2:55994 has been detected: AMQ229014: Did not receive data from /10.255.0.2:55994 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
> {code}
>  
> Attached you can find the full [^artemis.log] and our [^broker.xml]
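
The log analysis described above (a "Connection failure to ... has been detected" line with no following "clearing up resources for session ..." line) could be roughly automated as in the sketch below. This is only a heuristic and rests on assumptions that the issue does not confirm: that cleanup lines, when present, are logged after the failure line they belong to and before the next failure line, and that matching on the two quoted phrases is enough. The function name `find_uncleaned_failures` is made up for illustration.

```shell
#!/bin/bash
# Heuristic sketch (assumptions, not confirmed broker behavior):
# print every "Connection failure ... has been detected" log line that is not
# followed by a "clearing up resources for session" line before the next
# failure line (or before the end of the file).
find_uncleaned_failures() {
    awk '
        /Connection failure to .* has been detected/ {
            # The previous failure saw no cleanup line, so report it.
            if (pending != "") print pending
            pending = $0
            next
        }
        /clearing up resources for session/ {
            # Assume this cleanup belongs to the most recent failure.
            pending = ""
        }
        END {
            if (pending != "") print pending
        }
    ' "$1"
}

# Scan only when a log file is passed, so the file can also be sourced.
if [ -n "${1:-}" ]; then
    find_uncleaned_failures "$1"
fi
```

Saved under a hypothetical name such as `find_uncleaned.sh`, it would be run as `./find_uncleaned.sh artemis.log`. With several clients failing at the same time the pairing of failure and cleanup lines is ambiguous, so any hit is only a candidate to check against the broker console.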


