Posted to dev@activemq.apache.org by "Kevin Yaussy (JIRA)" <ji...@apache.org> on 2006/06/15 14:43:51 UTC

[jira] Commented: (AMQ-443) ReliableTransport / KeepAlive algorithm does not work properly.

    [ https://issues.apache.org/activemq/browse/AMQ-443?page=comments#action_36397 ] 

Kevin Yaussy commented on AMQ-443:
----------------------------------

Yes - and so far the 4.0 approach is working very well in this respect.

> ReliableTransport / KeepAlive algorithm does not work properly.
> ---------------------------------------------------------------
>
>          Key: AMQ-443
>          URL: https://issues.apache.org/activemq/browse/AMQ-443
>      Project: ActiveMQ
>         Type: Bug

>   Components: Transport, Broker
>     Versions: 3.2, 3.2.1
>  Environment: Solaris 8 / 10.  JDK 1.5
>     Reporter: Kevin Yaussy
>      Fix For: 4.0
>  Attachments: KeepAliveDaemon.java, ReliableTransportChannel.java
>
>
> The current implementation of KeepAliveDaemon.java will sometimes force disconnections on well-behaved connections.  The problem may arise if a connection goes away and the KeepAlive send to that channel blocks while attempting to reconnect.  If this reconnection takes a while, then other channels that were responding fine may get their connections broken.  This happens due to the following code in KeepAliveDaemon.java:
> 		if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout() * 2) < System.currentTimeMillis()) {
> or
> 		} else if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout()) < System.currentTimeMillis()) {
> The fact that the receipt timestamp is checked against System.currentTimeMillis() causes the code to break otherwise good connections.  If a KeepAlive send (in examineChannel) for a broken channel takes longer than some good channel's KeepAliveTimeout, then the good connection gets broken.
> This can, in turn, cause some pretty bad behavior in the Broker.  While testing and diagnosing this problem, I could get some brokers in a network of brokers stuck.  The sequence of events during recovery, which gets interrupted when the connections are closed, would sometimes lead to the broker hanging while waiting for a receipt, such as during an addConsumer (which eventually calls syncSendWithReceipt).
> I have redone the logic in KeepAliveDaemon.java (which required a small change to ReliableTransportChannel as well).  This now seems to work.
> I'm a bit concerned about the blocking calls, though.  This may be a different issue / bug.  I thought it looked like there was a mechanism to cancel outstanding receipt waiters - but, every once in a while that mechanism would not get called.  This results in the broker basically getting stuck, and it never really recovers.
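
To make the timing problem in the description concrete, here is a minimal, self-contained sketch.  It is not the code from the attached KeepAliveDaemon.java: the Channel class, the sweep method, and the simulated "examine cost" are all hypothetical, and only the timestamp comparison mirrors the quoted check.  The snapshot variant is just one possible mitigation, shown for contrast; it is an assumption, not necessarily what the attached fix does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeepAliveSketch {

    /** Hypothetical stand-in for a transport channel (not the real ActiveMQ class). */
    static final class Channel {
        final String name;
        final long lastReceiptTimestamp; // millis of the last receipt on this channel
        final long keepAliveTimeout;     // millis
        final long examineCostMillis;    // simulated time spent examining it, e.g. a blocked reconnect

        Channel(String name, long lastReceiptTimestamp, long keepAliveTimeout, long examineCostMillis) {
            this.name = name;
            this.lastReceiptTimestamp = lastReceiptTimestamp;
            this.keepAliveTimeout = keepAliveTimeout;
            this.examineCostMillis = examineCostMillis;
        }
    }

    /** Same shape as the first quoted check: last receipt + 2 * timeout compared with "now". */
    static boolean expired(Channel c, long now) {
        return c.lastReceiptTimestamp + c.keepAliveTimeout * 2 < now;
    }

    /**
     * Walks the channels in order with a simulated clock.
     * useSnapshot == false: each channel is compared with the clock at the time it is
     * examined, so time lost on an earlier channel counts against later ones.
     * useSnapshot == true: every channel is compared with the time the pass started.
     * Returns the names of the channels that would be disconnected.
     */
    static List<String> sweep(List<Channel> channels, long startMillis, boolean useSnapshot) {
        List<String> dropped = new ArrayList<String>();
        long clock = startMillis;
        for (Channel c : channels) {
            long now = useSnapshot ? startMillis : clock;
            if (expired(c, now)) {
                dropped.add(c.name);
            }
            clock += c.examineCostMillis; // a blocking keep-alive send advances the clock
        }
        return dropped;
    }

    public static void main(String[] args) {
        long start = 100000L;
        List<Channel> channels = Arrays.asList(
                // last heard from 15s ago; its keep-alive send blocks for 30s
                new Channel("broken", start - 15000L, 10000L, 30000L),
                // heard from 1s ago, perfectly healthy
                new Channel("good", start - 1000L, 10000L, 0L));

        System.out.println(sweep(channels, start, false)); // prints [good]
        System.out.println(sweep(channels, start, true));  // prints []
    }
}
```

In the first pass the healthy channel is dropped only because 30 seconds were spent blocked on the broken one before it was examined.  A snapshot avoids that particular failure, but it does nothing about the blocking send itself, which is the separate concern raised in the last paragraph of the description.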

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   https://issues.apache.org/activemq/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira