You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@activemq.apache.org by "Manuel Teira (JIRA)" <ji...@apache.org> on 2007/10/18 10:14:23 UTC

[jira] Commented: (AMQ-1146) InactivityMonitor: Transport connection disconnected "Channel was inactive for too long."

    [ https://issues.apache.org/activemq/browse/AMQ-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40405 ] 

Manuel Teira commented on AMQ-1146:
-----------------------------------

I'm having some issues where slow performance in the backend database seems to lead to InactivityIOExceptions. My opinion is that the fix for this issue is causing those problems: 

Having in mind that the design of the readCheck/writeCheck monitors is that we always should have a command among two readCheck (produced by normal activity or by the response of a KeepAlive injected by the writeCheck between two readChecks if needed), synchronizing on the readChecker variable seems a bad idea:

If the chain of transportListener.onCommand() is lasting more than the negociated MaxInactivityDuration, we will end with two readCheck executing one after the other, without a writeCheck between them. The first readCheck success, sets the commandReceived to false, and the second one is always going to fail, as no new command has been processed, nor a writeCheck has been executed since the last readCheck.

I don't see the need of synchronizing on the readChecker/writeChecker variables. These are AtomicBooleans and should be safe to access them from different threads (this seemed to be the motivation of the fix). Also, synchronizing the readCheck/writeCheck is actually delaying the execution of the check, and we cannot guarantee anymore the alternating of writeCheck readCheck execution, that is needed for the solution to work.

Also, synchronizing that is avoiding this code in readCheck() :

        synchronized (readChecker) {
            if (inReceive.get()) {
                LOG.trace("A receive is in progress");
                return;
            }


to be reached: because inReceive is only set during onCommand, and that is synchronized on readChecker also. Furthermore, the right way to do this is returning here when a onCommand is performed at the same time, and not waiting, to not break the sequence of readCheck/writeCheck invocations.


In few words, I think that this change breaks the InactivityMonitor, and my opinion is that it should be rolled back. Also in the 4.1 subtask that I opened some time ago. :-(

Any comments?



> InactivityMonitor: Transport connection disconnected "Channel was inactive for too long."
> -----------------------------------------------------------------------------------------
>
>                 Key: AMQ-1146
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1146
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 4.1.0
>         Environment: Windows Server 2003 running broker and Windows XP running client. Java 1.5.0_08 
>            Reporter: Mari
>            Assignee: Hiram Chirino
>            Priority: Minor
>             Fix For: 5.0.0
>
>         Attachments: InactivityMonitor_changed.java, InactivityMonitor_Original.java
>
>   Original Estimate: 1 hour
>  Remaining Estimate: 1 hour
>
> Sometimes (when the transport is idle for long) the transport gets disconnected with the message "Channel was inactive for too long." 
>         Apparently this is same as reported in http://www.nabble.com/Channel-was-inactive-for-too-long-t1463069.html#a3971744 and http://www.nabble.com/Durable-consumer-reconnect-problem-tf1716817.html#a6147014 
> Version: 4.1 
> Background: 
>         The class org.apache.activemq.transport.InactivityMonitor runs monitoring threads to check the read and write of the transport(TCP for example). If it hasn't sent the message during the given period, it sends a KeepAliveInfo to ensure the connectivity. If it doesn't receive the message for a given duration, then it disconnects the transport. 
>         By default, the maximumInactivityDuration is 30000 milliseconds. This is the time interval between the check for read. That is if an end of the transport doesn't receive any message for this period, then it will close the connection. For the read check, the time interval is half of maximumInactivityDuration i.e. 15000 millisecond. That is if a message was not sent during this period, it will send a KeepAliveInfo to ensure the connectivity. 
> Problem and the recommended fix: 
>         The InactivityMonitor class uses the flags inReceive and commandReceived variables in methods readCheck and onCommand. Typically these two methods are called from different threads. Also, the flags inSend and commandSent are used in methods oneway and writeCheck. Again these two methods are called from different threads. But, the synchronization was missing for these. This seems to be a potential cause for the problem. So, synchronization has been incorporated for these flag usages. 
>         The same class InactivityMonitor is used at both client and server side. In general, the write check (which sends KeepAliveInfo) happens twice during the time when the read check happens once. Probably the client and server machines have performance differences and just to be safer, now the write check is made to happen thrice during the time when the read check happens once. 
>         The original and the changed source files are attached here. 
> http://www.nabble.com/file/5958/InactivityMonitor_changed.java
> http://www.nabble.com/file/5957/InactivityMonitor_Original.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.