Posted to dev@activemq.apache.org by "Martin Serrano (Created) (JIRA)" <ji...@apache.org> on 2012/02/15 21:13:00 UTC

[jira] [Created] (AMQ-3719) Non failing IOException causes FailoverTransport to hang until real failure occurs

Non failing IOException causes FailoverTransport to hang until real failure occurs
----------------------------------------------------------------------------------

                 Key: AMQ-3719
                 URL: https://issues.apache.org/jira/browse/AMQ-3719
             Project: ActiveMQ
          Issue Type: Bug
          Components: Transport
         Environment: Intel(R) Core(TM) i5 CPU M 540 @2.53GHz, 8 GB, 64-bit
            Reporter: Martin Serrano
            Priority: Critical
             Fix For: 5.6.0


I have only encountered this failure when the broker is experiencing heavy load and a new connection attempt is made.

* The FailoverTransport tracks commands that have been issued so that it can restore the state upon a failure/reconnect event.
* If an IOException occurs when sending a tracked command, the oneway() method returns, assuming that the IOException is indicative of a transport failure and will result in a failure/reconnect event.
* Some IOExceptions (such as WireFormatNegotiation timeouts), however, do not always indicate a transport failure.  In that case no subsequent failure/reconnect event occurs, so the command is never resent.  If it is a synchronous command (like the one generated by starting a connection), the calling thread will hang.
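
The three points above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `TrackedCommandHangSketch`, `Wire`, and `startConnection` are hypothetical names invented for illustration and are not ActiveMQ classes; only `oneway` and `handleTransportFailure` echo names from the report. A bounded wait stands in for the caller that would otherwise block forever.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy model of the reported behavior; none of these types are ActiveMQ classes.
class TrackedCommandHangSketch {

    // Hypothetical stand-in for the underlying transport.
    interface Wire {
        void send(String command) throws IOException;
    }

    private final List<String> stateTracker = new ArrayList<>();
    private final CountDownLatch response = new CountDownLatch(1);
    private Wire wire;

    TrackedCommandHangSketch(Wire wire) {
        this.wire = wire;
    }

    // Mirrors the logic described in the report.
    void oneway(String command, boolean tracked) {
        if (tracked) {
            stateTracker.add(command);
        }
        try {
            wire.send(command);
            response.countDown(); // the "broker" answers once the command arrives
        } catch (IOException e) {
            if (!tracked) {
                handleTransportFailure();
            }
            // Tracked: just return and rely on a later failure/reconnect event.
        }
    }

    // Simulated failure/reconnect event: fresh wire, then replay of tracked state.
    void handleTransportFailure() {
        wire = command -> { }; // new connection that always succeeds
        for (String command : stateTracker) {
            try {
                wire.send(command);
                response.countDown();
            } catch (IOException ignored) {
                // cannot happen with the wire installed above
            }
        }
    }

    // Returns true if the synchronous caller gets its response in time.
    static boolean startConnection(boolean failureEventFollows) {
        boolean[] firstSend = {true};
        Wire flaky = command -> {
            if (firstSend[0]) {
                firstSend[0] = false;
                throw new IOException("simulated negotiation timeout");
            }
        };
        TrackedCommandHangSketch transport = new TrackedCommandHangSketch(flaky);
        transport.oneway("ConnectionInfo", true);
        if (failureEventFollows) {
            transport.handleTransportFailure();
        }
        try {
            // Bounded wait so the sketch terminates; the real caller waits indefinitely.
            return transport.response.await(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this model `startConnection(false)` never receives a response, because nothing replays the tracked command, while `startConnection(true)` does once the failure/reconnect event replays it.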

Incidentally, my reading of the code is that only non-tracked commands can generate the IOException that triggers the handleTransportFailure call.  Is that what we really want?

My belief is that an IOException should always trigger handleTransportFailure, regardless of its origin.

I will attach a unit test and fix shortly.  The test will often fail (i.e. hang) without the fix, but not always, since I use the wireFormat.maxInactivityDurationInitalDelay=1 option to trigger the behavior.  If the system runs fast enough, the timeout sometimes does not occur.  I wasn't sure exactly how such a test should be written, or whether the test environment has controls to prevent a hanging test (in case of regression) from hanging a build.
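
One way to keep a regression from hanging a build is to run the connection attempt on a separate thread and bound the wait. The following is a minimal sketch using only `java.util.concurrent`; `BoundedWaitSketch` and `completesWithin` are hypothetical names, and the `Runnable` stands in for whatever the real test does to start a connection. JUnit 4's `@Test(timeout = ...)` attribute is another option if the test framework provides it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs an attempt on a daemon thread and bounds the wait.
class BoundedWaitSketch {

    // Returns true only if the attempt finished normally within the timeout.
    static boolean completesWithin(Runnable attempt, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "connect-attempt");
            thread.setDaemon(true); // a stuck attempt must not keep the JVM alive
            return thread;
        });
        Future<?> result = executor.submit(attempt);
        try {
            result.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the stuck attempt
            return false;
        } catch (ExecutionException e) {
            return false; // the attempt itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A test could then assert `completesWithin(connectAttempt, 60, TimeUnit.SECONDS)` and fail, rather than hang, when the connection never starts.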

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (AMQ-3719) Tracked command IOException causes FailoverTransport to hang until failure occurs for untracked command

Posted by "Martin Serrano (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AMQ-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Serrano updated AMQ-3719:
--------------------------------

    Summary: Tracked command IOException causes FailoverTransport to hang until failure occurs for untracked command  (was: Non failing IOException causes FailoverTransport to hang until real failure occurs)
    


        

[jira] [Updated] (AMQ-3719) Non failing IOException causes FailoverTransport to hang until real failure occurs

Posted by "Martin Serrano (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AMQ-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Serrano updated AMQ-3719:
--------------------------------

    Description: 
I have only encountered this failure when the broker is experiencing heavy load and a new connection attempt is made.

* The FailoverTransport tracks commands that have been issued so that it can restore the state upon a failure/reconnect event.
* If an IOException occurs when sending a tracked command, the oneway() method returns, assuming that the IOException is indicative of a transport failure and will result in a failure/reconnect event.
* Some IOExceptions (such as WireFormatNegotiation timeouts), however, do not always indicate a transport failure.  In that case no subsequent failure/reconnect event occurs, so the command is never resent.  If it is a synchronous command (like the one generated by starting a connection), the calling thread will hang.

Incidentally, my reading of the code is that only non-tracked commands can generate the IOException that triggers the handleTransportFailure call.  Is that what we really want?

My belief is that an IOException should always trigger handleTransportFailure, regardless of its origin.

I will attach a unit test and fix shortly.  The test will often fail (i.e. hang) without the fix, but not always, since I use the wireFormat.maxInactivityDurationInitalDelay=1 option to trigger the behavior.  If the system runs fast enough, the timeout sometimes does not occur.  I wasn't sure exactly how such a test should be written.  The test will fail if the connection does not succeed within 60s.

  was:
I have only encountered this failure when the broker is experiencing heavy load and a new connection attempt is made.

* The FailoverTransport tracks commands that have been issued so that it can restore the state upon a failure/reconnect event.
* If an IOException occurs when sending a tracked command, the oneway() method returns, assuming that the IOException is indicative of a transport failure and will result in a failure/reconnect event.
* Some IOExceptions (such as WireFormatNegotiation timeouts), however, do not always indicate a transport failure.  In that case no subsequent failure/reconnect event occurs, so the command is never resent.  If it is a synchronous command (like the one generated by starting a connection), the calling thread will hang.

Incidentally, my reading of the code is that only non-tracked commands can generate the IOException that triggers the handleTransportFailure call.  Is that what we really want?

My belief is that an IOException should always trigger handleTransportFailure, regardless of its origin.

I will attach a unit test and fix shortly.  The test will often fail (i.e. hang) without the fix, but not always, since I use the wireFormat.maxInactivityDurationInitalDelay=1 option to trigger the behavior.  If the system runs fast enough, the timeout sometimes does not occur.  I wasn't sure exactly how such a test should be written, or whether the test environment has controls to prevent a hanging test (in case of regression) from hanging a build.

    


        

[jira] [Updated] (AMQ-3719) Non failing IOException causes FailoverTransport to hang until real failure occurs

Posted by "Martin Serrano (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AMQ-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Serrano updated AMQ-3719:
--------------------------------

    Attachment: amq-3719.patch

Test and patch for the bug attached.
                


        

[jira] [Commented] (AMQ-3719) Tracked command IOException causes FailoverTransport to hang until failure occurs for untracked command

Posted by "Timothy Bish (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AMQ-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226900#comment-13226900 ] 

Timothy Bish commented on AMQ-3719:
-----------------------------------

This does appear to be an issue.  I don't think the patch is quite correct, though, since it doesn't hop out of the redelivery attempt loop inside oneway.  That would cause the same tracked command to be attempted a second time, which it shouldn't be since it's already in the state tracker.  Perhaps the correct thing here is simply an else clause on the if (tracked == null) that calls handleTransportFailure and then allows the method to return as usual.
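
The shape of that suggestion can be sketched with a toy model. This is not the ActiveMQ source and not the attached patch: `ElseClauseSketch` and its fields are hypothetical, a `boolean tracked` flag stands in for the `tracked == null` check, and the retry limit of three is an arbitrary assumption. The point it illustrates is that the tracked branch triggers the failure handling and then returns, so the command is sent once by the replay and not a second time by the retry loop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model of the suggested else clause; not the actual FailoverTransport code.
class ElseClauseSketch {

    final List<String> sent = new ArrayList<>();          // commands that reached the "broker"
    private final List<String> stateTracker = new ArrayList<>();
    private int failuresLeft;                              // simulated send failures remaining

    ElseClauseSketch(int failuresLeft) {
        this.failuresLeft = failuresLeft;
    }

    private void send(String command) throws IOException {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IOException("simulated send failure");
        }
        sent.add(command);
    }

    // Simulated reconnect: the link becomes healthy and tracked state is replayed.
    void handleTransportFailure() {
        failuresLeft = 0;
        sent.addAll(stateTracker);
    }

    void oneway(String command, boolean tracked) {
        if (tracked) {
            stateTracker.add(command);
        }
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                send(command);
                return;
            } catch (IOException e) {
                if (!tracked) {
                    // Untracked: recover, then let the loop resend the command.
                    handleTransportFailure();
                } else {
                    // Tracked: recover (which replays the command) and return
                    // without retrying, so it is not sent twice.
                    handleTransportFailure();
                    return;
                }
            }
        }
        throw new IllegalStateException("gave up after repeated failures");
    }
}
```

With one simulated failure, a tracked command ends up in `sent` exactly once via the replay, and an untracked command ends up in `sent` exactly once via the retry.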
                


        

[jira] [Resolved] (AMQ-3719) Tracked command IOException causes FailoverTransport to hang until failure occurs for untracked command

Posted by "Timothy Bish (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AMQ-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Bish resolved AMQ-3719.
-------------------------------

    Resolution: Fixed
      Assignee: Timothy Bish

Fixed on trunk. Thanks for doing the legwork on this one.
                
