Posted to dev@qpid.apache.org by "Keith Wall (JIRA)" <ji...@apache.org> on 2012/05/22 17:40:42 UTC

[jira] [Comment Edited] (QPID-3912) Client failover fails to reconnect if a previous attempted reconnection has failed 'late' in the connection start process.

    [ https://issues.apache.org/jira/browse/QPID-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281010#comment-13281010 ] 

Keith Wall edited comment on QPID-3912 at 5/22/12 3:40 PM:
-----------------------------------------------------------

There is a second dimension to this defect which affects 0-8..0-9-1 only.

It becomes apparent if the failover parameters include a short connectdelay=x value.  If x is short (<200ms on my box) and a connection fails, there is a race condition: the state transition from the previous connection (CLOSING_CONNECTION => CLOSED_CONNECTION) can occur whilst the main thread is trying to reconnect (AMQConnectionDelegate_8_0.makeBrokerConnection).

This problem manifests itself in a couple of ways:

1) "CRAM-MD5 authentication already completed".  Here the above problem effectively allows a loop to form in the client, which emits a stream of ProtocolInitiation messages down the wire.  The Broker replies to each with a ConnectionStart, and this goes on to confuse the SASL authentication on the client.  It ends with the exception:

{code}
java.lang.IllegalStateException: CRAM-MD5 authentication already completed
        at com.sun.security.sasl.CramMD5Client.evaluateChallenge(CramMD5Client.java:75)
        at org.apache.qpid.client.handler.ConnectionSecureMethodHandler.methodReceived(ConnectionSecureMethodHandler.java:55)
        at org.apache.qpid.client.handler.ClientMethodDispatcherImpl.dispatchConnectionSecure(ClientMethodDispatcherImpl.java:216)
        at org.apache.qpid.framing.amqp_0_91.ConnectionSecureBodyImpl.execute(ConnectionSecureBodyImpl.java:110)
        at org.apache.qpid.client.state.AMQStateManager.methodReceived(AMQStateManager.java:114)
        at org.apache.qpid.client.protocol.AMQProtocolHandler.methodBodyReceived(AMQProtocolHandler.java:479)
        at org.apache.qpid.client.protocol.AMQProtocolSession.methodFrameReceived(AMQProtocolSession.java:456)
        at org.apache.qpid.framing.AMQMethodBodyImpl.handle(AMQMethodBodyImpl.java:97)
        at org.apache.qpid.client.protocol.AMQProtocolHandler.received(AMQProtocolHandler.java:436)
        at org.apache.qpid.client.protocol.AMQProtocolHandler.received(AMQProtocolHandler.java:121)
        at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152)
{code}

2) OutOfMemoryError/Unsupported frame type: 10

{code}
IoReceiver - localhost/127.0.0.1:10000 2012-05-21 16:11:59,551 DEBUG [apache.qpid.client.AMQConnection] exceptionReceived done by:IoReceiver - localhost/127.0.0.1:10000
java.lang.OutOfMemoryError: Java heap space
        at org.apache.qpid.framing.EncodingUtils.readBytes(EncodingUtils.java:941)
        at org.apache.qpid.framing.AMQMethodBodyImpl.readBytes(AMQMethodBodyImpl.java:186)
        at org.apache.qpid.framing.amqp_0_91.ConnectionStartBodyImpl.<init>(ConnectionStartBodyImpl.java:77)
        at org.apache.qpid.framing.amqp_0_91.ConnectionStartBodyImpl$1.newInstance(ConnectionStartBodyImpl.java:44)
        at org.apache.qpid.framing.amqp_0_91.MethodRegistry_0_91.convertToBody(MethodRegistry_0_91.java:214)
        at org.apache.qpid.framing.AMQMethodBodyFactory.createBody(AMQMethodBodyFactory.java:44)
        at org.apache.qpid.framing.AMQFrame.<init>(AMQFrame.java:45)
        at org.apache.qpid.framing.AMQDataBlockDecoder.createAndPopulateFrame(AMQDataBlockDecoder.java:99)
        at org.apache.qpid.codec.AMQDecoder.decodeBuffer(AMQDecoder.java:250)
        at org.apache.qpid.client.protocol.AMQProtocolHandler.received(AMQProtocolHandler.java:408)
        at org.apache.qpid.client.protocol.AMQProtocolHandler.received(AMQProtocolHandler.java:121)
        at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152)
        at java.lang.Thread.run(Thread.java:662)
IoReceiver - localhost/127.0.0.1:10000 2012-05-21 16:11:59,551 ERROR [qpid.client.protocol.AMQProtocolHandler] Exception processing frame
org.apache.qpid.framing.AMQFrameDecodingException: Unsupported frame type: 10
        at org.apache.qpid.framing.AMQDataBlockDecoder.createAndPopulateFrame(AMQDataBlockDecoder.java:86)
        at org.apache.qpid.codec.AMQDecoder.decodeBuffer(AMQDecoder.java:250)
        at org.apache.qpid.client.protocol.AMQProtocolHandler.received(AMQProtocolHandler.java:408)
        at org.apache.qpid.client.protocol.AMQProtocolHandler.received(AMQProtocolHandler.java:121)
        at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152)
        at java.lang.Thread.run(Thread.java:662)
{code}
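
For illustration only, the race described above (a late CLOSING_CONNECTION => CLOSED_CONNECTION transition from the old connection landing while the main thread is already reconnecting) can be modelled in a few lines. This is a minimal sketch and not the Qpid client code: the class StaleCloseGuardSketch, its State enum and the attempt counter are all hypothetical names. It shows one possible way to make such a late transition harmless, by tagging the shared state with the connection attempt that owns it and ignoring events from earlier attempts.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration; none of these types exist in the Qpid client. */
final class StaleCloseGuardSketch {
    public enum State { CONNECTING, OPEN, CLOSING, CLOSED }

    /** Shared state tagged with the id of the connection attempt that owns it. */
    private static final class Tagged {
        final int attempt;
        final State state;
        Tagged(int attempt, State state) { this.attempt = attempt; this.state = state; }
    }

    private final AtomicReference<Tagged> current =
            new AtomicReference<>(new Tagged(0, State.CLOSED));

    /** Main thread: start a new connection attempt and return its id. */
    public int beginAttempt() {
        while (true) {
            Tagged old = current.get();
            Tagged next = new Tagged(old.attempt + 1, State.CONNECTING);
            if (current.compareAndSet(old, next)) {
                return next.attempt;
            }
        }
    }

    /** Any thread: apply a transition only if it belongs to the current attempt. */
    public boolean transition(int attempt, State newState) {
        while (true) {
            Tagged old = current.get();
            if (old.attempt != attempt) {
                return false; // stale event from an earlier connection, ignore it
            }
            if (current.compareAndSet(old, new Tagged(attempt, newState))) {
                return true;
            }
        }
    }

    public State state() { return current.get().state; }

    public static void main(String[] args) {
        StaleCloseGuardSketch c = new StaleCloseGuardSketch();
        int first = c.beginAttempt();
        c.transition(first, State.OPEN);
        c.transition(first, State.CLOSING);        // the connection fails
        c.beginAttempt();                          // short connectdelay: reconnect starts early
        boolean applied = c.transition(first, State.CLOSED); // late event from the old connection
        System.out.println("stale transition applied: " + applied);
        System.out.println("state seen by reconnecting thread: " + c.state());
    }
}
```

Without the attempt check in transition(), the late CLOSED event would overwrite the CONNECTING state of the new attempt, which is the kind of interference the stack traces above suggest.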
                

                  
> Client failover fails to reconnect if a previous attempted reconnection has failed 'late' in the connection start process.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: QPID-3912
>                 URL: https://issues.apache.org/jira/browse/QPID-3912
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Client
>    Affects Versions: 0.17
>            Reporter: Keith Wall
>            Assignee: Keith Wall
>            Priority: Minor
>             Fix For: 0.17
>
>
> A user relies on failover to allow their client to reconnect to a second broker in the event of failure of the primary. 
> There is a defect in the Qpid Java client's failover code: if an attempted reconnection fails 'late' in the connection start process, the AMQConnection _closed flag is set permanently to true, and this prevents all future use of the AMQConnection object, even after a successful reconnection.  By 'late' I mean a failure after the TCP/IP connection has been successfully established, such as an authentication or authorisation problem that causes the Broker to decide to close the connection.
> The problem affects both 0-10 and 0-8..0-9-1 code paths.
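
As a rough model of the defect described in the issue, the following sketch shows how a closed flag set during a failed 'late' reconnection stays set after a later attempt succeeds. It is not the AMQConnection code and not necessarily the fix that was applied: the class ClosedFlagSketch, its methods and the resetOnReconnect switch are hypothetical and exist only to contrast the two behaviours.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration; this is not the Qpid AMQConnection class. */
final class ClosedFlagSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final boolean resetOnReconnect;

    public ClosedFlagSketch(boolean resetOnReconnect) {
        this.resetOnReconnect = resetOnReconnect;
    }

    /** The Broker closes the connection after TCP connect, e.g. on an authentication failure. */
    public void lateConnectFailure() {
        closed.set(true);
    }

    /** A later reconnection attempt completes successfully. */
    public void reconnectSucceeded() {
        if (resetOnReconnect) {
            closed.set(false); // assumed remedy for this sketch only
        }
    }

    public boolean isUsable() {
        return !closed.get();
    }

    public static void main(String[] args) {
        ClosedFlagSketch defective = new ClosedFlagSketch(false);
        defective.lateConnectFailure();
        defective.reconnectSucceeded();
        System.out.println("flag never reset, usable: " + defective.isUsable());

        ClosedFlagSketch remedied = new ClosedFlagSketch(true);
        remedied.lateConnectFailure();
        remedied.reconnectSucceeded();
        System.out.println("flag reset on reconnect, usable: " + remedied.isUsable());
    }
}
```

In the first case the object remains unusable even though the reconnection succeeded, which matches the symptom reported in the description.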
