Posted to dev@qpid.apache.org by "Steven Shaw (JIRA)" <qp...@incubator.apache.org> on 2006/11/30 20:24:20 UTC

[jira] Created: (QPID-144) Potential deadlocks during failover

Potential deadlocks during failover
-----------------------------------

                 Key: QPID-144
                 URL: http://issues.apache.org/jira/browse/QPID-144
             Project: Qpid
          Issue Type: Bug
          Components: Dot Net Client, Java Client
            Reporter: Steven Shaw


The public client API methods need to be "failover safe". Any method that blocks for a response frame should be wrapped in a FailoverSupport. FailoverSupport automates the retry after catching a FailoverException (a RuntimeException).
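
To make the retry behaviour concrete, here is a minimal Java sketch of the pattern. It is not the client's actual FailoverSupport: the names SketchFailoverException and RetryOnFailover, the Supplier-based signature and the maxAttempts bound are all assumptions made for illustration.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the client's FailoverException (unchecked, as described above).
class SketchFailoverException extends RuntimeException {
    SketchFailoverException(String message) {
        super(message);
    }
}

// Hypothetical retry wrapper; the real FailoverSupport may be structured differently.
final class RetryOnFailover {
    private RetryOnFailover() {
    }

    // Runs the operation, re-running it each time it is interrupted by a failover.
    // Gives up and rethrows the last failover exception after maxAttempts tries.
    static <T> T execute(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SketchFailoverException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (SketchFailoverException e) {
                last = e; // the connection failed over mid-call, so try again
            }
        }
        throw last;
    }
}

class RetryOnFailoverDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        String reply = RetryOnFailover.execute(() -> {
            // Simulate a blocking call that is interrupted by failover twice.
            if (++calls[0] < 3) {
                throw new SketchFailoverException("failover during blocking call");
            }
            return "reply-frame";
        }, 5);
        System.out.println(reply + " after " + calls[0] + " attempts");
    }
}
```

The caller only sees the final result; the two simulated failovers are absorbed by the wrapper, which is the property a method blocking in syncWrite would rely on.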

Methods that block waiting for a response frame are now easier to identify because they all call AMQProtocolHandler.syncWrite() (SyncWrite in the .NET client).

Currently the only methods employing FailoverSupport are AMQConnection.createSession, AMQSession.createConsumerImpl and createProducerImpl.

AMQConnection.createSession has 3 calls to syncWrite so certainly needs to be wrapped in FailoverSupport. No problem there.

Neither AMQSession.createConsumerImpl nor createProducerImpl calls syncWrite. Unless they block in some other important way, they don't really need to be wrapped in a FailoverSupport. It does no harm, however.

The following methods use syncWrite() but are not wrapped in a FailoverSupport:
  AMQSession's commit(), rollback(), close()
  AMQConnection.close() via AMQProtocolHandler.closeConnection()
  BasicMessageConsumer.close()
These need to be protected/wrapped in a FailoverSupport. Note that commit() and rollback() are not currently protected by a lock on failoverMutex either.
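
As a rough illustration of what holding failoverMutex buys, the following Java sketch has a stand-in commit() and a stand-in failover routine synchronize on the same object, so neither can run in the middle of the other. Only the name failoverMutex comes from this report; the class MutexGuardedSession, its methods and the event log are hypothetical, and the sketch shows only the mutual exclusion, not the real session code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical session; only the name failoverMutex comes from the report.
final class MutexGuardedSession {
    private final Object failoverMutex = new Object();
    private final List<String> events = new ArrayList<>();

    // Stand-in for commit(): holds failoverMutex for the whole operation.
    void commit() {
        synchronized (failoverMutex) {
            events.add("commit-start");
            Thread.yield(); // invite a context switch in the middle of the operation
            events.add("commit-end");
        }
    }

    // Stand-in for the failover thread re-establishing the session.
    void failover() {
        synchronized (failoverMutex) {
            events.add("failover-start");
            Thread.yield();
            events.add("failover-end");
        }
    }

    // Returns a copy of the recorded events.
    List<String> events() {
        synchronized (failoverMutex) {
            return new ArrayList<>(events);
        }
    }

    // Runs commits and failovers on two threads and reports whether every
    // start event was immediately followed by its matching end event.
    static boolean runsWithoutInterleaving(int rounds) {
        MutexGuardedSession session = new MutexGuardedSession();
        Thread committer = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                session.commit();
            }
        });
        Thread failoverThread = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                session.failover();
            }
        });
        committer.start();
        failoverThread.start();
        try {
            committer.join();
            failoverThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        List<String> log = session.events();
        if (log.size() != 4 * rounds) {
            return false;
        }
        for (int i = 0; i < log.size(); i += 2) {
            String start = log.get(i);
            String expectedEnd = start.replace("-start", "-end");
            if (!start.endsWith("-start") || !log.get(i + 1).equals(expectedEnd)) {
                return false;
            }
        }
        return true;
    }
}

class MutexGuardedSessionDemo {
    public static void main(String[] args) {
        System.out.println("no interleaving: " + MutexGuardedSession.runsWithoutInterleaving(1000));
    }
}
```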

Perhaps StateManager.attainState is the only other method that blocks for "a response frame"; in this case it blocks for a series of response frames that result in the state changing. The only use of attainState is in AMQConnection.makeBrokerConnection, which would appear to need to be wrapped in a FailoverSupport, as otherwise the FailoverException will escape. Since this is failing over during connection, some care may be required. Note that makeBrokerConnection is used at 3 different sites.
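
The way the exception escapes from a blocked state wait can be sketched in Java as follows. This is an assumption-laden illustration, not the client's StateManager: SketchStateManager, StateWaitFailoverException, the string state names and the timeout parameter are all hypothetical.

```java
// Hypothetical stand-in for the client's FailoverException.
class StateWaitFailoverException extends RuntimeException {
    StateWaitFailoverException(String message) {
        super(message);
    }
}

// Hypothetical state manager; the real StateManager may differ.
final class SketchStateManager {
    private final Object lock = new Object();
    private String currentState = "CONNECTION_NOT_STARTED";
    private boolean failoverInProgress;

    // Called as response frames arrive and move the connection along.
    void changeState(String newState) {
        synchronized (lock) {
            currentState = newState;
            lock.notifyAll();
        }
    }

    // Called when failover starts; wakes any waiter so it can bail out.
    void failoverStarted() {
        synchronized (lock) {
            failoverInProgress = true;
            lock.notifyAll();
        }
    }

    // Blocks until the target state is reached. Throws the failover exception
    // if failover starts first, or IllegalStateException on timeout or interrupt.
    void attainState(String target, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            while (!target.equals(currentState)) {
                if (failoverInProgress) {
                    throw new StateWaitFailoverException("failover while waiting for " + target);
                }
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new IllegalStateException("timed out waiting for " + target);
                }
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted waiting for " + target, e);
                }
            }
        }
    }
}

class AttainStateDemo {
    public static void main(String[] args) {
        SketchStateManager states = new SketchStateManager();
        // Simulate the failover thread starting while the caller is still waiting.
        Thread failover = new Thread(states::failoverStarted);
        failover.start();
        try {
            states.attainState("CONNECTION_OPEN", 1000);
            System.out.println("state attained");
        } catch (StateWaitFailoverException e) {
            // Without a retry wrapper, this is what the caller would see.
            System.out.println("escaped to caller: " + e.getMessage());
        }
    }
}
```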

In addition, sendAcknowledgement appears to need to lock the failoverMutex.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (QPID-144) Potential deadlocks during failover

Posted by "Martin Ritchie (JIRA)" <qp...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463500 ] 

Martin Ritchie commented on QPID-144:
-------------------------------------

The initial makeBrokerConnection's lack of FailoverSupport poses a problem during initial connection.

If the initial connection fails, failover will kick in. However, depending on thread ordering, the initial connection request may already have reached the attainState wait method, which will then receive a FailoverException and propagate it to the client as failover occurs.

see QPID-272
https://issues.apache.org/jira/browse/QPID-272


[jira] Commented: (QPID-144) Potential deadlocks during failover

Posted by "Martin Ritchie (JIRA)" <qp...@incubator.apache.org>.
    [ http://issues.apache.org/jira/browse/QPID-144?page=comments#action_12454877 ] 
            
Martin Ritchie commented on QPID-144:
-------------------------------------

WRT: makeBrokerConnection 

makeBrokerConnection does indeed pose a problem with failover, but not on its own. Calling makeBrokerConnection starts the connection process which, if it fails, causes the failover thread to handle the connection. As can be seen in the AMQConnection constructor, the majority of the method needs to be moved into a connect() method, as it lets the failover mechanism handle the failover issue. (It does need to be improved: as noted in a //todo, there is a Thread.sleep loop that could be replaced with a wait()/notify() mechanism.)

The other two cases are related to attempting reconnection due to a redirection. They are currently only called as part of the failover mechanism and so should already be wrapped in FailoverSupport.
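
For the //todo mentioned above, a wait()/notify() replacement for a Thread.sleep polling loop might look roughly like this Java sketch. The class ConnectionGate and its method names are hypothetical and not taken from AMQConnection; the timeout is an added assumption so that a waiter cannot block forever.

```java
// Hypothetical helper: lets one thread block until another thread
// reports that the connection attempt has completed.
final class ConnectionGate {
    private final Object monitor = new Object();
    private boolean connected;

    // Called by the thread that finishes establishing the connection.
    void markConnected() {
        synchronized (monitor) {
            connected = true;
            monitor.notifyAll();
        }
    }

    // Returns true if the connection was reported within the timeout,
    // false on timeout or if the waiting thread is interrupted.
    boolean awaitConnected(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (monitor) {
            // Loop to guard against spurious wakeups.
            while (!connected) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false;
                }
                try {
                    monitor.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}

class ConnectionGateDemo {
    public static void main(String[] args) {
        ConnectionGate gate = new ConnectionGate();
        // Simulate the connecting thread finishing its work.
        new Thread(gate::markConnected).start();
        System.out.println("connected: " + gate.awaitConnected(5000));
    }
}
```

Compared with a sleep loop, the waiter wakes as soon as markConnected() is called rather than at the next polling interval.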


[jira] Commented: (QPID-144) Potential deadlocks during failover

Posted by "John O'Hara (JIRA)" <qp...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468845 ] 

John O'Hara commented on QPID-144:
----------------------------------

Is the server telling the client that it is being disconnected because of Failover?

That doesn't seem logical.
In production a client will need to failover when a Bad Thing happens on the server; the server most likely won't get a chance to send a Failover message to all its clients.
Failover is something that the client can try if it knows how to and is configured to, but otherwise it may not.

Also, the Qpid client could be run against say the RabbitMQ server; what happens then?

John




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (QPID-144) Potential deadlocks during failover

Posted by "Marnie McCormack (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marnie McCormack reassigned QPID-144:
-------------------------------------

    Assignee: Robert Greig



[jira] Assigned: (QPID-144) Potential deadlocks during failover

Posted by "Robert Greig (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Greig reassigned QPID-144:
---------------------------------

    Assignee: Rob Godfrey  (was: Robert Greig)

Is this still relevant?



[jira] Updated: (QPID-144) Potential deadlocks during failover

Posted by "Marnie McCormack (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marnie McCormack updated QPID-144:
----------------------------------

    Fix Version/s:     (was: M4)

Descoping items not being worked on for M4 into Unknown Fix Version for now.
