You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by rajith attapattu <ra...@gmail.com> on 2013/05/08 16:02:23 UTC

Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/
-----------------------------------------------------------

(Updated May 8, 2013, 2:02 p.m.)


Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.


Changes
-------

The updated patch address the following in addition to the previous patch,

1. Prevents message delivery if the session is marked for close.
2. Prevents message delivery if the Consumer is marked for close.
3. In both session and consumer, now there is no chance a session/consumer would be closed while a delivery is in progress.
4. Removed the timeout in waitForMsgDeliveryToFinish. Now the method will block until a delivery completes (as per the JMS spec).


Description
-------

There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 

Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.

There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
I will commit them both together.


This addresses bug QPID-4574.
    https://issues.apache.org/jira/browse/QPID-4574


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1480271 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1480271 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1480271 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/common/src/main/java/org/apache/qpid/configuration/ClientProperties.java 1480271 

Diff: https://reviews.apache.org/r/10738/diff/


Testing
-------

Java test suite, tests from customers and QE around the deadlock situation.


Thanks,

rajith attapattu


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by Robbie Gemmell <ro...@gmail.com>.

> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 744-745
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line744>
> >
> >     What impact does dropping the message on the floor here have on the client?
> >     
> >     Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close, so is there any impact from not doing so? E.g a message getting stuck acquired for a now-closed consumer?
> >     
> >     Does any particular attention need paid to the overridden 0-10 specific version of this method?
> 
> rajith attapattu wrote:
>     The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?
>     
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     Alternatively we could reject the message (release in 0-10 terms). But I don't think this is required, given that we will be closing the consumer anyways.
>     
>     
>     >Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close
>     By "other things" are you referring to a reject? In 0-10 AFIK you don't need to do it. 
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     
>     >Does any particular attention need paid to the overridden 0-10 specific version of this method?
>     IMO adding "if (!(isClosed() || isClosing()))" is required to prevent the 0-10 specific method from issuing credit (and receiving more messages, that will eventually get released).
>     There is a chance where the consumer could be marked closed, but not yet reached the point of sending a cancel, by the time messageFlow() is called.
>     The above check should prevent it.
> 
> Robbie Gemmell wrote:
>     "Did you mean race condition (instead of deadlock)?"
>     
>     I really meant deadlock. One of the uses of _messageDeliveryLock being removed was added as part of the fix for a deadlock (https://issues.apache.org/jira/browse/QPID-3911) and so I wonder what effect removing it again will have. It may be the other changes mean it isn't a problem, but I think it needs to be closely considered. 
>     
>     "The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?"
>     
>     That isn't actually the case, transferred messages are the responsibility of the session and remain acquired after consumer close until such point as the session explicitly does something with them. I believe this is true of 0-8/9/91 as well, and probably explains the origin of the reject/release code I referred to earlier in the call tree than the changes would now allow the message to be silently dropped.
>     
>     From the 0-10 spec: "Canceling a subscription MUST NOT affect pending transfers. A transfer made prior to canceling transfers to the destination MUST be able to be accepted, released, acquired, or rejected after the subscription is canceled."
> 
> rajith attapattu wrote:
>     Indeed, Gordon mentioned that to me and the 2nd patch (the one before I did today) takes care of rejecting messages from the consumer. We don't need to do the same when the session is closing, as when the session ends, the any unacked messages are put back on the queue.
> 
> rajith attapattu wrote:
>     As for QPID-3911,
>     There is a deadlock, albeit a bit different (involving the same locks) from QPID-3911, that do happen in similar circumstances.
>     However this deadlock appears to manifest with or without this patch, which leads me to believe that _messageDeliveryLock is not the right solution for QPID-3911.
>     Sadly the solution for QPID-3911 made it worse as there are at least two distinct cases of deadlocks involving _messageDeliveryLock.
>     1. Btw _lock and _messageDeliveryLock
>     2. Btw _messageDeliveryLock and _failoverMutex.
>     
>     We definitely need to find a solution for the deadlocks (at least 3 cases) btw failoverMutex and _lock (in AMQSession), which seems to be the root of all evil :)
>     We might have to drop one lock (most likely _lock) and see if we can provide an alternative strategy to guarantee correct behaviour.
>
> 
> rajith attapattu wrote:
>     Sorry I meant dopping _failoverMutex (not _lock in AMQSession).
>     It might also be an opportunity to fix our less than stellar failover.

The deadlock itself wasn't really the main issue in QPID-3911, it was just a very obvious symptom that presented after the underlying problem had occurred, which was that the client sent illegal commands to the broker due to allowing things to occur at the same time (/in the wrong order) that it shouldn't had, namely suspending the session(/consumers) during session rollback inside onMessage() at the same time as a consumer close occurring on the session in another thread. The deliveryLock presented a route to stop that by preventing the possibility that the dispatcher would be inside user code at the time and thus make it unable to cause the problem.

As you say, while the changes fixed that problem it seems they introduced another. It so far appears to me like that is what using the current changes here would be doing, by returning the client to the state where it could be delivering messages on the session [for another consumer] that could allow rollback while a consumer close is in progress, but with the distinction that we would know in advance this time that there could be a problem with the change.

Having a quick look through the client code, it seems like an avenue you could pursue to fix the issue (for 0-10 at least) might be the AMQSession suspensionLock. It is only used inside AMQSession currently, in a way that looks like its acquisition wont be followed by executing user code or acquiring any more locks, and so at first glance looks like it could be safe to acquire during the consumer close as well in order to prevent the concurrent suspend+close that actually caused the problem for 0-10 in QPID-3911. That would only be part of the fix though as it would probably allow a consumer close to block the session rollback only for that to then proceed afterwards and cause the same illegal commands to be sent. To prevent this it seems like ensuring we don't operate on closing/closed consumers in AMQSession_0_10#sendSuspendChannel(..) would stop the client issuing the stop/flow commands for the now-closing/closed consumers. I haven't spent enough time looking at this to know if there are any side effects from doing that though, or if that would fix the related issues seen with 0-8/9/91 which were slightly different and didn't actually involve a deadlock but rather a command timeout, so such things would need investigated if you pursued the approach.


- Robbie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------


On May 21, 2013, 2:48 p.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 21, 2013, 2:48 p.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/util/ConditionManager.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by rajith attapattu <ra...@gmail.com>.

> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 744-745
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line744>
> >
> >     What impact does dropping the message on the floor here have on the client?
> >     
> >     Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close, so is there any impact from not doing so? E.g a message getting stuck acquired for a now-closed consumer?
> >     
> >     Does any particular attention need paid to the overridden 0-10 specific version of this method?
> 
> rajith attapattu wrote:
>     The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?
>     
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     Alternatively we could reject the message (release in 0-10 terms). But I don't think this is required, given that we will be closing the consumer anyways.
>     
>     
>     >Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close
>     By "other things" are you referring to a reject? In 0-10 AFIK you don't need to do it. 
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     
>     >Does any particular attention need paid to the overridden 0-10 specific version of this method?
>     IMO adding "if (!(isClosed() || isClosing()))" is required to prevent the 0-10 specific method from issuing credit (and receiving more messages, that will eventually get released).
>     There is a chance where the consumer could be marked closed, but not yet reached the point of sending a cancel, by the time messageFlow() is called.
>     The above check should prevent it.
> 
> Robbie Gemmell wrote:
>     "Did you mean race condition (instead of deadlock)?"
>     
>     I really meant deadlock. One of the uses of _messageDeliveryLock being removed was added as part of the fix for a deadlock (https://issues.apache.org/jira/browse/QPID-3911) and so I wonder what effect removing it again will have. It may be the other changes mean it isn't a problem, but I think it needs to be closely considered. 
>     
>     "The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?"
>     
>     That isn't actually the case, transferred messages are the responsibility of the session and remain acquired after consumer close until such point as the session explicitly does something with them. I believe this is true of 0-8/9/91 as well, and probably explains the origin of the reject/release code I referred to earlier in the call tree than the changes would now allow the message to be silently dropped.
>     
>     From the 0-10 spec: "Canceling a subscription MUST NOT affect pending transfers. A transfer made prior to canceling transfers to the destination MUST be able to be accepted, released, acquired, or rejected after the subscription is canceled."
> 
> rajith attapattu wrote:
>     Indeed, Gordon mentioned that to me and the 2nd patch (the one before I did today) takes care of rejecting messages from the consumer. We don't need to do the same when the session is closing, as when the session ends, the any unacked messages are put back on the queue.

As for QPID-3911,
There is a deadlock, albeit a bit different (involving the same locks) from QPID-3911, that do happen in similar circumstances.
However this deadlock appears to manifest with or without this patch, which leads me to believe that _messageDeliveryLock is not the right solution for QPID-3911.
Sadly the solution for QPID-3911 made it worse as there are at least two distinct cases of deadlocks involving _messageDeliveryLock.
1. Btw _lock and _messageDeliveryLock
2. Btw _messageDeliveryLock and _failoverMutex.

We definitely need to find a solution for the deadlocks (at least 3 cases) btw failoverMutex and _lock (in AMQSession), which seems to be the root of all evil :)
We might have to drop one lock (most likely _lock) and see if we can provide an alternative strategy to guarantee correct behaviour.


- rajith


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------


On May 21, 2013, 2:48 p.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 21, 2013, 2:48 p.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/util/ConditionManager.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by rajith attapattu <ra...@gmail.com>.

> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java, line 767
> > <https://reviews.apache.org/r/10738/diff/2/?file=288992#file288992line767>
> >
> >     We probably shouldn't catch all exceptions, or eat the interrupted status.
> 
> rajith attapattu wrote:
>     For starters I will narrow it down to interrupted exception and then log it.

I meant catch the interrupted exception only and ignore.
Even if the thread was woken up prematurely, it will go back to waiting if the variable isn't set. 


- rajith


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------


On May 8, 2013, 2:02 p.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 8, 2013, 2:02 p.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/common/src/main/java/org/apache/qpid/configuration/ClientProperties.java 1480271 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by rajith attapattu <ra...@gmail.com>.

> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java, lines 255-256
> > <https://reviews.apache.org/r/10738/diff/2/?file=288992#file288992line255>
> >
> >     Unused

Will mark for removal.


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java, line 750
> > <https://reviews.apache.org/r/10738/diff/2/?file=288992#file288992line750>
> >
> >     Might be better called 'waitForInProgressDeliveryToComplete' as it doesn't actually wait for all message delivery to finish (although that might be the case, depending on other factors outside its control) but rather any that is already in progress to complete.

Make sense, I will make that change.


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java, lines 758-766
> > <https://reviews.apache.org/r/10738/diff/2/?file=288992#file288992line758>
> >
> >     We should hold the object lock while performing the check, otherwise we could go into the wait() after the only notify() that could wake us has already been performed.

Good catch!


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java, line 767
> > <https://reviews.apache.org/r/10738/diff/2/?file=288992#file288992line767>
> >
> >     We probably shouldn't catch all exceptions, or eat the interrupted status.

For starters I will narrow it down to interrupted exception and then log it.


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java, lines 3339-3347
> > <https://reviews.apache.org/r/10738/diff/2/?file=288992#file288992line3339>
> >
> >     Linked to earlier comment, we need to hold the object lock while performing the update or we could potentially notify() before they call wait(), which might mean they never wake up.

Will fix this


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, line 137
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line137>
> >
> >     This and all the code that uses it seems to duplicate the code added in AMQSession. It seems like this stuff could go in a utility class to avoid the duplication, which would also permit easy addition of some unit tests.

Sure, will do that.


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 771-785
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line771>
> >
> >     Same comments as earlier about thread safety, exception handling, and reuse.

Will be addressed in next rev.


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 1059-1082
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line1059>
> >
> >     Same comments as earlier about method name, thread safety, exception handling, and reuse.

Will be addresed in next rev.


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java, line 873
> > <https://reviews.apache.org/r/10738/diff/2/?file=288991#file288991line873>
> >
> >     Whitespace

Will fix this. Can't see this in my git patch.
(The diff for RB is created from my svn checkout, as RB doesn't like the git formatted patch).
Will double check all whitespace issues before I commit.


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 744-745
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line744>
> >
> >     What impact does dropping the message on the floor here have on the client?
> >     
> >     Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close, so is there any impact from not doing so? E.g a message getting stuck acquired for a now-closed consumer?
> >     
> >     Does any particular attention need paid to the overridden 0-10 specific version of this method?

The message will be dropped if a consumer (or session is closed or closing).
When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
These messages will be marked redelivered.
Is this the same situation for 0-8/0-9 ?

The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
Alternatively we could reject the message (release in 0-10 terms). But I don't think this is required, given that we will be closing the consumer anyways.


>Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close
By "other things" are you referring to a reject? In 0-10 AFIK you don't need to do it. 
The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.

>Does any particular attention need paid to the overridden 0-10 specific version of this method?
IMO adding "if (!(isClosed() || isClosing()))" is required to prevent the 0-10 specific method from issuing credit (and receiving more messages, that will eventually get released).
There is a chance where the consumer could be marked closed, but not yet reached the point of sending a cancel, by the time messageFlow() is called.
The above check should prevent it.


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 594-595
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line594>
> >
> >     Removal of this usage of the old lock will cause a fair shift in client behaviour, allowing the consumer close to proceed at the same time as message delivery is ongoing on the session, possibly entailing things such as the Dispatcher performing a session rollback in a message listener while this close is in progress.
> >     
> >     How clear are we on what impact this change has on the client? For example, this lock usage was apparently added specifically to prevent a deadlock in that sort of situation. Has your investigation of this change in behaviour determined whether that would become a problem again?

Did you mean race condition (instead of deadlock)?
The Consumer will check if it's closed (or closing) before it tries to deliver the message to the application. Therefore even if the session initiates a delivery before the consumer is marked closed it will not proceed beyond the notifyConsumer method in BasicMessageConsumer. The dropped message will be released when the consumer is closed.

>Dispatcher performing a session rollback in a message listener while this close is in progress.
AFAIK the rollback method did not grab the _messageDeliveryLock, instead it was relying on "_lock".
Both close() and notifyMessage() hold this lock for the entire duration of the call, thereby keeping these operations mutually exclusive.

Perhaps I misunderstood your concern here? If so please explain again.


- rajith


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------


On May 8, 2013, 2:02 p.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 8, 2013, 2:02 p.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/common/src/main/java/org/apache/qpid/configuration/ClientProperties.java 1480271 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by rajith attapattu <ra...@gmail.com>.

> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 744-745
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line744>
> >
> >     What impact does dropping the message on the floor here have on the client?
> >     
> >     Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close, so is there any impact from not doing so? E.g a message getting stuck acquired for a now-closed consumer?
> >     
> >     Does any particular attention need paid to the overridden 0-10 specific version of this method?
> 
> rajith attapattu wrote:
>     The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?
>     
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     Alternatively we could reject the message (release in 0-10 terms). But I don't think this is required, given that we will be closing the consumer anyways.
>     
>     
>     >Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close
>     By "other things" are you referring to a reject? In 0-10 AFIK you don't need to do it. 
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     
>     >Does any particular attention need paid to the overridden 0-10 specific version of this method?
>     IMO adding "if (!(isClosed() || isClosing()))" is required to prevent the 0-10 specific method from issuing credit (and receiving more messages, that will eventually get released).
>     There is a chance where the consumer could be marked closed, but not yet reached the point of sending a cancel, by the time messageFlow() is called.
>     The above check should prevent it.
> 
> Robbie Gemmell wrote:
>     "Did you mean race condition (instead of deadlock)?"
>     
>     I really meant deadlock. One of the uses of _messageDeliveryLock being removed was added as part of the fix for a deadlock (https://issues.apache.org/jira/browse/QPID-3911) and so I wonder what effect removing it again will have. It may be the other changes mean it isn't a problem, but I think it needs to be closely considered. 
>     
>     "The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?"
>     
>     That isn't actually the case, transferred messages are the responsibility of the session and remain acquired after consumer close until such point as the session explicitly does something with them. I believe this is true of 0-8/9/91 as well, and probably explains the origin of the reject/release code I referred to earlier in the call tree than the changes would now allow the message to be silently dropped.
>     
>     From the 0-10 spec: "Canceling a subscription MUST NOT affect pending transfers. A transfer made prior to canceling transfers to the destination MUST be able to be accepted, released, acquired, or rejected after the subscription is canceled."
> 
> rajith attapattu wrote:
>     Indeed, Gordon mentioned that to me and the 2nd patch (the one before I did today) takes care of rejecting messages from the consumer. We don't need to do the same when the session is closing, as when the session ends, the any unacked messages are put back on the queue.
> 
> rajith attapattu wrote:
>     As for QPID-3911,
>     There is a deadlock, albeit a bit different (involving the same locks) from QPID-3911, that do happen in similar circumstances.
>     However this deadlock appears to manifest with or without this patch, which leads me to believe that _messageDeliveryLock is not the right solution for QPID-3911.
>     Sadly the solution for QPID-3911 made it worse as there are at least two distinct cases of deadlocks involving _messageDeliveryLock.
>     1. Btw _lock and _messageDeliveryLock
>     2. Btw _messageDeliveryLock and _failoverMutex.
>     
>     We definitely need to find a solution for the deadlocks (at least 3 cases) btw failoverMutex and _lock (in AMQSession), which seems to be the root of all evil :)
>     We might have to drop one lock (most likely _lock) and see if we can provide an alternative strategy to guarantee correct behaviour.
>

Sorry I meant dopping _failoverMutex (not _lock in AMQSession).
It might also be an opportunity to fix our less than stellar failover.


- rajith


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------


On May 21, 2013, 2:48 p.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 21, 2013, 2:48 p.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/util/ConditionManager.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by rajith attapattu <ra...@gmail.com>.

> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 744-745
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line744>
> >
> >     What impact does dropping the message on the floor here have on the client?
> >     
> >     Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close, so is there any impact from not doing so? E.g a message getting stuck acquired for a now-closed consumer?
> >     
> >     Does any particular attention need paid to the overridden 0-10 specific version of this method?
> 
> rajith attapattu wrote:
>     The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?
>     
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     Alternatively we could reject the message (release in 0-10 terms). But I don't think this is required, given that we will be closing the consumer anyways.
>     
>     
>     >Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close
>     By "other things" are you referring to a reject? In 0-10 AFIK you don't need to do it. 
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     
>     >Does any particular attention need paid to the overridden 0-10 specific version of this method?
>     IMO adding "if (!(isClosed() || isClosing()))" is required to prevent the 0-10 specific method from issuing credit (and receiving more messages, that will eventually get released).
>     There is a chance where the consumer could be marked closed, but not yet reached the point of sending a cancel, by the time messageFlow() is called.
>     The above check should prevent it.
> 
> Robbie Gemmell wrote:
>     "Did you mean race condition (instead of deadlock)?"
>     
>     I really meant deadlock. One of the uses of _messageDeliveryLock being removed was added as part of the fix for a deadlock (https://issues.apache.org/jira/browse/QPID-3911) and so I wonder what effect removing it again will have. It may be the other changes mean it isn't a problem, but I think it needs to be closely considered. 
>     
>     "The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?"
>     
>     That isn't actually the case, transferred messages are the responsibility of the session and remain acquired after consumer close until such point as the session explicitly does something with them. I believe this is true of 0-8/9/91 as well, and probably explains the origin of the reject/release code I referred to earlier in the call tree than the changes would now allow the message to be silently dropped.
>     
>     From the 0-10 spec: "Canceling a subscription MUST NOT affect pending transfers. A transfer made prior to canceling transfers to the destination MUST be able to be accepted, released, acquired, or rejected after the subscription is canceled."
> 
> rajith attapattu wrote:
>     Indeed, Gordon mentioned that to me and the 2nd patch (the one before I did today) takes care of rejecting messages from the consumer. We don't need to do the same when the session is closing, as when the session ends, the any unacked messages are put back on the queue.
> 
> rajith attapattu wrote:
>     As for QPID-3911,
>     There is a deadlock, albeit a bit different (involving the same locks) from QPID-3911, that do happen in similar circumstances.
>     However this deadlock appears to manifest with or without this patch, which leads me to believe that _messageDeliveryLock is not the right solution for QPID-3911.
>     Sadly the solution for QPID-3911 made it worse as there are at least two distinct cases of deadlocks involving _messageDeliveryLock.
>     1. Btw _lock and _messageDeliveryLock
>     2. Btw _messageDeliveryLock and _failoverMutex.
>     
>     We definitely need to find a solution for the deadlocks (at least 3 cases) btw failoverMutex and _lock (in AMQSession), which seems to be the root of all evil :)
>     We might have to drop one lock (most likely _lock) and see if we can provide an alternative strategy to guarantee correct behaviour.
>
> 
> rajith attapattu wrote:
>     Sorry I meant dopping _failoverMutex (not _lock in AMQSession).
>     It might also be an opportunity to fix our less than stellar failover.
> 
> Robbie Gemmell wrote:
>     The deadlock itself wasn't really the main issue in QPID-3911, it was just a very obvious symptom that presented after the underlying problem had occurred, which was that the client sent illegal commands to the broker due to allowing things to occur at the same time (/in the wrong order) that it shouldn't had, namely suspending the session(/consumers) during session rollback inside onMessage() at the same time as a consumer close occurring on the session in another thread. The deliveryLock presented a route to stop that by preventing the possibility that the dispatcher would be inside user code at the time and thus make it unable to cause the problem.
>     
>     As you say, while the changes fixed that problem it seems they introduced another. It so far appears to me like that is what using the current changes here would be doing, by returning the client to the state where it could be delivering messages on the session [for another consumer] that could allow rollback while a consumer close is in progress, but with the distinction that we would know in advance this time that there could be a problem with the change.
>     
>     Having a quick look through the client code, it seems like an avenue you could pursue to fix the issue (for 0-10 at least) might be the AMQSession suspensionLock. It is only used inside AMQSession currently, in a way that looks like its acquisition wont be followed by executing user code or acquiring any more locks, and so at first glance looks like it could be safe to acquire during the consumer close as well in order to prevent the concurrent suspend+close that actually caused the problem for 0-10 in QPID-3911. That would only be part of the fix though as it would probably allow a consumer close to block the session rollback only for that to then proceed afterwards and cause the same illegal commands to be sent. To prevent this it seems like ensuring we don't operate on closing/closed consumers in AMQSession_0_10#sendSuspendChannel(..) would stop the client issuing the stop/flow commands for the now-closing/closed consumers. I haven't spent enough time looking at this to know if there are any side effects from doing that though, or if that would fix the related issues seen with 0-8/9/91 which were slightly different and didn't actually involve a deadlock but rather a command timeout, so such things would need investigated if you pursued the approach.

What you describe could happen and obtaining the suspension lock could be a potential solution.
The same happens when the broker closes the session while rollback is in progress.
In both situations a deadlock happens.

Another potential solution is (which I think we should do anyways) is to check if the session is marked closed before we send anything to the broker.
As soon as we enter session close() we should mark the session as being closed.
There are a lot of methods where we just send commands on the wire without checking this and it is causing issues.

I will investigate both.


- rajith


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------


On May 21, 2013, 2:48 p.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 21, 2013, 2:48 p.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/util/ConditionManager.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by rajith attapattu <ra...@gmail.com>.

> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 744-745
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line744>
> >
> >     What impact does dropping the message on the floor here have on the client?
> >     
> >     Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close, so is there any impact from not doing so? E.g a message getting stuck acquired for a now-closed consumer?
> >     
> >     Does any particular attention need paid to the overridden 0-10 specific version of this method?
> 
> rajith attapattu wrote:
>     The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?
>     
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     Alternatively we could reject the message (release in 0-10 terms). But I don't think this is required, given that we will be closing the consumer anyways.
>     
>     
>     >Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close
>     By "other things" are you referring to a reject? In 0-10 AFIK you don't need to do it. 
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     
>     >Does any particular attention need paid to the overridden 0-10 specific version of this method?
>     IMO adding "if (!(isClosed() || isClosing()))" is required to prevent the 0-10 specific method from issuing credit (and receiving more messages, that will eventually get released).
>     There is a chance where the consumer could be marked closed, but not yet reached the point of sending a cancel, by the time messageFlow() is called.
>     The above check should prevent it.
> 
> Robbie Gemmell wrote:
>     "Did you mean race condition (instead of deadlock)?"
>     
>     I really meant deadlock. One of the uses of _messageDeliveryLock being removed was added as part of the fix for a deadlock (https://issues.apache.org/jira/browse/QPID-3911) and so I wonder what effect removing it again will have. It may be the other changes mean it isn't a problem, but I think it needs to be closely considered. 
>     
>     "The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?"
>     
>     That isn't actually the case, transferred messages are the responsibility of the session and remain acquired after consumer close until such point as the session explicitly does something with them. I believe this is true of 0-8/9/91 as well, and probably explains the origin of the reject/release code I referred to earlier in the call tree than the changes would now allow the message to be silently dropped.
>     
>     From the 0-10 spec: "Canceling a subscription MUST NOT affect pending transfers. A transfer made prior to canceling transfers to the destination MUST be able to be accepted, released, acquired, or rejected after the subscription is canceled."

Indeed, Gordon mentioned that to me and the 2nd patch (the one before I did today) takes care of rejecting messages from the consumer. We don't need to do the same when the session is closing, as when the session ends, the any unacked messages are put back on the queue.


- rajith


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------


On May 21, 2013, 2:48 p.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 21, 2013, 2:48 p.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java 1484812 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/util/ConditionManager.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by Robbie Gemmell <ro...@gmail.com>.

> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java, line 873
> > <https://reviews.apache.org/r/10738/diff/2/?file=288991#file288991line873>
> >
> >     Whitespace
> 
> rajith attapattu wrote:
>     Will fix this. Can't see this in my git patch.
>     (The diff for RB is created from my svn checkout, as RB doesn't like the git formatted patch).
>     Will double check all whitespace issues before I commit.

I'd recommend turning on the git diff/log/etc ui colouring if you haven't already, it shows whitespace and tabs in nice big read chunks before you ever get near committing your changes. Then, if your original patch doesn't have them, the SVN diff produced from them shouldn't either.


> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, lines 744-745
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line744>
> >
> >     What impact does dropping the message on the floor here have on the client?
> >     
> >     Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close, so is there any impact from not doing so? E.g a message getting stuck acquired for a now-closed consumer?
> >     
> >     Does any particular attention need paid to the overridden 0-10 specific version of this method?
> 
> rajith attapattu wrote:
>     The message will be dropped if a consumer (or session is closed or closing).
>     When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?
>     
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     Alternatively we could reject the message (release in 0-10 terms). But I don't think this is required, given that we will be closing the consumer anyways.
>     
>     
>     >Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close
>     By "other things" are you referring to a reject? In 0-10 AFIK you don't need to do it. 
>     The situation is the same as a consumer closing with a bunch of unacked messages when in CLIENT_ACK mode.
>     
>     >Does any particular attention need paid to the overridden 0-10 specific version of this method?
>     IMO adding "if (!(isClosed() || isClosing()))" is required to prevent the 0-10 specific method from issuing credit (and receiving more messages, that will eventually get released).
>     There is a chance where the consumer could be marked closed, but not yet reached the point of sending a cancel, by the time messageFlow() is called.
>     The above check should prevent it.

"Did you mean race condition (instead of deadlock)?"

I really meant deadlock. One of the uses of _messageDeliveryLock being removed was added as part of the fix for a deadlock (https://issues.apache.org/jira/browse/QPID-3911) and so I wonder what effect removing it again will have. It may be the other changes mean it isn't a problem, but I think it needs to be closely considered. 

"The message will be dropped if a consumer (or session is closed or closing).
When a consumer is closed, any messages acquired but not acknowledged should be be made available to another consumer by the broker.
These messages will be marked redelivered.
Is this the same situation for 0-8/0-9 ?"

That isn't actually the case, transferred messages are the responsibility of the session and remain acquired after consumer close until such point as the session explicitly does something with them. I believe this is true of 0-8/9/91 as well, and probably explains the origin of the reject/release code I referred to earlier in the call tree than the changes would now allow the message to be silently dropped.

>From the 0-10 spec: "Canceling a subscription MUST NOT affect pending transfers. A transfer made prior to canceling transfers to the destination MUST be able to be accepted, released, acquired, or rejected after the subscription is canceled."


- Robbie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------


On May 15, 2013, 3:06 a.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 15, 2013, 3:06 a.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java 1480271 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by Robbie Gemmell <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java
<https://reviews.apache.org/r/10738/#comment42204>

    Whitespace



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java
<https://reviews.apache.org/r/10738/#comment42205>

    Whitespace



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java
<https://reviews.apache.org/r/10738/#comment42206>

    Unused



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java
<https://reviews.apache.org/r/10738/#comment42207>

    Might be better called 'waitForInProgressDeliveryToComplete' as it doesn't actually wait for all message delivery to finish (although that might be the case, depending on other factors outside its control) but rather any that is already in progress to complete.



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java
<https://reviews.apache.org/r/10738/#comment42210>

    We should hold the object lock while performing the check, otherwise we could go into the wait() after the only notify() that could wake us has already been performed.



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java
<https://reviews.apache.org/r/10738/#comment42211>

    We probably shouldn't catch all exceptions, or eat the interrupted status.



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java
<https://reviews.apache.org/r/10738/#comment42212>

    Linked to earlier comment, we need to hold the object lock while performing the update or we could potentially notify() before they call wait(), which might mean they never wake up.



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java
<https://reviews.apache.org/r/10738/#comment42213>

    We probably shouldn't catch all exceptions, or eat the interrupted status.



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java
<https://reviews.apache.org/r/10738/#comment42216>

    This and all the code that uses it seems to duplicate the code added in AMQSession. It seems like this stuff could go in a utility class to avoid the duplication, which would also permit easy addition of some unit tests.



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java
<https://reviews.apache.org/r/10738/#comment42217>

    Removal of this usage of the old lock will cause a fair shift in client behaviour, allowing the consumer close to proceed at the same time as message delivery is ongoing on the session, possibly entailing things such as the Dispatcher performing a session rollback in a message listener while this close is in progress.
    
    How clear are we on what impact this change has on the client? For example, this lock usage was apparently added specifically to prevent a deadlock in that sort of situation. Has your investigation of this change in behaviour determined whether that would become a problem again?



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java
<https://reviews.apache.org/r/10738/#comment42218>

    What impact does dropping the message on the floor here have on the client?
    
    Earlier points in the call hierarchy seem to make effort to do other things with the message when detecting session/consumer close, so is there any impact from not doing so? E.g a message getting stuck acquired for a now-closed consumer?
    
    Does any particular attention need paid to the overridden 0-10 specific version of this method?



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java
<https://reviews.apache.org/r/10738/#comment42219>

    Same comments as earlier about thread safety, exception handling, and reuse.



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java
<https://reviews.apache.org/r/10738/#comment42220>

    Same comments as earlier about method name, thread safety, exception handling, and reuse.



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/common/src/main/java/org/apache/qpid/configuration/ClientProperties.java
<https://reviews.apache.org/r/10738/#comment42214>

    Effectively unused


- Robbie Gemmell


On May 8, 2013, 2:02 p.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 8, 2013, 2:02 p.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/common/src/main/java/org/apache/qpid/configuration/ClientProperties.java 1480271 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by Robbie Gemmell <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20770
-----------------------------------------------------------



http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java
<https://reviews.apache.org/r/10738/#comment42863>

    New class missing from diff?


- Robbie Gemmell


On May 15, 2013, 3:06 a.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 15, 2013, 3:06 a.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
> This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
> There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1480271 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java 1480271 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by rajith attapattu <ra...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/
-----------------------------------------------------------

(Updated May 21, 2013, 2:48 p.m.)


Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.


Changes
-------

Added the missing class ConditionManager.java


Description
-------

There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 

Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.

There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
I will commit them both together.


This addresses bug QPID-4574.
    https://issues.apache.org/jira/browse/QPID-4574


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1484812 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1484812 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1484812 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java 1484812 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/util/ConditionManager.java PRE-CREATION 

Diff: https://reviews.apache.org/r/10738/diff/


Testing
-------

Java test suite, tests from customers and QE around the deadlock situation.


Thanks,

rajith attapattu


Re: Review Request: Address deadlock btw _messageDeliveryLock and _failoverMutex

Posted by rajith attapattu <ra...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/
-----------------------------------------------------------

(Updated May 15, 2013, 3:06 a.m.)


Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.


Changes
-------

The following changes were made from the previous iteration.
1. Address the thread safety issue identified in the review comments.

2. Moved out the common code to a new class named ConditionManager. Changed AMQSession and BasicMessageConsumer to use the
    above class to to ensure close doesn't proceed while a message delivery is in progress.
    
3. Added a check in BasicMessageConsumer_0_10 to see if the consumer is closed or closing, before issuing any credit.
    
4. Added code to both BasicMessageConsumer_0_10 and BasicMessageConsumer to reject (release in 0-10) messages instead of just dropping them. This will ensure the messages are immediately requeued.
   Didn't add this for the session as in 0-10, messages will be requeued (if not acked) when the session is closed.
   Not sure what happens in pre 0-10. TODO to verify that situation.

5. Removed unwanted code.


Description
-------

There are at least 3 cases where the deadlock btw _messageDeliveryLock and _failoverMutex surfaces. Among them sending a message inside onMessage() and the session being closed due to an error (causing the deadlock) seems to come up a lot in production environments. There is also a deadlock btw _messageDeliveryLock and _lock (AMQSession.java) which happens less frequently.
The messageDeliveryLock is used to ensure that we don't close the session in the middle of a message delivery. In order to do this we hold the lock across onMessage().
This causes several issues in addition to the potential to deadlock. If an onMessage call takes longer/wedged then you cannot close the session or failover will not happen until it returns as the same thread is holding the failoverMutex. 

Based on an idea from Rafi, I have come up with a solution to get rid of _messageDeliveryLock and instead use an alternative strategy to achieve similar functionality.
In order to ensure that close() doesn't proceed until the message deliveries currently in progress completes, an atomic counter is used to keep track of message deliveries in progress.
The close() will wait until the count falls to zero before proceeding. No new deliveries will be initiated bcos the close method will mark the session as closed.
The wait has a timeout to ensure that a longer running or wedged onMessage() will not hold up session close.
There is a slim chance that before a session being marked as closed a message delivery could be initiated, but not yet gotten to the point of updating the counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed with close. But in comparison to the issues with _messageDeliveryLock, I believe it's acceptable.

There is an issue if MessageConsumer close is called outside of Session close. This can be solved in a similar manner. I will wait until the current review is complete and then post the solution for the MessageConsumer close.
I will commit them both together.


This addresses bug QPID-4574.
    https://issues.apache.org/jira/browse/QPID-4574


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java 1480271 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1480271 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1480271 
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java 1480271 

Diff: https://reviews.apache.org/r/10738/diff/


Testing
-------

Java test suite, tests from customers and QE around the deadlock situation.


Thanks,

rajith attapattu