Posted to users@qpid.apache.org by CLIVE <cl...@ckjltd.co.uk> on 2014/06/16 14:20:22 UTC

qpid::messaging API TTL problems

Hi,

A client has been having problems with TTL on their messages, 
specifically messages being delivered when they believe the TTL should 
have expired.  (They are using QPID 0.28 on CentOS 6.4).

I have tracked the problem(s) down and reproduced the potential issues 
in the attached Boost unit test cases.

The first issue is to do with the qpid::messaging::Sender. If an
application sends messages while the sender is not connected, they are
stored in the sender's outgoing queue (assuming the flush mechanism hasn't
been activated). When the sender finally gets reconnected (using the
reconnect functionality on the qpid::messaging::Connection), the
messages in the sender's outgoing queue get flushed as part of the
Session reset mechanism. But if the TTL on these messages is, say, 1
second and the Sender has been disconnected for longer than this,
the messages still get sent and delivered to the associated set of
receivers. The Sender implementation makes no adjustment to the TTL,
so messages that should have expired still get delivered.
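
To make the expected behaviour concrete, here is a minimal, self-contained
sketch of what adjusting the TTL at flush time could look like. It does not
use the qpid::messaging API and is not the actual Sender implementation;
PendingMessage, flushOutgoing, and the convention that a TTL of 0 means
"never expires" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for a message waiting in a sender's outgoing queue.
struct PendingMessage {
    std::string body;
    Millis ttl;                  // TTL requested by the application; 0 = never expires (assumption)
    Clock::time_point queuedAt;  // when the application handed the message to the sender
};

// Returns the messages that should still go out at time `now`, each with its
// TTL reduced by the time already spent in the queue. Messages whose TTL has
// fully elapsed are dropped rather than sent.
std::vector<PendingMessage> flushOutgoing(std::deque<PendingMessage>& outgoing,
                                          Clock::time_point now) {
    std::vector<PendingMessage> toSend;
    while (!outgoing.empty()) {
        PendingMessage m = outgoing.front();
        outgoing.pop_front();
        assert(now >= m.queuedAt);  // precondition: a message cannot be flushed before it was queued
        if (m.ttl.count() > 0) {
            Millis waited = std::chrono::duration_cast<Millis>(now - m.queuedAt);
            if (waited >= m.ttl) continue;  // expired while disconnected: do not send
            m.ttl -= waited;                // send only the remaining lifetime
        }
        toSend.push_back(m);
    }
    return toSend;
}
```

With a 1 second TTL and a 2 second disconnection, the message is dropped;
a message with a 5 second TTL would go out with 3 seconds remaining.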

The second issue is to do with the qpid::messaging::Receiver. Suppose a
sender and receiver are both created with an address string of
"amq.topic/fred" and the receiver is configured with a capacity of 10.
If the sender sends 100 messages with a TTL of 1 second, but the
associated receiver is not serviced by the application for 2 seconds,
then when the receiver is serviced (using get/fetch) 10 messages are
still delivered to the application. In fact the number of messages
delivered to the application in this kind of scenario matches the
capacity setting of the receiver. It would appear that even though the
Broker is expiring messages from the queue (using qpid-tool I can see 90
messages have been expired from the queue), it does not manage to do
this for messages that have already been cached by the receiver due to
its capacity setting.
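
The scenario can be modelled without a broker. The sketch below is a
self-contained illustration of a client-side buffer that checks expiry at
fetch time; it is not the qpid::messaging Receiver, and PrefetchedMessage,
PrefetchBuffer, offer, fetch, and expired are hypothetical names chosen for
this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a message held in a receiver's prefetch buffer.
struct PrefetchedMessage {
    std::string body;
    Clock::time_point expiresAt;  // arrival time plus the TTL carried by the message
};

// Hypothetical client-side buffer that checks expiry when the application
// finally asks for a message, instead of relying on the broker to do it.
class PrefetchBuffer {
public:
    explicit PrefetchBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Accepts a message only while there is room, mimicking a capacity limit.
    bool offer(const PrefetchedMessage& m) {
        if (buffer_.size() >= capacity_) return false;
        buffer_.push_back(m);
        return true;
    }

    // Discards expired messages from the front, then copies the first live
    // message into `out`. Returns false if nothing live remains.
    bool fetch(Clock::time_point now, PrefetchedMessage& out) {
        while (!buffer_.empty()) {
            PrefetchedMessage m = buffer_.front();
            buffer_.pop_front();
            if (m.expiresAt > now) {
                out = m;
                return true;
            }
            ++expired_;
        }
        return false;
    }

    std::size_t expired() const { return expired_; }

private:
    std::size_t capacity_;
    std::size_t expired_ = 0;
    std::deque<PrefetchedMessage> buffer_;
};
```

With a capacity of 10, a 1 second TTL, and a fetch 2 seconds after arrival,
this model counts all 10 buffered messages as expired and hands nothing to
the application.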

What are people's thoughts on this behavior?

Any help/comments would be gratefully received.

Re: qpid::messaging API TTL problems

Posted by Gordon Sim <gs...@redhat.com>.

On 06/16/2014 04:38 PM, Gordon Sim wrote:
> On 06/16/2014 01:20 PM, CLIVE wrote:
>> Hi,
>>
>> A client has been having problems with TTL on their messages,
>> specifically messages being delivered when they believe the TTL should
>> have expired.  (They are using QPID 0.28 on CentOS 6.4).
>>
>> I have tracked the problem(s) down and reproduced the potential issues
>> in the attached BOOST unit testcases.
>>
>> The first issue is to do with the qpid::messaging::Sender. If an
>> application sends messages while the sender is not connected, they are
>> stored in the sender's outgoing queue (assuming flush mechanism hasn't
>> been activated). When the sender finally gets reconnected (using the
>> reconnect functionality on the qpid::messaging::Connection), the
>> messages in the sender's outgoing queue get flushed as part of the
>> Session reset mechanism. But if the TTL on these messages is say 1
>> second and the Sender has been disconnected for more than this period,
>> the messages still get sent and delivered to the associated set of
>> receivers. No adjustment in TTL gets made by the Sender Implementation.
>> Therefore messages that should have expired still get delivered.
>
> That would be nice to fix.
>
>> The second issue is to do with the qpid::messaging::Receiver. If a
>> sender and receiver are both created with an address string of
>> "amq.topic/fred" and the receiver is configured with a capacity of 10.
>> Then if the sender sends 100 messages with a TTL of 1 second, but the
>> associated receiver is not serviced by the application for 2 seconds,
>> when the receiver is serviced (using get/fetch) then 10 messages are
>> still delivered to the application. In fact the number of messages
>> delivered to the application, under this kind of scenario, matches the
>> capacity setting of the receiver. It would appear that even though the
>> Broker is expiring messages from the queue (using qpid-tool I can see 90
>> messages have been expired from the queue), it does not manage to do
>> this for messages that have already been cached by the receiver due to
>> its capacity setting.
>
> By default a receiver on amq.topic/fred will be using unacknowledged
> transfer. Therefore if the message has been sent to the client, the
> broker has already deleted its record.
>
> I.e. the issue is that the receiver does not itself do any expiration of
> messages from the prefetch buffer.
>
>> What are people's thoughts on this behavior?
>
> I think both issues can clearly be considered bugs. If you raise a JIRA
> I can take a look at what a fix might involve (unless you have already
> done so).

FYI: I have checked in fixes for both of these issues (currently only on
the 0-10 codepath, but fixes for the 1.0 codepath will follow shortly).


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: qpid::messaging API TTL problems

Posted by Gordon Sim <gs...@redhat.com>.

On 06/16/2014 06:19 PM, CLIVE wrote:
> - valgrind with helgrind tool reports out of order locking problems
> between Session and Sender locks (may cause deadlock)

Fixed: https://svn.apache.org/r1606259

[...]
> - Session::checkError can throw exceptions that extend from
> qpid::Exception, not sure how to catch these as cannot find this
> exception in installation directory.

Fixed: https://svn.apache.org/r1606258
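
For anyone hitting the same problem before picking up that fix, one general
C++ fallback is to catch by a standard base class. The sketch below is
self-contained and does not reproduce the change in r1606258; InternalError,
checkError, and describeFailure are hypothetical stand-ins, and the approach
assumes the library exception ultimately derives from std::exception.

```cpp
#include <exception>
#include <string>

// Hypothetical stand-in for a library-internal exception whose header is not
// installed, so application code cannot name the type in a catch clause.
class InternalError : public std::exception {
public:
    explicit InternalError(const std::string& msg) : msg_(msg) {}
    const char* what() const noexcept override { return msg_.c_str(); }
private:
    std::string msg_;
};

// Hypothetical stand-in for a call that may throw the internal type.
void checkError(bool failed) {
    if (failed) throw InternalError("session failed");
}

// Catches by the standard base class, which needs no knowledge of the
// derived type's header.
std::string describeFailure(bool failed) {
    try {
        checkError(failed);
        return "ok";
    } catch (const std::exception& e) {
        return std::string("caught: ") + e.what();
    }
}
```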


Re: qpid::messaging API TTL problems

Posted by CLIVE <cl...@ckjltd.co.uk>.

Andrew,

Attached is a reproducer for the lock order issue I was getting with
helgrind; it turned out to be a simple test case in the end.

Also attached is the helgrind output I get.

If you determine that this is a real issue, then let me know and I will 
create a JIRA.

Clive
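
The attachments are not part of this archive. As a general illustration of
the kind of problem a lock-order report points at, the sketch below shows
the usual remedy of taking two locks in one fixed order on every path. It is
not Clive's reproducer and not the Qpid code; SessionState, SenderState,
queueMessage, and flushQueue are hypothetical names.

```cpp
#include <mutex>

// Hypothetical stand-ins for two objects that each own a lock.
struct SessionState { std::mutex lock; int pending = 0; };
struct SenderState  { std::mutex lock; int queued = 0; };

// Rule adopted by this sketch: whenever both locks are needed, take the
// session lock first and the sender lock second, on every code path.
void queueMessage(SessionState& session, SenderState& sender) {
    std::lock_guard<std::mutex> sessionGuard(session.lock);
    std::lock_guard<std::mutex> senderGuard(sender.lock);
    ++sender.queued;
    ++session.pending;
}

// A path that logically starts from the sender still follows the same order.
// Locking sender first and session second here would be the inversion that
// can deadlock against queueMessage.
void flushQueue(SessionState& session, SenderState& sender) {
    std::lock_guard<std::mutex> sessionGuard(session.lock);
    std::lock_guard<std::mutex> senderGuard(sender.lock);
    session.pending -= sender.queued;
    sender.queued = 0;
}
```

The design choice is simply that no thread ever holds the sender lock while
waiting for the session lock, so a cycle of waiting threads cannot form.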

On 17/06/2014 18:38, Andrew Stitcher wrote:
> On Mon, 2014-06-16 at 18:19 +0100, CLIVE wrote:
>> Already created QPID-5828 to cover these issues (plus some others). I
>> have also attached several boost unit tests that help demonstrate the
>> problems.
> Do you have some way to produce unit tests for the lock issues? (I'm
> assuming not, but if so I'd be very interested to hear how you do this)
>
>> I have a few more issues to report with the messaging API, but I have
>> run out of time to get them detailed with boost unit tests today, I will
>> try and add them in the next few days. But in summary the outstanding
>> issues are:
>>
>> - valgrind with helgrind tool reports out of order locking problems
>> between Session and Sender locks (may cause deadlock)
> This seems like a potential deadlock that we should look at carefully.
>
>> - valgrind with helgrind tool reports race condition with bool
>> writePending variable in .qpid/sys/posix/AsynchIO.cpp. Variable defined
>> as volatile, not sure this is actually enough to avoid data race
>> conditions, as volatile keyword provides no guarantees on concurrent access.
> This is known, and I assess it as benign - in that it is racy, but it
> doesn't matter semantically.
>
>> - Session::checkError can throw exceptions that extend from
>> qpid::Exception, not sure how to catch these as cannot find this
>> exception in installation directory.
>>
>> - Once a sender has transitioned in to a flush state (capacity/4 >
>> outgoing queue size) and a connection does not exist to the broker,
>> messages sent will not be placed in the outgoing queue.
>>
>> Should I add these additional issues to QPID-5828, and attach new test
>> cases as and when I can?
> In general I would open new issues for each separate problem you find.
> Doing anything else makes it hard to work on the issues in isolation,
> and if the problems have the same underlying fix or are the same
> underlying problem we can link the issues together later.
>
> You certainly don't need to wait until you can produce a neat reproducer
> before creating an issue. Of course it is much easier to work on an
> issue with a neat reproducer though.
>
> Andrew

Re: qpid::messaging API TTL problems

Posted by CLIVE <cl...@ckjltd.co.uk>.

On 17/06/2014 18:38, Andrew Stitcher wrote:
> On Mon, 2014-06-16 at 18:19 +0100, CLIVE wrote:
>> Already created QPID-5828 to cover these issues (plus some others). I
>> have also attached several boost unit tests that help demonstrate the
>> problems.
> Do you have some way to produce unit tests for the lock issues? (I'm
> assuming not, but if so I'd be very interested to hear how you do this)
>
Yes, I have a unit test at work that was causing a deadlock situation,
hence the use of valgrind's helgrind tool. I will try to create a JIRA
with an attached unit test and valgrind report.
>> I have a few more issues to report with the messaging API, but I have
>> run out of time to get them detailed with boost unit tests today, I will
>> try and add them in the next few days. But in summary the outstanding
>> issues are:
>>
>> - valgrind with helgrind tool reports out of order locking problems
>> between Session and Sender locks (may cause deadlock)
> This seems like a potential deadlock that we should look at carefully.
>
>> - valgrind with helgrind tool reports race condition with bool
>> writePending variable in .qpid/sys/posix/AsynchIO.cpp. Variable defined
>> as volatile, not sure this is actually enough to avoid data race
>> conditions, as volatile keyword provides no guarantees on concurrent access.
> This is known, and I assess it as benign - in that it is racy, but it
> doesn't matter semantically.

O.K. If you have time, could you expand on your reasoning for this? I
have in the past always put locks around boolean flags, so I would be
interested to learn from your experience what semantic conditions allow
a flag to go without a lock.
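
Separately from whether the existing race is benign, one commonly used way
to share a simple flag without a mutex is std::atomic<bool> (C++11). The
sketch below is only an illustration of that technique; WriteFlag and its
members are hypothetical names, and this is not the AsynchIO.cpp code.

```cpp
#include <atomic>

// Hypothetical stand-in for a "write pending" flag shared between threads.
class WriteFlag {
public:
    // Marks a write as pending. Returns true only for the caller that
    // changed the flag from false to true.
    bool request() {
        return !pending_.exchange(true, std::memory_order_acq_rel);
    }

    // Clears the flag. Returns whether a write had been pending.
    bool consume() {
        return pending_.exchange(false, std::memory_order_acq_rel);
    }

    // Reads the flag without modifying it.
    bool isPending() const {
        return pending_.load(std::memory_order_acquire);
    }

private:
    std::atomic<bool> pending_{false};
};
```

Unlike volatile, the atomic type makes concurrent reads and writes of the
flag well defined, and exchange lets exactly one caller observe each
transition.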
>
>> - Session::checkError can throw exceptions that extend from
>> qpid::Exception, not sure how to catch these as cannot find this
>> exception in installation directory.
>>
>> - Once a sender has transitioned in to a flush state (capacity/4 >
>> outgoing queue size) and a connection does not exist to the broker,
>> messages sent will not be placed in the outgoing queue.
>>
>> Should I add these additional issues to QPID-5828, and attach new test
>> cases as and when I can?
> In general I would open new issues for each separate problem you find.
> Doing anything else makes it hard to work on the issues in isolation,
> and if the problems have the same underlying fix or are the same
> underlying problem we can link the issues together later.
>
> You certainly don't need to wait until you can produce a neat reproducer
> before creating an issue. Of course it is much easier to work on an
> issue with a neat reproducer though.
I will create some more JIRAs in the next couple of days, but will not
do so until I have reproducer test cases for each of the potential
problems. The issues uncovered so far have come from my Boost unit tests
at a client site, so I already have the test cases; it's just a case of
reproducing them back at my offices and posting accordingly.

> Andrew

Re: qpid::messaging API TTL problems

Posted by Andrew Stitcher <as...@redhat.com>.

On Mon, 2014-06-16 at 18:19 +0100, CLIVE wrote:
> Already created QPID-5828 to cover these issues (plus some others). I 
> have also attached several boost unit tests that help demonstrate the 
> problems.

Do you have some way to produce unit tests for the lock issues? (I'm
assuming not, but if so I'd be very interested to hear how you do this)

> 
> I have a few more issues to report with the messaging API, but I have 
> run out of time to get them detailed with boost unit tests today, I will 
> try and add them in the next few days. But in summary the outstanding 
> issues are:
> 
> - valgrind with helgrind tool reports out of order locking problems 
> between Session and Sender locks (may cause deadlock)

This seems like a potential deadlock that we should look at carefully.

> 
> - valgrind with helgrind tool reports race condition with bool 
> writePending variable in .qpid/sys/posix/AsynchIO.cpp. Variable defined 
> as volatile, not sure this is actually enough to avoid data race 
> conditions, as volatile keyword provides no guarantees on concurrent access.

This is known, and I assess it as benign - in that it is racy, but it
doesn't matter semantically.

> 
> - Session::checkError can throw exceptions that extend from 
> qpid::Exception, not sure how to catch these as cannot find this 
> exception in installation directory.
> 
> - Once a sender has transitioned in to a flush state (capacity/4 > 
> outgoing queue size) and a connection does not exist to the broker, 
> messages sent will not be placed in the outgoing queue.
> 
> Should I add these additional issues to QPID-5828, and attach new test 
> cases as and when I can?

In general I would open new issues for each separate problem you find.
Doing anything else makes it hard to work on the issues in isolation,
and if the problems have the same underlying fix or are the same
underlying problem we can link the issues together later.

You certainly don't need to wait until you can produce a neat reproducer
before creating an issue. Of course it is much easier to work on an
issue with a neat reproducer though.

Andrew


Re: qpid::messaging API TTL problems

Posted by CLIVE <cl...@ckjltd.co.uk>.

Already created QPID-5828 to cover these issues (plus some others). I 
have also attached several boost unit tests that help demonstrate the 
problems.

I have a few more issues to report with the messaging API, but I have
run out of time to detail them with Boost unit tests today; I will
try to add them in the next few days. In summary, the outstanding
issues are:

- valgrind's helgrind tool reports lock-ordering problems
between the Session and Sender locks (which may cause deadlock).

- valgrind's helgrind tool reports a race condition on the bool
writePending variable in .qpid/sys/posix/AsynchIO.cpp. The variable is
defined as volatile; I am not sure this is enough to avoid data
races, as the volatile keyword provides no guarantees on concurrent access.

- Session::checkError can throw exceptions that extend from
qpid::Exception; I am not sure how to catch these, as I cannot find this
exception in the installation directory.

- Once a sender has transitioned into a flush state (capacity/4 >
outgoing queue size) and no connection to the broker exists,
messages sent are not placed in the outgoing queue.

Should I add these additional issues to QPID-5828, and attach new test 
cases as and when I can?


On 16/06/2014 16:38, Gordon Sim wrote:
> On 06/16/2014 01:20 PM, CLIVE wrote:
>> Hi,
>>
>> A client has been having problems with TTL on their messages,
>> specifically messages being delivered when they believe the TTL should
>> have expired.  (They are using QPID 0.28 on CentOS 6.4).
>>
>> I have tracked the problem(s) down and reproduced the potential issues
>> in the attached BOOST unit testcases.
>>
>> The first issue is to do with the qpid::messaging::Sender. If an
>> application sends messages while the sender is not connected, they are
>> stored in the sender's outgoing queue (assuming flush mechanism hasn't
>> been activated). When the sender finally gets reconnected (using the
>> reconnect functionality on the qpid::messaging::Connection), the
>> messages in the sender's outgoing queue get flushed as part of the
>> Session reset mechanism. But if the TTL on these messages is say 1
>> second and the Sender has been disconnected for more than this period,
>> the messages still get sent and delivered to the associated set of
>> receivers. No adjustment in TTL gets made by the Sender Implementation.
>> Therefore messages that should have expired still get delivered.
>
> That would be nice to fix.
>
>> The second issue is to do with the qpid::messaging::Receiver. If a
>> sender and receiver are both created with an address string of
>> "amq.topic/fred" and the receiver is configured with a capacity of 10.
>> Then if the sender sends 100 messages with a TTL of 1 second, but the
>> associated receiver is not serviced by the application for 2 seconds,
>> when the receiver is serviced (using get/fetch) then 10 messages are
>> still delivered to the application. In fact the number of messages
>> delivered to the application, under this kind of scenario, matches the
>> capacity setting of the receiver. It would appear that even though the
>> Broker is expiring messages from the queue (using qpid-tool I can see 90
>> messages have been expired from the queue), it does not manage to do
>> this for messages that have already been cached by the receiver due to
>> its capacity setting.
>
> By default a receiver on amq.topic/fred will be using unacknowledged 
> transfer. Therefore if the message has been sent to the client, the 
> broker has already deleted its record.
>
> I.e. the issue is that the receiver does not itself do any expiration 
> of messages from the prefetch buffer.
>
>> What are people's thoughts on this behavior?
>
> I think both issues can clearly be considered bugs. If you raise a 
> JIRA I can take a look at what a fix might involve (unless you have 
> already done so).

Re: qpid::messaging API TTL problems

Posted by Gordon Sim <gs...@redhat.com>.

On 06/16/2014 01:20 PM, CLIVE wrote:
> Hi,
>
> A client has been having problems with TTL on their messages,
> specifically messages being delivered when they believe the TTL should
> have expired.  (They are using QPID 0.28 on CentOS 6.4).
>
> I have tracked the problem(s) down and reproduced the potential issues
> in the attached BOOST unit testcases.
>
> The first issue is to do with the qpid::messaging::Sender. If an
> application sends messages while the sender is not connected, they are
> stored in the sender's outgoing queue (assuming flush mechanism hasn't
> been activated). When the sender finally gets reconnected (using the
> reconnect functionality on the qpid::messaging::Connection), the
> messages in the sender's outgoing queue get flushed as part of the
> Session reset mechanism. But if the TTL on these messages is say 1
> second and the Sender has been disconnected for more than this period,
> the messages still get sent and delivered to the associated set of
> receivers. No adjustment in TTL gets made by the Sender Implementation.
> Therefore messages that should have expired still get delivered.

That would be nice to fix.

> The second issue is to do with the qpid::messaging::Receiver. If a
> sender and receiver are both created with an address string of
> "amq.topic/fred" and the receiver is configured with a capacity of 10.
> Then if the sender sends 100 messages with a TTL of 1 second, but the
> associated receiver is not serviced by the application for 2 seconds,
> when the receiver is serviced (using get/fetch) then 10 messages are
> still delivered to the application. In fact the number of messages
> delivered to the application, under this kind of scenario, matches the
> capacity setting of the receiver. It would appear that even though the
> Broker is expiring messages from the queue (using qpid-tool I can see 90
> messages have been expired from the queue), it does not manage to do
> this for messages that have already been cached by the receiver due to
> its capacity setting.

By default a receiver on amq.topic/fred will be using unacknowledged 
transfer. Therefore if the message has been sent to the client, the 
broker has already deleted its record.

I.e. the issue is that the receiver does not itself do any expiration of 
messages from the prefetch buffer.

> What are people's thoughts on this behavior?

I think both issues can clearly be considered bugs. If you raise a JIRA 
I can take a look at what a fix might involve (unless you have already 
done so).

