Posted to dev@qpid.apache.org by Martin Ritchie <ri...@apache.org> on 2007/09/20 14:27:18 UTC
Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
On 19/09/2007, Robert Greig <ro...@gmail.com> wrote:
> On 19/09/2007, Carl Trieloff <cc...@redhat.com> wrote:
>
> > We test on list of platforms but it fails consistently (every time) on
> > the quad core RHEL5 64bit
> > Woodcrest machine
>
> I looked at the svn logs more carefully and there was a subsequent
> change by Martin on the M2.1 branch for QPID-572. I merged that this
> morning but on our continuous build machine this fails too.
>
> We are actually seeing some odd failures so we are actively
> investigating if there may be multiple problems.
>
> RG
Hi. It would be most helpful in resolving these intermittent failures
if people seeing these problems could post the Surefire reports.
Ideally, attach the reports to the relevant JIRA, but at the very
least email the list with the details. As I said earlier in the
week, I haven't seen the TopicSessionTest fail since I committed the
patch to the M2.1 branch. If any of you who have pointed out that TST
still fails have any details, I would really appreciate you
sharing them so that we can work to resolve the outstanding issues.
--
Martin Ritchie
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Martin Ritchie <ri...@apache.org>.
On 20/09/2007, Carl Trieloff <cc...@redhat.com> wrote:
>
> > Hi, It would be most helpful in resolving these intermittent failures
> > if people seeing these problems could post the Surefire reports.
>
> I need to wait for Nuno to get something on my user-id fixed, at which
> point I will post more info.
>
> This is the current failure.
>
> Running org.apache.qpid.test.client.QueueBrowserTest
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 60.084
> sec <<< FAILURE!
Yes, thanks Carl. If your user-id has rights to look in
qpid/java/client/target/surefire-reports
you will find a report for the QueueBrowserTest that may have more
useful details. It should at least tell
you which test in QueueBrowserTest failed.
--
Martin Ritchie
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Rajith Attapattu <ra...@gmail.com>.
Carl,
Time elapsed: 10.024 suggests that it was hanging (you have this problem on
the trunk).
Try doing a kill -3 and see if there is a deadlock.
Regards,
Rajith
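For reference, kill -3 (SIGQUIT) makes the JVM print a full thread dump, including detected monitor deadlocks, to stdout. The same check can be done programmatically via JMX, which is handy on a remote build box. This is generic JDK usage, not Qpid code:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Programmatic equivalent of the deadlock section of a kill -3 thread dump.
public class ThreadDumpSketch {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Returns null if no threads are currently deadlocked on monitors.
        long[] deadlocked = mx.findMonitorDeadlockedThreads();
        System.out.println("monitor-deadlocked threads: "
                + (deadlocked == null ? 0 : deadlocked.length));
        // Dump the name and state of every live thread, dump-style.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) {
                System.out.println(info.getThreadName() + ": " + info.getThreadState());
            }
        }
    }
}
```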
On 9/21/07, Carl Trieloff <cc...@redhat.com> wrote:
>
>
> >
> > This should be fixed now.
> >
> > However we are still seeing some other occasional failures on our
> > continuous build so this isn't over yet...
> >
> >
>
> this is our current failure...
>
> Running org.apache.qpid.test.unit.client.forwardall.CombinedTest
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 10.024
> sec <<< FAILURE!
>
> If it is different for you I will post the log.
>
> Carl.
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 24/09/2007, Aidan Skinner <ai...@gmail.com> wrote:
> On 9/24/07, Rupert Smith <ru...@googlemail.com> wrote:
>
> > Thanks for your patches. Does this mean that the tests in the client module
> > are now passing consistently, or do we still have more to do?
>
> I'm still seeing the CombinedTest error (see QPID-589) in the same way;
> I'm still looking into exactly what's going on there.
On our build box I haven't seen CombinedTest fail, but overnight the
build did hang again when running the client tests. I'm investigating
that now.
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Aidan Skinner <ai...@gmail.com>.
On 9/24/07, Rupert Smith <ru...@googlemail.com> wrote:
> Thanks for your patches. Does this mean that the tests in the client module
> are now passing consistently, or do we still have more to do?
I'm still seeing the CombinedTest error (see QPID-589) in the same way;
I'm still looking into exactly what's going on there.
- Aidan
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Rupert Smith <ru...@googlemail.com>.
Hi Robert,
Thanks for your patches. Does this mean that the tests in the client module
are now passing consistently, or do we still have more to do?
On 23/09/2007, Robert Greig <ro...@gmail.com> wrote:
>
> On 22/09/2007, Robert Greig <ro...@gmail.com> wrote:
> > I managed to get a threaddump and it shows yet another deadlock
> > involving the dispatcher. Details are attached to QPID-589.
>
> Looking at the deadlock, it occurs because during session close, it
> sends Basic.Cancel for each consumer, and the Basic.Cancel-Ok handler
> (on a separate thread) calls Dispatcher.rejectPending, which in turn
> tries to acquire the dispatcher lock. Sadly, the dispatcher lock is
> already held by Dispatcher.run(). Dispatcher.run() is trying to acquire
> the messageDeliveryLock, which is already held by the close method in
> AMQSession.
>
> I couldn't spot an obvious solution involving reordering of locks.
> However it did occur to me that it was not necessary to send a
> Basic.Cancel where we are about to close the entire session (AMQP
> channel).
>
> Does anyone disagree and think we have to send Basic.Cancel?
>
> I have committed a change to the M2 branch so that it does not send
> Basic.Cancel where the session is closing and so far on our continuous
> build there have been no test failures or deadlocks. If it turns out
> that someone knows why we must send Basic.Cancel then I will obviously
> back out that change.
>
> RG
>
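The lock-order inversion Robert describes (the dispatcher lock and the messageDeliveryLock, taken in opposite orders by Dispatcher.run() and AMQSession.close()) reduces to the classic two-lock pattern. A minimal sketch, using tryLock with a timeout so the demo terminates instead of hanging; the lock names follow the discussion, but everything else here is a hypothetical reduction, not Qpid code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    static final ReentrantLock dispatcherLock = new ReentrantLock();
    static final ReentrantLock messageDeliveryLock = new ReentrantLock();

    // Returns true if at least one thread could not take the other's lock,
    // i.e. a blocking lock() call here would have deadlocked.
    static boolean demonstrateInversion() throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        AtomicInteger failures = new AtomicInteger();
        // Models Dispatcher.run(): dispatcher lock first, then messageDeliveryLock.
        Thread dispatcher = new Thread(() ->
                acquire(dispatcherLock, messageDeliveryLock, bothHoldFirstLock, failures));
        // Models AMQSession.close(): messageDeliveryLock first, then (via the
        // Basic.Cancel-Ok handler calling rejectPending) the dispatcher lock.
        Thread closer = new Thread(() ->
                acquire(messageDeliveryLock, dispatcherLock, bothHoldFirstLock, failures));
        dispatcher.start();
        closer.start();
        dispatcher.join();
        closer.join();
        return failures.get() > 0;
    }

    static void acquire(ReentrantLock first, ReentrantLock second,
                        CountDownLatch ready, AtomicInteger failures) {
        first.lock();
        try {
            ready.countDown();
            ready.await();                  // both threads now hold their first lock
            if (second.tryLock(500, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                failures.incrementAndGet(); // lock() would have blocked forever here
            }
        } catch (InterruptedException ignored) {
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("would deadlock: " + demonstrateInversion()); // would deadlock: true
    }
}
```

With blocking lock() calls, both threads wait on each other forever, which matches the observed hang. Not sending Basic.Cancel during session close breaks the cycle, since rejectPending is then never entered from the Cancel-Ok handler while close holds the messageDeliveryLock.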
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Martin Ritchie wrote:
> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>> Robert Greig wrote:
>>> What do you mean by "add the release method from 0-10"?
>>>
>>>> I do agree that processing prefetched messages is not the ideal
>>>> behavior, however it is the only one available if you want to strictly
>>>> adhere to AMQP semantics, and I would expect it to also comply with JMS
>>>> semantics presuming that you block the session.close() call until
>>>> processing of prefetched messages is complete.
>>> Is it not really up to the client developer not to use prefetch if he
>>> wants to use NO_ACK, not call consumer.close() and still quiesce the
>>> session? For example, how do you process prefetched messages for
>>> consumers that do not have a message listener, i.e. where messages are
>>> processed using receive()?
>> In 0-8 both prefetch limits are ignored when no-ack is true, so there is
>> no way to turn off prefetch when you're using no-ack. In 0-10 it is
>> possible to use flow control with no-ack since the flow control dialog
>> has been decoupled from message acknowledgment, however it would
>> entirely defeat the purpose of no-ack to not be able to use prefetch.
>>
>> Also it doesn't really matter whether you use prefetch since even if you
>> explicitly request each message to be sent prior to processing, close()
>> could still be called after that request is sent. In other words not
>> using prefetch is the same as having a prefetch of 1 which doesn't
>> eliminate the problem it simply reduces the impact to a single message
>> at the cost of reasonable performance.
>>
>> Regarding prefetching for consumers that do not have a message listener,
>> you're right that it is inherently unsafe without the ability to release
>> messages.
>>
>> --Rafael
>
> Is it not possible to use message.reject in 0-8 to return
> messages? IIRC the strict AMQP 0-8 reject causes that consumer never
> to see the message again. Not a huge problem, as the consumer is
> closing.
I don't think this would be interoperable. According to the spec
language "A rejected message MAY be discarded or dead-lettered, not
necessarily passed to another client."
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Martin Ritchie <ri...@apache.org>.
On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> Robert Greig wrote:
> > What do you mean by "add the release method from 0-10"?
> >
> >> I do agree that processing prefetched messages is not the ideal
> >> behavior, however it is the only one available if you want to strictly
> >> adhere to AMQP semantics, and I would expect it to also comply with JMS
> >> semantics presuming that you block the session.close() call until
> >> processing of prefetched messages is complete.
> >
> > Is it not really up to the client developer not to use prefetch if he
> > wants to use NO_ACK, not call consumer.close() and still quiesce the
> > session? For example, how do you process prefetched messages for
> > consumers that do not have a message listener, i.e. where messages are
> > processed using receive()?
>
> In 0-8 both prefetch limits are ignored when no-ack is true, so there is
> no way to turn off prefetch when you're using no-ack. In 0-10 it is
> possible to use flow control with no-ack since the flow control dialog
> has been decoupled from message acknowledgment, however it would
> entirely defeat the purpose of no-ack to not be able to use prefetch.
>
> Also it doesn't really matter whether you use prefetch since even if you
> explicitly request each message to be sent prior to processing, close()
> could still be called after that request is sent. In other words not
> using prefetch is the same as having a prefetch of 1 which doesn't
> eliminate the problem it simply reduces the impact to a single message
> at the cost of reasonable performance.
>
> Regarding prefetching for consumers that do not have a message listener,
> you're right that it is inherently unsafe without the ability to release
> messages.
>
> --Rafael
Is it not possible to use message.reject in 0-8 to return
messages? IIRC the strict AMQP 0-8 reject causes that consumer never
to see the message again. Not a huge problem, as the consumer is
closing.
--
Martin Ritchie
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Robert Greig wrote:
> What do you mean by "add the release method from 0-10"?
>
>> I do agree that processing prefetched messages is not the ideal
>> behavior, however it is the only one available if you want to strictly
>> adhere to AMQP semantics, and I would expect it to also comply with JMS
>> semantics presuming that you block the session.close() call until
>> processing of prefetched messages is complete.
>
> Is it not really up to the client developer not to use prefetch if he
> wants to use NO_ACK, not call consumer.close() and still quiesce the
> session? For example, how do you process prefetched messages for
> consumers that do not have a message listener, i.e. where messages are
> processed using receive()?
In 0-8 both prefetch limits are ignored when no-ack is true, so there is
no way to turn off prefetch when you're using no-ack. In 0-10 it is
possible to use flow control with no-ack since the flow control dialog
has been decoupled from message acknowledgment, however it would
entirely defeat the purpose of no-ack to not be able to use prefetch.
Also it doesn't really matter whether you use prefetch since even if you
explicitly request each message to be sent prior to processing, close()
could still be called after that request is sent. In other words not
using prefetch is the same as having a prefetch of 1 which doesn't
eliminate the problem it simply reduces the impact to a single message
at the cost of reasonable performance.
Regarding prefetching for consumers that do not have a message listener,
you're right that it is inherently unsafe without the ability to release
messages.
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
What do you mean by "add the release method from 0-10"?
> I do agree that processing prefetched messages is not the ideal
> behavior, however it is the only one available if you want to strictly
> adhere to AMQP semantics, and I would expect it to also comply with JMS
> semantics presuming that you block the session.close() call until
> processing of prefetched messages is complete.
Is it not really up to the client developer not to use prefetch if he
wants to use NO_ACK, not call consumer.close() and still quiesce the
session? For example, how do you process prefetched messages for
consumers that do not have a message listener, i.e. where messages are
processed using receive()?
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Robert Greig wrote:
> On 25/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
>> The JMS semantics are pretty clear. Close at the consumer or session
>> level is supposed to block until any in-progress message listeners are
>> finished. If there are prefetched messages remaining after the
>> in-progress listeners are finished, they either need to be returned to
>> the server (i.e. option 2 except without abusing reject), or processed
>> (option 3).
>
> Does the JMS spec have any concept of prefetch? I will have to check.
> In progress message listeners to me means processing of a single
> message.
>
>> Option 3 seems like a reasonable extension of JMS semantics in the
>> presence of prefetch. Option 2 (without abusing reject) seems the most
>> correct. I'm not sure why you'd ever want to do option (1). It is
>> basically the same as option (2) except any messages prefetched for that
>> consumer are now stranded for the duration of the session. This doesn't
>> seem very friendly, and certainly wouldn't be a good default.
>
> I think 1 and 2 are different. To me, 1 would be the same as the
> ctrl-c behaviour i.e. there would be no acks so the messages would be
> requeued. 2 is the client saying "I don't want these messages".
> Ironically, I would say that 2 is the one I would probably never want
> to use since I would probably only want to reject messages I have had
> a look at.
>
> I am not sure what you mean by "stranded for the duration of the
> session". The messages would be requeued and if there was another
> consumer they would be delivered to that consumer, no?
They can't be requeued if the client can still process them.
> In fact now that I look at what I've written I am shocked we do (2).
> It seems very wrong to me.
By option 2 (without abusing reject) I meant releasing the messages,
i.e. sending an indicator to the broker that I'm never going to ack,
reject, or process the given messages and it is safe to deliver them to
any client without indicating that they may already have been processed.
This is actually fairly close to how the java broker interprets reject,
even though it is not compliant with the spec definition of reject.
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 25/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> The JMS semantics are pretty clear. Close at the consumer or session
> level is supposed to block until any in-progress message listeners are
> finished. If there are prefetched messages remaining after the
> in-progress listeners are finished, they either need to be returned to
> the server (i.e. option 2 except without abusing reject), or processed
> (option 3).
Does the JMS spec have any concept of prefetch? I will have to check.
In progress message listeners to me means processing of a single
message.
> Option 3 seems like a reasonable extension of JMS semantics in the
> presence of prefetch. Option 2 (without abusing reject) seems the most
> correct. I'm not sure why you'd ever want to do option (1). It is
> basically the same as option (2) except any messages prefetched for that
> consumer are now stranded for the duration of the session. This doesn't
> seem very friendly, and certainly wouldn't be a good default.
I think 1 and 2 are different. To me, 1 would be the same as the
ctrl-c behaviour i.e. there would be no acks so the messages would be
requeued. 2 is the client saying "I don't want these messages".
Ironically, I would say that 2 is the one I would probably never want
to use since I would probably only want to reject messages I have had
a look at.
I am not sure what you mean by "stranded for the duration of the
session". The messages would be requeued and if there was another
consumer they would be delivered to that consumer, no?
In fact now that I look at what I've written I am shocked we do (2).
It seems very wrong to me.
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Robert Greig wrote:
> On 25/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
>> It's not arbitrary. An ack informs you that a message has been
>> processed, but you can't infer one way or another from the absence of an
>> ack, therefore you *have* to deal with the possibility that these
>> messages have been processed already regardless of whether you do it by
>> setting the redelivered flag or by DLQing the message. Either way I
>> don't think it's acceptable for a routine close of a consumer to cause
>> redelivery of a slew of messages that may already have been processed.
>> It would, for example, be unacceptable to any application that requires
>> human intervention to deal with redelivered messages.
>
> I think it is wrong to say you can DLQ a message because you have not
> received an ack. A DLQ is for cases where the client has rejected a
> message explicitly or you cannot deliver a message.
That's not what I said. What I said was the broker must have the option
to DLQ a message if the client repeatedly terminates without
acknowledging or releasing the message. This is something that could
easily happen if normal termination results in unacked messages the same
way crashing does.
In other words, what I'm saying is that it is a bad thing if the broker
can't tell the difference between normal termination and a crash.
> DLQing a message because of lack of ack hugely complicates recovery
> from the application's perspective. Consider the case of an app that
> crashes for some reason during processing and does not send an ack for
> a message.
>
> If that message were DLQ'd then what would the app do upon startup? It
> would have to know to check a DLQ for messages before consuming from
> the normal queue, or it would require operator intervention to move
> the messages from the DLQ back onto the normal queue. Certainly in the
> environment that I work in, that would be unacceptable to most
> applications since it would lengthen and complicate the recovery
> process hugely.
How exactly an application wants to deal with recovery probably depends
on the application. For some it may be more convenient for messages to
be on the same queue with a flag set, for others it may be more
convenient to automatically route them to a different queue. I don't
think the difference is material to my argument.
> To me an ack is a lower level concern - did you get the message, not
> "I can't process the message".
I'm not sure I understand this. A message level ack means that the
message was processed, not that the message was received. Repeated
crashing of a client is what means "I can't process the message."
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 25/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> It's not arbitrary. An ack informs you that a message has been
> processed, but you can't infer one way or another from the absence of an
> ack, therefore you *have* to deal with the possibility that these
> messages have been processed already regardless of whether you do it by
> setting the redelivered flag or by DLQing the message. Either way I
> don't think it's acceptable for a routine close of a consumer to cause
> redelivery of a slew of messages that may already have been processed.
> It would, for example, be unacceptable to any application that requires
> human intervention to deal with redelivered messages.
I think it is wrong to say you can DLQ a message because you have not
received an ack. A DLQ is for cases where the client has rejected a
message explicitly or you cannot deliver a message.
DLQing a message because of lack of ack hugely complicates recovery
from the application's perspective. Consider the case of an app that
crashes for some reason during processing and does not send an ack for
a message.
If that message were DLQ'd then what would the app do upon startup? It
would have to know to check a DLQ for messages before consuming from
the normal queue, or it would require operator intervention to move
the messages from the DLQ back onto the normal queue. Certainly in the
environment that I work in, that would be unacceptable to most
applications since it would lengthen and complicate the recovery
process hugely.
To me an ack is a lower level concern - did you get the message, not
"I can't process the message".
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Robert Greig wrote:
> On 25/09/2007, Gordon Sim <gs...@redhat.com> wrote:
>
>> I think by closing the session the application is saying it wants to
>> quit. Perhaps the close on the MessageConsumer could do something like
>> this... i.e. don't return from that close until all the messages have
>> been pumped through the listener?
>
> I agree. Currently if I understand the code correctly (Martin correct
> me if I am wrong) a close() on the message consumer will reject all
> prefetched messages. We could perhaps have some extended JMS methods
> on the consumer to:
>
> 1) closeImmediately (maybe closeJFDI() :-))
> 2) closeRejecting (current behaviour)
> 3) closeAfterProcessing - close after processing any prefetched messages
>
> I personally think that the default close() should be (1) and that
> closing a session should do (1) on any unclosed consumers.
The JMS semantics are pretty clear. Close at the consumer or session
level is supposed to block until any in-progress message listeners are
finished. If there are prefetched messages remaining after the
in-progress listeners are finished, they either need to be returned to
the server (i.e. option 2 except without abusing reject), or processed
(option 3).
Option 3 seems like a reasonable extension of JMS semantics in the
presence of prefetch. Option 2 (without abusing reject) seems the most
correct. I'm not sure why you'd ever want to do option (1). It is
basically the same as option (2) except any messages prefetched for that
consumer are now stranded for the duration of the session. This doesn't
seem very friendly, and certainly wouldn't be a good default.
--Rafael
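The JMS contract Rafael summarises (close() must not return while an in-progress message listener is still running) can be made concrete with a guard held across each listener invocation. A minimal sketch under that assumption; this is illustrative code, not the Qpid client:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the JMS close() contract: close() may not return
// while a message listener invocation is still in progress.
public class BlockingCloseSketch {
    private final Object listenerGuard = new Object();
    private final AtomicBoolean closed = new AtomicBoolean(false);

    // Dispatch path: the guard is held for the whole listener call.
    void deliver(Runnable listener) {
        synchronized (listenerGuard) {
            if (!closed.get()) {
                listener.run();
            }
        }
    }

    // Close path: taking the guard blocks until an in-progress listener returns.
    void close() {
        synchronized (listenerGuard) {
            closed.set(true);
        }
    }

    // Returns the observed ordering; close() must come after the listener.
    static String run() throws InterruptedException {
        BlockingCloseSketch session = new BlockingCloseSketch();
        CountDownLatch listenerStarted = new CountDownLatch(1);
        StringBuilder order = new StringBuilder();

        Thread dispatch = new Thread(() -> session.deliver(() -> {
            listenerStarted.countDown();
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            synchronized (order) { order.append("listener-done "); }
        }));
        dispatch.start();
        listenerStarted.await();   // make sure the listener is mid-flight
        session.close();           // blocks until the listener returns
        synchronized (order) { order.append("close-returned"); }
        dispatch.join();
        return order.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // listener-done close-returned
    }
}
```

Note this only covers the in-progress listener; what happens to the messages still sitting in the prefetch buffer after that listener returns is exactly the point under debate.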
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 25/09/2007, Gordon Sim <gs...@redhat.com> wrote:
> I think by closing the session the application is saying it wants to
> quit. Perhaps the close on the MessageConsumer could do something like
> this... i.e. don't return from that close until all the messages have
> been pumped through the listener?
I agree. Currently if I understand the code correctly (Martin correct
me if I am wrong) a close() on the message consumer will reject all
prefetched messages. We could perhaps have some extended JMS methods
on the consumer to:
1) closeImmediately (maybe closeJFDI() :-))
2) closeRejecting (current behaviour)
3) closeAfterProcessing - close after processing any prefetched messages
I personally think that the default close() should be (1) and that
closing a session should do (1) on any unclosed consumers.
RG
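The three proposed close variants can be modelled in a few lines. The policy names follow Robert's list; the bodies are purely illustrative (a real client would requeue or reject via the protocol, not just clear a local buffer):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical model of the three close policies proposed above.
// Nothing here is actual Qpid client code.
public class ClosePolicyDemo {
    enum ClosePolicy { IMMEDIATE, REJECTING, AFTER_PROCESSING }

    // Returns how many prefetched messages were handed to the listener.
    static int close(Queue<String> prefetched, Consumer<String> listener, ClosePolicy policy) {
        int handled = 0;
        switch (policy) {
            case IMMEDIATE:
                // (1) drop the local buffer; unacked messages would be
                // requeued by the broker when the channel closes
                prefetched.clear();
                break;
            case REJECTING:
                // (2) current behaviour: explicitly reject each prefetched
                // message (modelled here as discarding the local copy)
                prefetched.clear();
                break;
            case AFTER_PROCESSING:
                // (3) drain every prefetched message through the listener first
                while (!prefetched.isEmpty()) {
                    listener.accept(prefetched.poll());
                    handled++;
                }
                break;
        }
        return handled;
    }

    public static void main(String[] args) {
        Queue<String> q = new ArrayDeque<>();
        q.add("m1"); q.add("m2"); q.add("m3");
        int handled = close(q, m -> System.out.println("processed " + m),
                ClosePolicy.AFTER_PROCESSING);
        System.out.println("handled=" + handled + " remaining=" + q.size());
        // prints: processed m1 / m2 / m3, then handled=3 remaining=0
    }
}
```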
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 26/09/2007, Gordon Sim <gs...@redhat.com> wrote:
> I think I would find it odd if an implementation kept pumping through
> messages after I called close, particularly calling it on a Session.
> That should at least be an option; after all perhaps the session is
> being closed because processing has to be interrupted due to
> unavailability of some resource, or due to a user action.
I agree. Having thought about this, my issue is that although
from a protocol perspective prefetched messages are "delivered", I
view prefetching as an optimisation, and from the client developer's
perspective those messages can really be thought of as still on the broker.
Therefore, if you still get messages when you close(), that is really
like continuing to deliver messages from the broker; at least that is
how it appears to the client developer.
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Gordon Sim wrote:
>> The idea is to fail fast rather than fail subtly by using reject in a
>> non standard way. For interoperability I think that continuing to
>> process prefetched messages is the way to go.
>
> I think I would find it odd if an implementation kept pumping through
> messages after I called close, particularly calling it on a Session.
> That should at least be an option; after all perhaps the session is
> being closed because processing has to be interrupted due to
> unavailability of some resource, or due to a user action.
We could always indicate to the listener that the session/consumer is
closing, e.g. set a header indicating that the message passed to the
listener was prefetched. That way the application itself could decide
how urgent the need to close is and whether throwing away the message is
warranted or not. I think we'd have to do something like this for no-ack
since, as you pointed out, there is no way to release those messages
back to the broker.
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Gordon Sim <gs...@redhat.com>.
Rafael Schloming wrote:
> Gordon Sim wrote:
>> The reliability model in my view sets the expectation that a message
>> stays on a queue until acked or until explicitly rejected.
>
> I'm really not suggesting that unacked messages should be arbitrarily
> dequeued willy-nilly. What I'm suggesting is that brokers should have
> room to detect that a particular message is causing a client to crash.
> See my other email for more details on this.
I actually don't see how brokers can know why a client is crashing, and
feel that dealing with poisoned messages is something for the
application/system to handle (using reject or admin functions).
If a broker implementation decided to offer something like you are
describing that would be fine in my view, but it would be a non-standard
option (and should therefore be something that can be turned off).
However I think allowing or suggesting that possibility in the spec
would be a bad thing (unless very clearly qualified). It's important for
applications to be able to rely on the fact that unacked messages remain
on their original queue.
This was really a bit of a tangent though, and that issue is probably
more relevant for the AMQP WG.
[...]
> In my view normal open/close of
> sessions and consumers should never cause redelivery of messages. C-c,
> kill -9, network outages are all another matter of course, but IMHO
> session.close() or consumer.close() is the thing that you try *before*
> resorting to C-c or kill -9.
For 0-10 I agree with you. It makes sense to release messages explicitly
during a clean shutdown and Session.close, MessageConsumer.close seem to
be the places to do that. (You are right of course that these are
equivalent based on the Javadoc).
[...]
>> However I don't think that retrofitting release is any better than
>> using reject in a way that may not be portable. Neither case is
>> guaranteed to work with other brokers, but adding a new method seems
>> even less likely to be interoperable.
>
> The idea is to fail fast rather than fail subtly by using reject in a
> non standard way. For interoperability I think that continuing to
> process prefetched messages is the way to go.
I think I would find it odd if an implementation kept pumping through
messages after I called close, particularly calling it on a Session.
That should at least be an option; after all perhaps the session is
being closed because processing has to be interrupted due to
unavailability of some resource, or due to a user action.
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Gordon Sim wrote:
> Rafael Schloming wrote:
>> Gordon Sim wrote:
>>> Rafael Schloming wrote:
>>>> I don't think this is the only difference. A broker can DLQ unacked
>>>> messages, but not released messages.
>>>
>>> What are the rules around that? I would expect unacked messages to be
>>> left on the queue, as the delivery hasn't succeeded. DLQing them
>>> seems quite wrong to me. Certainly neither of the qpid brokers do that.
>>
>> I'm not sure the spec has language explicitly stating this, but I had
>> always assumed it was an option. If you can't do this then a message
>> that causes a client to repeatedly crash before it has a chance to
>> reject will effectively block up a queue forever.
>
> I think that is an issue that applications should deal with (or else
> require administrator intervention). To allow unacked messages to be
> dequeued seems an extremely bad idea to me unless there are precise rules
> around it.
>
> The reliability model in my view sets the expectation that a message
> stays on a queue until acked or until explicitly rejected.
I'm really not suggesting that unacked messages should be arbitrarily
dequeued willy-nilly. What I'm suggesting is that brokers should have
room to detect that a particular message is causing a client to crash.
See my other email for more details on this.
>> There is also another difference. Released messages will be available
>> to the broker immediately, whereas unacked messages won't be available
>> until the session closes, so a client impl can't depend on recovery of
>> unacked messages for cleanup when it closes a consumer since those
>> unacked messages would be stranded with that client until the whole
>> session closes.
>
> Yes, I agree that release is good for early indication that a message is
> not required, and would be useful for handling MessageConsumer.close().
>
>>> As the ack is a key reliability mechanism, allowing arbitrary DLQ
>>> decisions based on unacked deliveries seems to me to undermine the
>>> ack-based reliability model.
>>
>> It's not arbitrary. An ack informs you that a message has been
>> processed, but you can't infer one way or another from the absence of
>> an ack, therefore you *have* to deal with the possibility that these
>> messages have been processed already regardless of whether you do it
>> by setting the redelivered flag or by DLQing the message.
>
> What seems arbitrary is the decision to either leave it on the original
> queue with the redelivered flag set or DLQ the message. It's the latter
> option I'm against; I don't think it's valid behaviour.
>
>> Either way I don't think it's acceptable for a routine close of a
>> consumer to cause redelivery of a slew of messages that may already
>> have been processed. It would, for example, be unacceptable to any
>> application that requires human intervention to deal with redelivered
>> messages.
>
> I agree that minimising the number of messages that the broker marks as
> redelivered is desirable. As I said in the first mail I also think that
> release is a valuable addition to cater for the case where there is no
> ambiguity about processed state. My original point was that I didn't see
> much benefit in retrofitting it to older versions of the protocol.
I would state this a bit more strongly. In my view normal open/close of
sessions and consumers should never cause redelivery of messages. C-c,
kill -9, network outages are all another matter of course, but IMHO
session.close() or consumer.close() is the thing that you try *before*
resorting to C-c or kill -9.
> (Btw, we have been talking about session.close here aren't we? i.e. not
> MessageConsumer.close() which would I think be a better place for
> handling any releasing).
They are pretty much the same. Session.close() is defined the same way
as MessageConsumer.close() except it operates on all MessageConsumers,
not just the one.
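That delegation can be illustrated with a minimal, self-contained sketch; the names here are hypothetical stand-ins, not the actual Qpid client classes:

```java
import java.util.ArrayList;
import java.util.List;

public class SessionCloseSketch {
    interface Consumer {
        void close();
    }

    static class Session {
        final List<Consumer> consumers = new ArrayList<>();
        boolean closed;

        // Session.close() applies the MessageConsumer.close() semantics
        // to every open consumer, then marks the session itself closed.
        void close() {
            for (Consumer c : consumers) {
                c.close();
            }
            closed = true;
        }
    }

    public static void main(String[] args) {
        Session s = new Session();
        List<String> log = new ArrayList<>();
        s.consumers.add(() -> log.add("consumer-1 closed"));
        s.consumers.add(() -> log.add("consumer-2 closed"));
        s.close();
        System.out.println(log);
    }
}
```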
>>> [...]
>>>>> In the case of the no-ack mode, the whole aim is to allow
>>>>> optimisation of the case where redelivery is not required (e.g.
>>>>> often where a client has its own exclusive queue representing a
>>>>> subscription).
>>>>
>>>> That's a good point. Releasing prefetched messages in no-ack mode
>>>> won't actually do anything since they may have already been
>>>> discarded. Given that I would fall back to processing all prefetched
>>>> messages in the case of no-ack and letting the user choose to throw
>>>> them away if that is appropriate for the application.
>>>
>>> I think by closing the session the application is saying it wants to
>>> quit. Perhaps the close on the MessageConsumer could do something
>>> like this... i.e. don't return from that close until all the messages
>>> have been pumped through the listener?
>>
>> I think this would be reasonable if you wanted to avoid back-porting
>> release to 0-8, but as the code already mis-uses reject to indicate
>> release, I'm not sure there is much point to avoiding it.
>
> My point was that MessageConsumer.close would perhaps be a better place
> to try and handle the closing of consumer state (being under the
> assumption that the debate thus far had been focused on Session.close()).
Yes, I think the JMS semantics pretty much imply that Session.close()
calls MessageConsumer.close() for all open consumers on the session.
> However I don't think that retrofitting release is any better than using
> reject in a way that may not be portable. Neither case is guaranteed to
> work with other brokers, but adding a new method seems even less likely
> to be interoperable.
The idea is to fail fast rather than fail subtly by using reject in a
non standard way. For interoperability I think that continuing to
process prefetched messages is the way to go.
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Gordon Sim <gs...@redhat.com>.
Rafael Schloming wrote:
> Gordon Sim wrote:
>> Rafael Schloming wrote:
>>> I don't think this is the only difference. A broker can DLQ unacked
>>> messages, but not released messages.
>>
>> What are the rules around that? I would expect unacked messages to be
>> left on the queue, as the delivery hasn't succeeded. DLQing them seems
>> quite wrong to me. Certainly neither of the qpid brokers do that.
>
> I'm not sure the spec has language explicitly stating this, but I had
> always assumed it was an option. If you can't do this then a message
> that causes a client to repeatedly crash before it has a chance to
> reject will effectively block up a queue forever.
I think that is an issue that applications should deal with (or else
require administrator intervention). To allow unacked messages to be
dequeued seems an extremely bad idea to me unless there are precise rules
around it.
The reliability model in my view sets the expectation that a message
stays on a queue until acked or until explicitly rejected.
> There is also another difference. Released messages will be available to
> the broker immediately, whereas unacked messages won't be available
> until the session closes, so a client impl can't depend on recovery of
> unacked messages for cleanup when it closes a consumer since those
> unacked messages would be stranded with that client until the whole
> session closes.
Yes, I agree that release is good for early indication that a message is
not required, and would be useful for handling MessageConsumer.close().
>> As the ack is a key reliability mechanism, allowing arbitrary DLQ
>> decisions based on unacked deliveries seems to me to undermine the
>> ack-based reliability model.
>
> It's not arbitrary. An ack informs you that a message has been
> processed, but you can't infer one way or another from the absence of an
> ack, therefore you *have* to deal with the possibility that these
> messages have been processed already regardless of whether you do it by
> setting the redelivered flag or by DLQing the message.
What seems arbitrary is the decision to either leave it on the original
queue with the redelivered flag set or DLQ the message. It's the latter
option I'm against; I don't think it's valid behaviour.
> Either way I
> don't think it's acceptable for a routine close of a consumer to cause
> redelivery of a slew of messages that may already have been processed.
> It would, for example, be unacceptable to any application that requires
> human intervention to deal with redelivered messages.
I agree that minimising the number of messages that the broker marks as
redelivered is desirable. As I said in the first mail I also think that
release is a valuable addition to cater for the case where there is no
ambiguity about processed state. My original point was that I didn't see
much benefit in retrofitting it to older versions of the protocol.
(Btw, we have been talking about session.close here aren't we? i.e. not
MessageConsumer.close() which would I think be a better place for
handling any releasing).
>> [...]
>>>> In the case of the no-ack mode, the whole aim is to allow
>>>> optimisation of the case where redelivery is not required (e.g.
>>>> often where a client has its own exclusive queue representing a
>>>> subscription).
>>>
>>> That's a good point. Releasing prefetched messages in no-ack mode
>>> won't actually do anything since they may have already been
>>> discarded. Given that I would fall back to processing all prefetched
>>> messages in the case of no-ack and letting the user choose to throw
>>> them away if that is appropriate for the application.
>>
>> I think by closing the session the application is saying it wants to
>> quit. Perhaps the close on the MessageConsumer could do something like
>> this... i.e. don't return from that close until all the messages have
>> been pumped through the listener?
>
> I think this would be reasonable if you wanted to avoid back-porting
> release to 0-8, but as the code already mis-uses reject to indicate
> release, I'm not sure there is much point to avoiding it.
My point was that MessageConsumer.close would perhaps be a better place
to try and handle the closing of consumer state (being under the
assumption that the debate thus far had been focused on Session.close()).
However I don't think that retrofitting release is any better than using
reject in a way that may not be portable. Neither case is guaranteed to
work with other brokers, but adding a new method seems even less likely
to be interoperable.
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Gordon Sim wrote:
> Rafael Schloming wrote:
>> Gordon Sim wrote:
>>> The only difference between an explicit 'release' and not
>>> acknowledging a message is that the redelivered flag will be set in
>>> the latter case, but not the former.
>>
>> I don't think this is the only difference. A broker can DLQ unacked
>> messages, but not released messages.
>
> What are the rules around that? I would expect unacked messages to be
> left on the queue, as the delivery hasn't succeeded. DLQing them seems
> quite wrong to me. Certainly neither of the qpid brokers do that.
I'm not sure the spec has language explicitly stating this, but I had
always assumed it was an option. If you can't do this then a message
that causes a client to repeatedly crash before it has a chance to
reject will effectively block up a queue forever.
There is also another difference. Released messages will be available to
the broker immediately, whereas unacked messages won't be available
until the session closes, so a client impl can't depend on recovery of
unacked messages for cleanup when it closes a consumer since those
unacked messages would be stranded with that client until the whole
session closes.
> As the ack is a key reliability mechanism, allowing arbitrary DLQ
> decisions based on unacked deliveries seems to me to undermine the
> ack-based reliability model.
It's not arbitrary. An ack informs you that a message has been
processed, but you can't infer one way or another from the absence of an
ack, therefore you *have* to deal with the possibility that these
messages have been processed already regardless of whether you do it by
setting the redelivered flag or by DLQing the message. Either way I
don't think it's acceptable for a routine close of a consumer to cause
redelivery of a slew of messages that may already have been processed.
It would, for example, be unacceptable to any application that requires
human intervention to deal with redelivered messages.
> [...]
>>> In the case of the no-ack mode, the whole aim is to allow
>>> optimisation of the case where redelivery is not required (e.g. often
>>> where a client has its own exclusive queue representing a subscription).
>>
>> That's a good point. Releasing prefetched messages in no-ack mode
>> won't actually do anything since they may have already been discarded.
>> Given that I would fall back to processing all prefetched messages in
>> the case of no-ack and letting the user choose to throw them away if
>> that is appropriate for the application.
>
> I think by closing the session the application is saying it wants to
> quit. Perhaps the close on the MessageConsumer could do something like
> this... i.e. don't return from that close until all the messages have
> been pumped through the listener?
I think this would be reasonable if you wanted to avoid back-porting
release to 0-8, but as the code already mis-uses reject to indicate
release, I'm not sure there is much point to avoiding it.
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Gordon Sim <gs...@redhat.com>.
Rafael Schloming wrote:
> Gordon Sim wrote:
>> The only difference between an explicit 'release' and not
>> acknowledging a message is that the redelivered flag will be set in
>> the latter case, but not the former.
>
> I don't think this is the only difference. A broker can DLQ unacked
> messages, but not released messages.
What are the rules around that? I would expect unacked messages to be
left on the queue, as the delivery hasn't succeeded. DLQing them seems
quite wrong to me. Certainly neither of the qpid brokers do that.
As the ack is a key reliability mechanism, allowing arbitrary DLQ
decisions based on unacked deliveries seems to me to undermine the
ack-based reliability model.
[...]
>> In the case of the no-ack mode, the whole aim is to allow optimisation
>> of the case where redelivery is not required (e.g. often where a
>> client has its own exclusive queue representing a subscription).
>
> That's a good point. Releasing prefetched messages in no-ack mode won't
> actually do anything since they may have already been discarded. Given
> that I would fall back to processing all prefetched messages in the case
> of no-ack and letting the user choose to throw them away if that is
> appropriate for the application.
I think by closing the session the application is saying it wants to
quit. Perhaps the close on the MessageConsumer could do something like
this... i.e. don't return from that close until all the messages have
been pumped through the listener?
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Gordon Sim wrote:
> Rafael Schloming wrote:
>> Robert Greig wrote:
>>> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>>>
>>>> That's true, however I would think that the expected JMS behavior would
>>>> be for connection close to do a clean shutdown.
>>>
>>> OK. Note that this is Session.close() so it will close the channel.
>>> Apart from processing all prefetched messages (which I think it's
>>> arguable is not what someone doing a close on a session would want),
>>> what do you think a clean shutdown of a consumer would involve?
>>
>> For 0-10 I would issue a message.cancel for all subscriptions, and do
>> an execution.sync to confirm that they are all complete. At that point
>> I know that no more messages will be arriving and then I would issue a
>> message.release for all the prefetched messages.
>>
>> For 0-8 I would do something similar. Issue a synchronous basic.cancel
>> for each subscription. When they are all complete, for strict AMQP
>> mode I would then process all the prefetched messages, and for non
>> strict AMQP mode I would add the release method from 0-10 and use that
>> to release prefetched messages.
>
> The only difference between an explicit 'release' and not acknowledging
> a message is that the redelivered flag will be set in the latter case,
> but not the former.
I don't think this is the only difference. A broker can DLQ unacked
messages, but not released messages.
> In each case message ordering may be lost if there are other active
> consumers on the same queue. At present the redelivered flag (which is a
> warning that the message *may* have been delivered once already, not a
> statement that it has) signals this; there isn't yet an equivalent to
> indicate potential loss of order due to release (though that will
> hopefully come).
>
> While I think the addition of the release method is valuable, I see no
> real benefit in trying to retrofit it into older implementations.
It may not be worthwhile for this case alone, however the Java client
implementation currently uses reject in a variety of ways, and the
intent is almost always to do a release, not an actual reject.
> In the case of the no-ack mode, the whole aim is to allow optimisation
> of the case where redelivery is not required (e.g. often where a client
> has its own exclusive queue representing a subscription).
That's a good point. Releasing prefetched messages in no-ack mode won't
actually do anything since they may have already been discarded. Given
that I would fall back to processing all prefetched messages in the case
of no-ack and letting the user choose to throw them away if that is
appropriate for the application.
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Gordon Sim <gs...@redhat.com>.
Rafael Schloming wrote:
> Robert Greig wrote:
>> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>>
>>> That's true, however I would think that the expected JMS behavior would
>>> be for connection close to do a clean shutdown.
>>
>> OK. Note that this is Session.close() so it will close the channel.
>> Apart from processing all prefetched messages (which I think it's
>> arguable is not what someone doing a close on a session would want),
>> what do you think a clean shutdown of a consumer would involve?
>
> For 0-10 I would issue a message.cancel for all subscriptions, and do an
> execution.sync to confirm that they are all complete. At that point I
> know that no more messages will be arriving and then I would issue a
> message.release for all the prefetched messages.
>
> For 0-8 I would do something similar. Issue a synchronous basic.cancel
> for each subscription. When they are all complete, for strict AMQP mode
> I would then process all the prefetched messages, and for non strict
> AMQP mode I would add the release method from 0-10 and use that to
> release prefetched messages.
The only difference between an explicit 'release' and not acknowledging
a message is that the redelivered flag will be set in the latter case,
but not the former.
In each case message ordering may be lost if there are other active
consumers on the same queue. At present the redelivered flag (which is a
warning that the message *may* have been delivered once already, not a
statement that it has) signals this; there isn't yet an equivalent to
indicate potential loss of order due to release (though that will
hopefully come).
While I think the addition of the release method is valuable, I see no
real benefit in trying to retrofit it into older implementations.
In the case of the no-ack mode, the whole aim is to allow optimisation
of the case where redelivery is not required (e.g. often where a client
has its own exclusive queue representing a subscription).
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Robert Greig wrote:
> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
>> That's true, however I would think that the expected JMS behavior would
>> be for connection close to do a clean shutdown.
>
> OK. Note that this is Session.close() so it will close the channel.
> Apart from processing all prefetched messages (which I think it's
> arguable is not what someone doing a close on a session would want),
> what do you think a clean shutdown of a consumer would involve?
For 0-10 I would issue a message.cancel for all subscriptions, and do an
execution.sync to confirm that they are all complete. At that point I
know that no more messages will be arriving and then I would issue a
message.release for all the prefetched messages.
For 0-8 I would do something similar. Issue a synchronous basic.cancel
for each subscription. When they are all complete, for strict AMQP mode
I would then process all the prefetched messages, and for non strict
AMQP mode I would add the release method from 0-10 and use that to
release prefetched messages.
In all cases the session would be quiesced and it would be safe to issue
a session.close().
I do agree that processing prefetched messages is not the ideal
behavior, however it is the only one available if you want to strictly
adhere to AMQP semantics, and I would expect it to also comply with JMS
semantics presuming that you block the session.close() call until
processing of prefetched messages is complete.
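The ordering described (cancel the subscription first, confirm it is complete, then release whatever was prefetched) can be sketched with a fake in-memory broker. This is an illustration under assumed names only; the real protocol exchanges (message.cancel, execution.sync, message.release) are marked by comments, not implemented:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class CleanCloseSketch {
    // Fake broker: released messages go back on the queue for other consumers.
    static class FakeBroker {
        final Queue<String> queue = new ArrayDeque<>();
        final List<String> released = new ArrayList<>();

        void release(String msg) {
            released.add(msg);
            queue.add(msg);
        }
    }

    static class Consumer {
        final FakeBroker broker;
        final Queue<String> prefetched = new ArrayDeque<>();
        boolean subscribed = true;

        Consumer(FakeBroker broker) {
            this.broker = broker;
        }

        void prefetch(String msg) {
            prefetched.add(msg);
        }

        void close() {
            subscribed = false;              // message.cancel: no more deliveries
            // execution.sync would go here: wait until the cancel completes,
            // so no further messages can arrive into the prefetch buffer.
            while (!prefetched.isEmpty()) {  // message.release for each prefetched msg
                broker.release(prefetched.poll());
            }
            // The session is now quiesced; session.close() would be safe here.
        }
    }

    public static void main(String[] args) {
        FakeBroker broker = new FakeBroker();
        Consumer c = new Consumer(broker);
        c.prefetch("m1");
        c.prefetch("m2");
        c.close();
        System.out.println(broker.released);
    }
}
```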
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> That's true, however I would think that the expected JMS behavior would
> be for connection close to do a clean shutdown.
OK. Note that this is Session.close() so it will close the channel.
Apart from processing all prefetched messages (which I think it's
arguable is not what someone doing a close on a session would want),
what do you think a clean shutdown of a consumer would involve?
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Robert Greig wrote:
> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
>> The java broker may not have a DLQ, but any broker being conservative
>> about exactly once semantics will need to have a DLQ for messages that
>> may have been processed by a client. Messages sent on a connection that
>> was aborted would fall into this category.
>
> OK I can see your point but this implies also (I think?) that when the
> consumer calls close() that it processes any messages that have been
> prefetched. We certainly do not do that.
>
> In JMS, close() is the only method that can be called by any thread
> and we simply stop processing. Are you suggesting that when you call
> close() on a session it should deliver all prefetched messages on all
> consumers?
In 0-10 there is a message.release that may be used to inform the broker
that prefetched messages were not actually processed. I don't think
there is a way to do this in 0-8 without either extending the spec or,
as you suggest, processing all prefetched messages.
>> There is a difference between a clean shutdown and an abort. A clean
>> shutdown will always involve some sort of handshake. So while you
>> definitely want to be as graceful as possible in the case of an abort,
>> there will fundamentally be unresolved state without the handshake, and
>> many applications will not be able to tolerate that unresolved state.
>
> Any application that needs that can call close() explicitly on the consumer.
That's true, however I would think that the expected JMS behavior would
be for connection close to do a clean shutdown.
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> The java broker may not have a DLQ, but any broker being conservative
> about exactly once semantics will need to have a DLQ for messages that
> may have been processed by a client. Messages sent on a connection that
> was aborted would fall into this category.
OK I can see your point but this implies also (I think?) that when the
consumer calls close() that it processes any messages that have been
prefetched. We certainly do not do that.
In JMS, close() is the only method that can be called by any thread
and we simply stop processing. Are you suggesting that when you call
close() on a session it should deliver all prefetched messages on all
consumers?
> There is a difference between a clean shutdown and an abort. A clean
> shutdown will always involve some sort of handshake. So while you
> definitely want to be as graceful as possible in the case of an abort,
> there will fundamentally be unresolved state without the handshake, and
> many applications will not be able to tolerate that unresolved state.
Any application that needs that can call close() explicitly on the consumer.
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Robert Greig wrote:
> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
>> Strictly speaking I think you do need to send a basic.cancel. Without
>> sending a basic.cancel and getting confirmation that the cancel is
>> complete the broker will still be attempting to transmit messages to a
>> client when the close occurs. If this happens when there is active
>> message flow then there will be pending messages when the close occurs and
>> depending on how a broker behaves, this could cause messages to be
>> unnecessarily DLQed, or unnecessarily lost in the case of no-ack.
>
> Hmm. There is no DLQ though? Also if you have no-ack there is risk of
> message loss built into that?
The java broker may not have a DLQ, but any broker being conservative
about exactly once semantics will need to have a DLQ for messages that
may have been processed by a client. Messages sent on a connection that
was aborted would fall into this category.
As for no-ack, there is a big difference between losing messages when
the network dies, and losing messages whenever you close a connection.
There are many applications that can tolerate the former, but not the
latter.
> My logic was that it *must* work when you ctrl-C or kill -9 the client.
There is a difference between a clean shutdown and an abort. A clean
shutdown will always involve some sort of handshake. So while you
definitely want to be as graceful as possible in the case of an abort,
there will fundamentally be unresolved state without the handshake, and
many applications will not be able to tolerate that unresolved state.
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> Strictly speaking I think you do need to send a basic.cancel. Without
> sending a basic.cancel and getting confirmation that the cancel is
> complete the broker will still be attempting to transmit messages to a
> client when the close occurs. If this happens when there is active
> message flow then there will be pending messages when the close occurs and
> depending on how a broker behaves, this could cause messages to be
> unnecessarily DLQed, or unnecessarily lost in the case of no-ack.
Hmm. There is no DLQ though? Also if you have no-ack there is risk of
message loss built into that?
My logic was that it *must* work when you ctrl-C or kill -9 the client.
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Rafael Schloming <ra...@redhat.com>.
Robert Greig wrote:
> On 22/09/2007, Robert Greig <ro...@gmail.com> wrote:
>> I managed to get a threaddump and it shows yet another deadlock
>> involving the dispatcher. Details are attached to QPID-589.
>
> Looking at the deadlock, it occurs because during session close, it
> sends Basic.Cancel for each consumer, and the Basic.Cancel-Ok handler
> (on a separate thread) calls Dispatcher.rejectPending which in turn
> tries to acquire the dispatcher lock. Sadly the dispatcher lock is
> already held by dispatcher.run(). Dispatcher.run is trying to acquire
> the messageDeliveryLock, which is already held by the close method in
> AMQSession.
>
> I couldn't spot an obvious solution involving reordering of locks.
> However it did occur to me that it was not necessary to send a
> Basic.Cancel where we are about to close the entire session (AMQP
> channel).
>
> Does anyone disagree and think we have to send Basic.Cancel?
>
> I have committed a change to the M2 branch so that it does not send
> Basic.Cancel where the session is closing and so far on our continuous
> build there have been no test failures or deadlocks. If it turns out
> that someone knows why we must send Basic.Cancel then I will obviously
> back out that change.
Strictly speaking I think you do need to send a basic.cancel. Without
sending a basic.cancel and getting confirmation that the cancel is
complete the broker will still be attempting to transmit messages to a
client when the close occurs. If this happens when there is active
message flow then there will be pending messages when the close occurs and
depending on how a broker behaves, this could cause messages to be
unnecessarily DLQed, or unnecessarily lost in the case of no-ack.
--Rafael
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 22/09/2007, Robert Greig <ro...@gmail.com> wrote:
> I managed to get a threaddump and it shows yet another deadlock
> involving the dispatcher. Details are attached to QPID-589.
Looking at the deadlock, it occurs because during session close, it
sends Basic.Cancel for each consumer, and the Basic.Cancel-Ok handler
(on a separate thread) calls Dispatcher.rejectPending which in turn
tries to acquire the dispatcher lock. Sadly the dispatcher lock is
already held by dispatcher.run(). Dispatcher.run is trying to acquire
the messageDeliveryLock, which is already held by the close method in
AMQSession.
I couldn't spot an obvious solution involving reordering of locks.
However it did occur to me that it was not necessary to send a
Basic.Cancel where we are about to close the entire session (AMQP
channel).
Does anyone disagree and think we have to send Basic.Cancel?
I have committed a change to the M2 branch so that it does not send
Basic.Cancel where the session is closing and so far on our continuous
build there have been no test failures or deadlocks. If it turns out
that someone knows why we must send Basic.Cancel then I will obviously
back out that change.
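The lock cycle described above can be modelled with a small self-contained sketch. This is a simplified illustration with hypothetical names, not the actual Qpid client code: each thread holds one of the two monitors and then attempts the other, with tryLock() standing in for the blocking acquire that deadlocks in the real code (a failed tryLock marks where the real threads would hang):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {
    static final ReentrantLock messageDeliveryLock = new ReentrantLock();
    static final ReentrantLock dispatcherLock = new ReentrantLock();

    static List<String> simulate() throws InterruptedException {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch bothHeld = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);

        // Models AMQSession.close(): holds messageDeliveryLock while the
        // Basic.Cancel-Ok handler it triggers needs the dispatcher lock.
        Thread closer = new Thread(() -> contend(messageDeliveryLock, dispatcherLock,
                "close() stuck waiting for dispatcher lock", events, bothHeld, bothTried));

        // Models Dispatcher.run(): holds the dispatcher lock while trying
        // to take messageDeliveryLock to deliver a message.
        Thread dispatcher = new Thread(() -> contend(dispatcherLock, messageDeliveryLock,
                "dispatcher stuck waiting for messageDeliveryLock", events, bothHeld, bothTried));

        closer.start();
        dispatcher.start();
        closer.join();
        dispatcher.join();
        return events;
    }

    static void contend(ReentrantLock held, ReentrantLock wanted, String msg,
                        List<String> events, CountDownLatch bothHeld, CountDownLatch bothTried) {
        held.lock();
        try {
            bothHeld.countDown();
            awaitQuietly(bothHeld);      // from here on, both locks are held
            if (wanted.tryLock()) {      // a blocking lock() here would deadlock
                wanted.unlock();
            } else {
                events.add(msg);
            }
            bothTried.countDown();
            awaitQuietly(bothTried);     // keep holding until both attempts are done
        } finally {
            held.unlock();
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        simulate().forEach(System.out::println);
    }
}
```

Dropping the Basic.Cancel during session close, as the commit above does, removes one edge of this cycle, which is why the hang disappears.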
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 22/09/2007, Robert Greig <ro...@gmail.com> wrote:
> On 21/09/2007, Nuno Santos <ns...@redhat.com> wrote:
>
> > Here it's actually hanging at that very test,
> > org.apache.qpid.test.unit.client.forwardall.CombinedTest. No output in
> > the surefire logs, just hangs indefinitely. It may be a local issue,
> > I'll try to troubleshoot further.
>
> If you can get a thread dump that would be very useful. If you could
> attach it to QPID-589 that would be ideal.
I managed to get a threaddump and it shows yet another deadlock
involving the dispatcher. Details are attached to QPID-589.
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 21/09/2007, Nuno Santos <ns...@redhat.com> wrote:
> Here it's actually hanging at that very test,
> org.apache.qpid.test.unit.client.forwardall.CombinedTest. No output in
> the surefire logs, just hangs indefinitely. It may be a local issue,
> I'll try to troubleshoot further.
If you can get a thread dump that would be very useful. If you could
attach it to QPID-589 that would be ideal.
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Nuno Santos <ns...@redhat.com>.
Robert Greig wrote:
> On 21/09/2007, Robert Greig <ro...@gmail.com> wrote:
>
>>> Running org.apache.qpid.test.unit.client.forwardall.CombinedTest
>>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 10.024
>>> sec <<< FAILURE!
>
> In diagnosing this I found QPID-607. I am not sure if it is directly
> related but it is certainly not good behaviour.
>
> I have checked in a fix for QPID-607 and for now at least our M2
> continuous build is passing all tests.
>
> How does the RH continuous build look now?
Here it's actually hanging at that very test,
org.apache.qpid.test.unit.client.forwardall.CombinedTest. No output in
the surefire logs, just hangs indefinitely. It may be a local issue,
I'll try to troubleshoot further.
Nuno
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 21/09/2007, Robert Greig <ro...@gmail.com> wrote:
> > Running org.apache.qpid.test.unit.client.forwardall.CombinedTest
> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 10.024
> > sec <<< FAILURE!
In diagnosing this I found QPID-607. I am not sure if it is directly
related but it is certainly not good behaviour.
I have checked in a fix for QPID-607 and for now at least our M2
continuous build is passing all tests.
How does the RH continuous build look now?
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 21/09/2007, Carl Trieloff <cc...@redhat.com> wrote:
> this is our current failure...
>
> Running org.apache.qpid.test.unit.client.forwardall.CombinedTest
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 10.024
> sec <<< FAILURE!
>
> If it is different for you I will post the log.
No, we see this one too...
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another
"final" build]
Posted by Carl Trieloff <cc...@redhat.com>.
>
> This should be fixed now.
>
> However we are still seeing some other occasional failures on our
> continuous build so this isn't over yet...
>
>
this is our current failure...
Running org.apache.qpid.test.unit.client.forwardall.CombinedTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 10.024
sec <<< FAILURE!
If it is different for you I will post the log.
Carl.
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 20/09/2007, Robert Greig <ro...@gmail.com> wrote:
> > $ cat org.apache.qpid.server.AMQBrokerManagerMBeanTest.txt
> > -------------------------------------------------------------------------------
> > Test set: org.apache.qpid.server.AMQBrokerManagerMBeanTest
> > -------------------------------------------------------------------------------
> > Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.006
> > sec <<< FAILURE!
> > testExchangeOperations(org.apache.qpid.server.AMQBrokerManagerMBeanTest)
> > Time elapsed: 0.003 sec <<< ERROR!
> > java.lang.NullPointerException
This should be fixed now.
However we are still seeing some other occasional failures on our
continuous build so this isn't over yet...
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Robert Greig <ro...@gmail.com>.
On 20/09/2007, Nuno Santos <ns...@redhat.com> wrote:
> Another surefire error from the latest M2 build:
>
> $ cat org.apache.qpid.server.AMQBrokerManagerMBeanTest.txt
> -------------------------------------------------------------------------------
> Test set: org.apache.qpid.server.AMQBrokerManagerMBeanTest
> -------------------------------------------------------------------------------
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.006
> sec <<< FAILURE!
> testExchangeOperations(org.apache.qpid.server.AMQBrokerManagerMBeanTest)
> Time elapsed: 0.003 sec <<< ERROR!
> java.lang.NullPointerException
This is not brand new; it has been happening on our continuous build
for several days now. We had another more pressing issue - hanging
builds due to a race condition which I have just applied a fix for on
both M2 and M2.1 branches.
Hopefully we'll get round to fixing the above tomorrow.
RG
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Nuno Santos <ns...@redhat.com>.
Carl Trieloff wrote:
> -------------------------------------------------------------------------------
>
> Test set: org.apache.qpid.test.client.QueueBrowserTest
> -------------------------------------------------------------------------------
Another surefire error from the latest M2 build:
$ cat org.apache.qpid.server.AMQBrokerManagerMBeanTest.txt
-------------------------------------------------------------------------------
Test set: org.apache.qpid.server.AMQBrokerManagerMBeanTest
-------------------------------------------------------------------------------
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.006
sec <<< FAILURE!
testExchangeOperations(org.apache.qpid.server.AMQBrokerManagerMBeanTest)
Time elapsed: 0.003 sec <<< ERROR!
java.lang.NullPointerException
at org.apache.qpid.server.AMQBrokerManagerMBeanTest.setUp(AMQBrokerManagerMBeanTest.java:89)
at junit.framework.TestCase.runBare(TestCase.java:125)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.maven.surefire.junit.JUnitTestSet.execute(JUnitTestSet.java:210)
at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:135)
at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:122)
at org.apache.maven.surefire.Surefire.run(Surefire.java:129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:225)
at org.apache.maven.surefire.booter.SurefireBooter.run(SurefireBooter.java:139)
at org.apache.maven.plugin.surefire.SurefirePlugin.execute(SurefirePlugin.java:376)
at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:412)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:534)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:475)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:454)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:306)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:273)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:140)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:322)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:115)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:256)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
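[Editorial note: an intermittent NullPointerException inside setUp() usually means some shared fixture, such as an in-VM broker or a registry, was not initialised before the test ran. A minimal sketch of a fail-fast precondition check; the name brokerRegistry is illustrative, not the actual Qpid field:]

```java
// Sketch, not the real AMQBrokerManagerMBeanTest: checking preconditions
// explicitly turns a bare NPE at setUp line 89 into a message that names
// the missing dependency.
public class FailFastSetUpSketch {

    // Stand-in for whatever the real setUp() dereferences (hypothetical name).
    static Object brokerRegistry = null;

    // Returns a diagnostic instead of letting the test die with a bare NPE.
    static String checkPreconditions() {
        if (brokerRegistry == null) {
            return "setUp precondition failed: brokerRegistry is null "
                 + "(was the in-VM broker started?)";
        }
        return "preconditions ok";
    }

    public static void main(String[] args) {
        // First call: registry not yet initialised, so we get the diagnostic.
        System.out.println(checkPreconditions());
        brokerRegistry = new Object();
        // Second call: fixture in place, preconditions pass.
        System.out.println(checkPreconditions());
    }
}
```

The point is only that a named precondition failure is far easier to chase across machines than a raw NullPointerException in a Surefire report.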
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Carl Trieloff <cc...@redhat.com>.
-------------------------------------------------------------------------------
Test set: org.apache.qpid.test.client.QueueBrowserTest
-------------------------------------------------------------------------------
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 60.084 sec <<< FAILURE!
testDummyinVMTestCase(org.apache.qpid.test.client.QueueBrowserTest) Time elapsed: 60.021 sec <<< ERROR!
javax.jms.JMSException: Error creating connection: State not achieved within permitted time. Current state AMQState: id = 6 name: CONNECTION_CLOSED, desired state: AMQState: id = 4 name: CONNECTION_OPEN
at org.apache.qpid.client.AMQConnectionFactory.createConnection(AMQConnectionFactory.java:270)
at org.apache.qpid.test.client.QueueBrowserTest.setUp(QueueBrowserTest.java:59)
at junit.framework.TestCase.runBare(TestCase.java:125)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.maven.surefire.junit.JUnitTestSet.execute(JUnitTestSet.java:210)
at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:135)
at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:122)
at org.apache.maven.surefire.Surefire.run(Surefire.java:129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:225)
at org.apache.maven.surefire.booter.SurefireBooter.run(SurefireBooter.java:139)
at org.apache.maven.plugin.surefire.SurefirePlugin.execute(SurefirePlugin.java:376)
at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:412)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:534)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:475)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:454)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:306)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:273)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:140)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:322)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:115)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:256)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
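[Editorial note: the "State not achieved within permitted time" error above is the client's bounded wait on its connection state machine timing out, here after 60 seconds waiting for CONNECTION_OPEN while the connection was already CONNECTION_CLOSED. A minimal sketch of that guarded-wait pattern, with simplified names; the real Qpid state manager differs:]

```java
// Sketch of a bounded wait for a state machine to reach a desired state,
// using wait/notifyAll with a deadline so a lost transition fails with a
// timeout rather than hanging the test forever.
public class StateWaitSketch {
    enum State { CONNECTION_NOT_STARTED, CONNECTION_OPEN, CONNECTION_CLOSED }

    private final Object lock = new Object();
    private State current = State.CONNECTION_NOT_STARTED;

    // Move to a new state and wake any thread blocked in attainState().
    public void changeState(State s) {
        synchronized (lock) {
            current = s;
            lock.notifyAll();
        }
    }

    // Wait up to timeoutMs for the desired state; throw on timeout,
    // mirroring the "State not achieved within permitted time" message.
    public void attainState(State desired, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (lock) {
            while (current != desired) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    throw new IllegalStateException(
                        "State not achieved within permitted time. Current state: "
                        + current + ", desired state: " + desired);
                }
                // wait() releases the lock; loop re-checks on every wakeup
                // to guard against spurious wakeups and missed transitions.
                lock.wait(remaining);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StateWaitSketch conn = new StateWaitSketch();
        conn.changeState(State.CONNECTION_OPEN);
        conn.attainState(State.CONNECTION_OPEN, 1000); // state already reached
        System.out.println("reached CONNECTION_OPEN");
    }
}
```

With this shape, an intermittent failure means the transition to CONNECTION_OPEN never happened within the window, which is why the reports show the full permitted time (60 seconds) elapsing before the error.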
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Gordon Sim <gs...@redhat.com>.
Carl Trieloff wrote:
>
>> Hi, It would be most helpful in resolving these intermittent failures
>> if people seeing these problems could post the Surefire reports.
>
> I need to wait for Nuno to get something on my user-id fixed, at which
> point I will post more info.
>
> This is the current failure.
>
> Running org.apache.qpid.test.client.QueueBrowserTest
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 60.084
> sec <<< FAILURE!
>
>
That is from M2, after Robert merged in your patch from M2.1 (I
believe). We also had cruisecontrol hang on
org.apache.qpid.test.unit.basic.SelectorTest yesterday. I don't have any
details, I'm afraid; on restarting CruiseControl it didn't happen again.
Re: Intermittent Test Failures [was: Re: M2 - let us try another "final" build]
Posted by Carl Trieloff <cc...@redhat.com>.
> Hi, It would be most helpful in resolving these intermittent failures
> if people seeing these problems could post the Surefire reports.
I need to wait for Nuno to get something on my user-id fixed, at which
point I will post more info.
This is the current failure.
Running org.apache.qpid.test.client.QueueBrowserTest
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 60.084
sec <<< FAILURE!