You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@qpid.apache.org by Sandy Pratt <pr...@adobe.com> on 2009/11/17 01:26:57 UTC

Qpid Cluster Errors

Hello all,

I've been working with a Qpid of two machines recently.  It's working well for the most part, but I've been experiencing a few sporadic failures.  I've collected logs for them over time, and I was wondering if anyone else had seen anything similar and had any idea what might be happening.

Thanks in advance!

Logs:

2009-nov-06 12:35:55 notice Recovering from cluster, no recovery from local journal
2009-nov-06 12:35:55 notice Listening on TCP port 5672
2009-nov-06 12:35:55 notice broker1:9646(INIT) joining cluster renga-qpid-dev with url=amqp:tcp:broker1:5672
2009-nov-06 12:35:55 notice Broker running
2009-nov-06 12:35:55 notice Journal "test3": Created
2009-nov-06 12:35:58 notice broker1:9646(READY) caught up, active cluster member
2009-nov-06 13:56:17 error Execution exception: invalid-argument: user1@QPID.02c092ed-0256-4f70-b8d6-41e67666c8cc: confirmed < (654+0) but only sent < (653+0) (qpid/SessionState.cpp:163)
2009-nov-06 13:56:17 error broker1:9646(READY/error) channel error 1125458 on broker2:9011-347(shadow): invalid-argument: user1@QPID.02c092ed-0256-4f70-b8d6-41e67666c8cc: confirmed < (654+0) but only sent < (653+0) (qpid/SessionState.cpp:163) (unresolved: broker1:9646 broker2:9011 )
2009-nov-06 13:56:17 critical broker1:9646(READY/error) error 1125458 did not occur on broker2:9011
2009-nov-06 13:56:17 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-nov-06 13:56:17 notice broker1:9646(LEFT/error) leaving cluster renga-qpid-dev
2009-nov-06 13:56:17 notice Shut down

2009-nov-06 15:40:22 notice Journal "TplStore": Created
2009-nov-06 15:40:22 notice Store module initialized; dir=/var/lib/qpidd
2009-nov-06 15:40:22 notice Recovering from cluster, no recovery from local journal
2009-nov-06 15:40:22 notice Listening on TCP port 5672
2009-nov-06 15:40:22 notice broker1:10496(INIT) joining cluster renga-qpid-dev with url=amqp:tcp:broker1:5672
2009-nov-06 15:40:22 notice Broker running
2009-nov-06 15:40:22 notice Journal "test3": Created
2009-nov-06 15:40:26 notice broker1:10496(READY) caught up, active cluster member
2009-nov-06 15:45:18 error Execution exception: invalid-argument: user1@QPID.8bc14c39-926e-4416-b6fb-2495b1cf9c8e: confirmed < (6502+0) but only sent < (6498+0) (qpid/SessionState.cpp:163)
2009-nov-06 15:45:18 error broker1:10496(READY/error) channel error 895882 on broker2:10019-131(shadow): invalid-argument: user1@QPID.8bc14c39-926e-4416-b6fb-2495b1cf9c8e: confirmed < (6502+0) but only sent < (6498+0) (qpid/SessionState.cpp:163) (unresolved: broker1:10496 broker2:10019 )
2009-nov-06 15:45:18 critical broker1:10496(READY/error) error 895882 did not occur on broker2:10019
2009-nov-06 15:45:18 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-nov-06 15:45:18 notice broker1:10496(LEFT/error) leaving cluster renga-qpid-dev
2009-nov-06 15:45:18 notice Shut down

2009-nov-06 16:02:38 notice broker1:10590(INIT) joining cluster renga-qpid-dev with url=amqp:tcp:broker1:5672
2009-nov-06 16:02:38 notice Broker running
2009-nov-06 16:02:38 notice Journal "test3": Created
2009-nov-06 16:02:42 notice broker1:10590(READY) caught up, active cluster member
2009-nov-06 16:08:19 error Execution exception: invalid-argument: user1@QPID.6884577f-39ca-46af-8981-ad6043292ba8: confirmed < (8757+0) but only sent < (8756+0) (qpid/SessionState.cpp:163)
2009-nov-06 16:08:19 error broker1:10590(READY/error) channel error 2436085 on broker2:10019-430(shadow): invalid-argument: user1@QPID.6884577f-39ca-46af-8981-ad6043292ba8: confirmed < (8757+0) but only sent < (8756+0) (qpid/SessionState.cpp:163) (unresolved: broker1:10590 broker2:10019 )
2009-nov-06 16:08:19 critical broker1:10590(READY/error) error 2436085 did not occur on broker2:10019
2009-nov-06 16:08:19 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-nov-06 16:08:19 notice broker1:10590(LEFT/error) leaving cluster renga-qpid-dev
2009-nov-06 16:08:20 notice Shut down

2009-nov-16 16:15:39 error Execution exception: invalid-argument: user1@QPID.941b62b1-55d9-443a-8ef2-dfc3abf81227: confirmed < (864+0) but only sent < (863+0) (qpid/SessionState.cpp:163)
2009-nov-16 16:15:39 error broker1:31842(READY/error) channel error 4208088 on broker2:31018-476(shadow): invalid-argument: user1@QPID.941b62b1-55d9-443a-8ef2-dfc3abf81227: confirmed < (864+0) but only sent < (863+0) (qpid/SessionState.cpp:163) (unresolved: broker1:31842 broker2:31018 )
2009-nov-16 16:15:39 critical broker1:31842(READY/error) error 4208088 did not occur on broker2:31018
2009-nov-16 16:15:39 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-nov-16 16:15:39 notice broker1:31842(LEFT/error) leaving cluster renga-qpid-dev
2009-nov-16 16:15:39 notice Shut down
2009-nov-16 16:15:39 error Connection 10.59.174.186:5672 closed by error: closed by management(320)

Re: Qpid Cluster Errors

Posted by Rajith Attapattu <ra...@gmail.com>.
On Tue, Dec 15, 2009 at 6:40 PM, Sandy Pratt <pr...@adobe.com> wrote:
> Rajith,
>
> I'll see what I can do.  It will take a little bit to make sure there are no internal dependencies (libraries and maven repos are what I can think of now) but it should be doable.
>
> I do have sync_publish='all' set in my connection URL.  I was under the impression that this was a synonym for sycn_ack.  I'll do some testing on that.
no, sync_publish will dictate which transfers (transient, durable or
all) should be done synchronously.
sync_ack makes the client consume each message synchronously.

I'd be interested to know
a) if you could reproduce the issue faster if you used sync_ack
b) Whether sync_publish has any affect at all - (I think it does not).
c) Are you using a message listener or are you using the receive() method?
c) Your actual code, so I can compare it with my test client to see if
there are any differences and experiment based on that.

Btw thanks for your interest on this. Your test case illustrates that
we still have a few corner cases that haven't been identified yet.

> Thanks,
>
> Sandy
>
>> -----Original Message-----
>> From: Rajith Attapattu [mailto:rajith77@gmail.com]
>> Sent: Tuesday, December 15, 2009 3:29 PM
>> To: users@qpid.apache.org
>> Subject: Re: Qpid Cluster Errors
>>
>> Sandy,
>>
>> Interesting test case.
>> I reproduced a similar issue (well atleast the symptoms were the same)
>> but apparently it looks like it's different from your issue.
>> Could you please open a JIRA and submit your test code?
>> I want to see how your test case is different from the one I have.
>> I was able to reproduce my issue reliably and quickly by  using
>> sync_ack=true in the connection URL.
>>
>> I also want to see how I could add your test case to the soak/stress
>> test suite I am building under the testkit module.
>>
>> Regards,
>>
>> Rajith
>>
>> On Tue, Dec 15, 2009 at 4:32 PM, Sandy Pratt <pr...@adobe.com> wrote:
>> >> > -----Original Message-----
>> >> > From: Alan Conway [mailto:aconway@redhat.com]
>> >> >
>> >> > There is another issue that turned up with the same symptom
>> >> > https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874.
>> Let
>> >> > me know if
>> >> > that resolves your problem, if not I want to look into it more.
>> >>
>> >> Thanks Alan.
>> >>
>> >> I don't think that's the issue, because this system should only be
>> >> receiving connections from clients who are using AUTO_ACKNOWLEDGE,
>> but
>> >> I'm happy to try it.
>> >>
>> >> I will let you know how it goes.
>> >
>> > I made a new build of the java client from trunk today, and used that
>> to spam a cluster of two brokers running qpidd-0.5.752581-34 and rhm-
>> 0.5.3206-27 and saw the error again:
>> >
>> > 2009-dec-15 12:10:34 trace 10.59.174.211:19320(READY) DLVR 27677:
>> Frame[BEbe; channel=0; {SessionCompletedBody: commands={ [0,203] }; }]
>> data 10.59.174.186:24880-40
>> > 2009-dec-15 12:10:34 debug prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-
>> 984aeb8c66c2: sender marked completed: { [0,203] }
>> > 2009-dec-15 12:10:34 debug Exception constructed:
>> prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0)
>> but only sent < (203+0) (qpid/SessionState.cpp:163)
>> > 2009-dec-15 12:10:34 error Execution exception: invalid-argument:
>> prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0)
>> but only sent < (203+0) (qpid/SessionState.cpp:163)
>> > 2009-dec-15 12:10:34 error 10.59.174.211:19320(READY/error) channel
>> error 27677 on 10.59.174.186:24880-40(shadow): invalid-argument:
>> prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0)
>> but only sent < (203+0) (qpid/SessionState.cpp:163) (unresolved:
>> 10.59.174.186:24880 10.59.174.211:19320 )
>> > 2009-dec-15 12:10:34 trace MCAST 10.59.174.211:19320-0:
>> {ClusterErrorCheckBody: type=1; frame-seq=27677; }
>> >
>> > OS is an up to date installation of RHEL 5.4 32 bit running on
>> VMWare.
>> >
>> > Qpidd.conf is:
>> >
>> > no-module-dir=true
>> > load-module=/usr/lib/qpid/daemon/cluster.so
>> > load-module=/usr/lib/qpid/daemon/msgstore.so
>> > cluster-mechanism=PLAIN
>> > cluster-username="***"
>> > cluster-password="***"
>> > cluster-name="na1m-dev1"
>> > log-to-file=/var/lib/qpidd/qpidd.log
>> > trace=1
>> > auth=yes
>> > wait=60
>> >
>> > JNDI connection string:
>> >
>> > connectionfactory.qpidConnectionFactory =
>> amqp://***:***@test/?brokerlist='tcp://***.com:443?ssl='true',connectde
>> lay='1000',connecttimeout='5000'',failover='roundrobin?cyclecount='999'
>> ',sync_publish='all'
>> >
>> > The SSL is terminated at the load balancer.
>> >
>> > I'd love to hear the community's ideas on where to look next.  I
>> recently tested out a build of the 0.6beta1 release, and found that the
>> bug (or at least the same symptoms) occurred there as well.  I don't
>> have a good fast way to reproduce the problem.  This particular crash
>> occurred after I spammed the cluster for about 7 minutes, during which
>> time the spammer sent about 20k messages of between 1-1000 KB in size
>> through the cluster, whilst periodically closing and reopening sending
>> and receiving connections.
>> >
>> > Thanks!
>> >
>> > Sandy
>> >
>> > ---------------------------------------------------------------------
>> > Apache Qpid - AMQP Messaging Implementation
>> > Project:      http://qpid.apache.org
>> > Use/Interact: mailto:users-subscribe@qpid.apache.org
>> >
>> >
>>
>>
>>
>> --
>> Regards,
>>
>> Rajith Attapattu
>> Red Hat
>> http://rajith.2rlabs.com/
>>
>> ---------------------------------------------------------------------
>> Apache Qpid - AMQP Messaging Implementation
>> Project:      http://qpid.apache.org
>> Use/Interact: mailto:users-subscribe@qpid.apache.org
>
>
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org
>
>



-- 
Regards,

Rajith Attapattu
Red Hat
http://rajith.2rlabs.com/

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
That's correct.  It's a cluster of two.  When this error occurs, the replica drops out (LB is configured to serve all connections to the primary unless it's unavailable) and messages continue to flow until the test completes.

> -----Original Message-----
> From: Rajith Attapattu [mailto:rajith77@gmail.com]
> Sent: Thursday, December 17, 2009 3:21 PM
> To: users@qpid.apache.org
> Subject: Re: Qpid Cluster Errors
> 
> Sandy thx, I am very eager to see the code.
> And if I understand in your case the cluster membership remains the
> same until this error is encountered right?
> 
> Rajith
> 
> On Thu, Dec 17, 2009 at 5:40 PM, Sandy Pratt <pr...@adobe.com> wrote:
> > The bug reappeared for me right after I took sync_publish and
> sync_ack off my connection string.
> >
> > I have filed QPID-2295 on this issue.
> >
> > https://issues.apache.org/jira/browse/QPID-2295
> >
> > Thanks for the help.
> >
> >> -----Original Message-----
> >> From: Alan Conway [mailto:aconway@redhat.com]
> >> Sent: Wednesday, December 16, 2009 6:52 AM
> >> To: users@qpid.apache.org
> >> Subject: Re: Qpid Cluster Errors
> >>
> >> On 12/15/2009 01:53 PM, Sandy Pratt wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Alan Conway [mailto:aconway@redhat.com]
> >> >>
> >> >> There is another issue that turned up with the same symptom
> >> >> https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874.
> >> Let
> >> >> me know if
> >> >> that resolves your problem, if not I want to look into it more.
> >>
> >> Sorry, typo by me, I meant:  QPID-2253. You'll need to re-build the
> >> broker from
> >> r888874
> >>
> >> If you still see the problem post r888874 can you can you open a
> JIRA
> >> for this?
> >> It'll be easier to track.
> >>
> >> --------------------------------------------------------------------
> -
> >> Apache Qpid - AMQP Messaging Implementation
> >> Project:      http://qpid.apache.org
> >> Use/Interact: mailto:users-subscribe@qpid.apache.org
> >
> >
> > ---------------------------------------------------------------------
> > Apache Qpid - AMQP Messaging Implementation
> > Project:      http://qpid.apache.org
> > Use/Interact: mailto:users-subscribe@qpid.apache.org
> >
> >
> 
> 
> 
> --
> Regards,
> 
> Rajith Attapattu
> Red Hat
> http://rajith.2rlabs.com/
> 
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Alan Conway <ac...@redhat.com>.
On 12/17/2009 08:55 PM, Sandy Pratt wrote:
> I've attached my driver code to the JIRA.  Feel free to contact me with any questions you might have.

Thanks a lot Sandy, I will be looking into this as soon as I finish up my 
current task.

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
I've attached my driver code to the JIRA.  Feel free to contact me with any questions you might have.

> -----Original Message-----
> From: Rajith Attapattu [mailto:rajith77@gmail.com]
> Sent: Thursday, December 17, 2009 3:21 PM
> To: users@qpid.apache.org
> Subject: Re: Qpid Cluster Errors
> 
> Sandy thx, I am very eager to see the code.
> And if I understand in your case the cluster membership remains the
> same until this error is encountered right?
> 
> Rajith
> 
> On Thu, Dec 17, 2009 at 5:40 PM, Sandy Pratt <pr...@adobe.com> wrote:
> > The bug reappeared for me right after I took sync_publish and
> sync_ack off my connection string.
> >
> > I have filed QPID-2295 on this issue.
> >
> > https://issues.apache.org/jira/browse/QPID-2295
> >
> > Thanks for the help.
> >
> >> -----Original Message-----
> >> From: Alan Conway [mailto:aconway@redhat.com]
> >> Sent: Wednesday, December 16, 2009 6:52 AM
> >> To: users@qpid.apache.org
> >> Subject: Re: Qpid Cluster Errors
> >>
> >> On 12/15/2009 01:53 PM, Sandy Pratt wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Alan Conway [mailto:aconway@redhat.com]
> >> >>
> >> >> There is another issue that turned up with the same symptom
> >> >> https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874.
> >> Let
> >> >> me know if
> >> >> that resolves your problem, if not I want to look into it more.
> >>
> >> Sorry, typo by me, I meant:  QPID-2253. You'll need to re-build the
> >> broker from
> >> r888874
> >>
> >> If you still see the problem post r888874 can you can you open a
> JIRA
> >> for this?
> >> It'll be easier to track.
> >>
> >> --------------------------------------------------------------------
> -
> >> Apache Qpid - AMQP Messaging Implementation
> >> Project:      http://qpid.apache.org
> >> Use/Interact: mailto:users-subscribe@qpid.apache.org
> >
> >
> > ---------------------------------------------------------------------
> > Apache Qpid - AMQP Messaging Implementation
> > Project:      http://qpid.apache.org
> > Use/Interact: mailto:users-subscribe@qpid.apache.org
> >
> >
> 
> 
> 
> --
> Regards,
> 
> Rajith Attapattu
> Red Hat
> http://rajith.2rlabs.com/
> 
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Rajith Attapattu <ra...@gmail.com>.
Sandy thx, I am very eager to see the code.
And if I understand in your case the cluster membership remains the
same until this error is encountered right?

Rajith

On Thu, Dec 17, 2009 at 5:40 PM, Sandy Pratt <pr...@adobe.com> wrote:
> The bug reappeared for me right after I took sync_publish and sync_ack off my connection string.
>
> I have filed QPID-2295 on this issue.
>
> https://issues.apache.org/jira/browse/QPID-2295
>
> Thanks for the help.
>
>> -----Original Message-----
>> From: Alan Conway [mailto:aconway@redhat.com]
>> Sent: Wednesday, December 16, 2009 6:52 AM
>> To: users@qpid.apache.org
>> Subject: Re: Qpid Cluster Errors
>>
>> On 12/15/2009 01:53 PM, Sandy Pratt wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Alan Conway [mailto:aconway@redhat.com]
>> >>
>> >> There is another issue that turned up with the same symptom
>> >> https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874.
>> Let
>> >> me know if
>> >> that resolves your problem, if not I want to look into it more.
>>
>> Sorry, typo by me, I meant:  QPID-2253. You'll need to re-build the
>> broker from
>> r888874
>>
>> If you still see the problem post r888874 can you can you open a JIRA
>> for this?
>> It'll be easier to track.
>>
>> ---------------------------------------------------------------------
>> Apache Qpid - AMQP Messaging Implementation
>> Project:      http://qpid.apache.org
>> Use/Interact: mailto:users-subscribe@qpid.apache.org
>
>
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org
>
>



-- 
Regards,

Rajith Attapattu
Red Hat
http://rajith.2rlabs.com/

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Checking message's userid against authenticated SASL user (was Re: Qpid Cluster Errors)

Posted by Rajith Attapattu <ra...@gmail.com>.
On Thu, Dec 17, 2009 at 6:47 AM, Gordon Sim <gs...@redhat.com> wrote:
> On 12/17/2009 02:37 AM, Sandy Pratt wrote:
>>
>> Thanks Alan.  I made a new build from trunk and things are going well so
>> far, except that when I enable auth=yes, I get the following:
>>
>> 2009-12-16 18:33:18 error qpid/amqp_0_10/SessionHandler.cpp:97: Execution
>> exception: unauthorized-access: authorised user id : prattrs@QPID but user
>> id in message decla
>> red as prattrs (qpid/broker/SemanticState.cpp:435)
>>
>> I'm pretty sure the password is correct, because if I change it I get an
>> entirely different authentication error.
>
> I think this is due to a change in the way the broker checks the userid set
> on a message[1]. For e.g. GSSAPI based authentication it is important that
> this includes the domain[2]

The commit made in rev 819819 does not resolve the issue defined in QPID-2129
The problem described in QPID-2129 is due to the broker not using the
proper domain (even when using kerberos as the auth mechanism).
So the userID maintained in the connection seems to use the default
realm "QPID". Ex "qpidd/srv.ovirt-demo.com@QPID"
The ruby client seems to set the user id properly. Ex
"qpidd/srv.ovirt-demo.com@OVIRT-DEMO.COM"

The commit made in rev 819819 is covered by QPID-2175.

> However for PLAIN authentication I think the realm is added as a 'pseudo'
> domain by cyrus-sasl and I'm not sure the client can in general know
> this(?).
It seems the default realm "QPID" is added irrespective of the auth
mechanism used.
Check the above and QPID-2129.

I don't think the client needs to know the default realm.
And it does not need to supply a domain unless it uses something like
kerberos (QPID-2174)
The broker should add the default domain to the userID supplied in the
message (as it does to a user without a domain when authenticating the
connection) when authorizing a message transfer.

Also you cannot modify the Java client to add a domain by default as
the Java broker has no notion of a domain.
However I plan to add the userID and domain in the JMS client
(ignoring the one set in the connection) if  Kerberos is used as the
auth mechanism.
This will not be a problem as the Java broker currently doesn't
support Kerberos auth. When it does the Java broker would also need to
handle the domain.

> I'm not sure what the best fix is here. Perhaps the checking on the broker
> should just be a little cleverer and should not fail if the client has not
> specified a domain in the userid property of a message and the userid is in
> the 'default' domain (named after the realm)?
>
> Any thoughts?

This is exactly what I have done. Will check in my fix after running the tests.

> I've raised a JIRA for this[3]. As it is a regression for some cases I think
> it is a blocker for 0.6 and have marked it as such.
>
> [1]
> http://svn.apache.org/viewvc/qpid/trunk/qpid/cpp/src/qpid/broker/SemanticState.cpp?r1=819819&r2=819818&pathrev=819819
> [2] https://issues.apache.org/jira/browse/QPID-2129
> [3] https://issues.apache.org/jira/browse/QPID-2290
>
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org
>
>



-- 
Regards,

Rajith Attapattu
Red Hat
http://rajith.2rlabs.com/

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
Rajith,

I'll see what I can do.  It will take a little bit to make sure there are no internal dependencies (libraries and maven repos are what I can think of now) but it should be doable.

I do have sync_publish='all' set in my connection URL.  I was under the impression that this was a synonym for sycn_ack.  I'll do some testing on that.

Thanks,

Sandy

> -----Original Message-----
> From: Rajith Attapattu [mailto:rajith77@gmail.com]
> Sent: Tuesday, December 15, 2009 3:29 PM
> To: users@qpid.apache.org
> Subject: Re: Qpid Cluster Errors
> 
> Sandy,
> 
> Interesting test case.
> I reproduced a similar issue (well atleast the symptoms were the same)
> but apparently it looks like it's different from your issue.
> Could you please open a JIRA and submit your test code?
> I want to see how your test case is different from the one I have.
> I was able to reproduce my issue reliably and quickly by  using
> sync_ack=true in the connection URL.
> 
> I also want to see how I could add your test case to the soak/stress
> test suite I am building under the testkit module.
> 
> Regards,
> 
> Rajith
> 
> On Tue, Dec 15, 2009 at 4:32 PM, Sandy Pratt <pr...@adobe.com> wrote:
> >> > -----Original Message-----
> >> > From: Alan Conway [mailto:aconway@redhat.com]
> >> >
> >> > There is another issue that turned up with the same symptom
> >> > https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874.
> Let
> >> > me know if
> >> > that resolves your problem, if not I want to look into it more.
> >>
> >> Thanks Alan.
> >>
> >> I don't think that's the issue, because this system should only be
> >> receiving connections from clients who are using AUTO_ACKNOWLEDGE,
> but
> >> I'm happy to try it.
> >>
> >> I will let you know how it goes.
> >
> > I made a new build of the java client from trunk today, and used that
> to spam a cluster of two brokers running qpidd-0.5.752581-34 and rhm-
> 0.5.3206-27 and saw the error again:
> >
> > 2009-dec-15 12:10:34 trace 10.59.174.211:19320(READY) DLVR 27677:
> Frame[BEbe; channel=0; {SessionCompletedBody: commands={ [0,203] }; }]
> data 10.59.174.186:24880-40
> > 2009-dec-15 12:10:34 debug prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-
> 984aeb8c66c2: sender marked completed: { [0,203] }
> > 2009-dec-15 12:10:34 debug Exception constructed:
> prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0)
> but only sent < (203+0) (qpid/SessionState.cpp:163)
> > 2009-dec-15 12:10:34 error Execution exception: invalid-argument:
> prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0)
> but only sent < (203+0) (qpid/SessionState.cpp:163)
> > 2009-dec-15 12:10:34 error 10.59.174.211:19320(READY/error) channel
> error 27677 on 10.59.174.186:24880-40(shadow): invalid-argument:
> prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0)
> but only sent < (203+0) (qpid/SessionState.cpp:163) (unresolved:
> 10.59.174.186:24880 10.59.174.211:19320 )
> > 2009-dec-15 12:10:34 trace MCAST 10.59.174.211:19320-0:
> {ClusterErrorCheckBody: type=1; frame-seq=27677; }
> >
> > OS is an up to date installation of RHEL 5.4 32 bit running on
> VMWare.
> >
> > Qpidd.conf is:
> >
> > no-module-dir=true
> > load-module=/usr/lib/qpid/daemon/cluster.so
> > load-module=/usr/lib/qpid/daemon/msgstore.so
> > cluster-mechanism=PLAIN
> > cluster-username="***"
> > cluster-password="***"
> > cluster-name="na1m-dev1"
> > log-to-file=/var/lib/qpidd/qpidd.log
> > trace=1
> > auth=yes
> > wait=60
> >
> > JNDI connection string:
> >
> > connectionfactory.qpidConnectionFactory =
> amqp://***:***@test/?brokerlist='tcp://***.com:443?ssl='true',connectde
> lay='1000',connecttimeout='5000'',failover='roundrobin?cyclecount='999'
> ',sync_publish='all'
> >
> > The SSL is terminated at the load balancer.
> >
> > I'd love to hear the community's ideas on where to look next.  I
> recently tested out a build of the 0.6beta1 release, and found that the
> bug (or at least the same symptoms) occurred there as well.  I don't
> have a good fast way to reproduce the problem.  This particular crash
> occurred after I spammed the cluster for about 7 minutes, during which
> time the spammer sent about 20k messages of between 1-1000 KB in size
> through the cluster, whilst periodically closing and reopening sending
> and receiving connections.
> >
> > Thanks!
> >
> > Sandy
> >
> > ---------------------------------------------------------------------
> > Apache Qpid - AMQP Messaging Implementation
> > Project:      http://qpid.apache.org
> > Use/Interact: mailto:users-subscribe@qpid.apache.org
> >
> >
> 
> 
> 
> --
> Regards,
> 
> Rajith Attapattu
> Red Hat
> http://rajith.2rlabs.com/
> 
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Rajith Attapattu <ra...@gmail.com>.
Sandy,

Interesting test case.
I reproduced a similar issue (well atleast the symptoms were the same)
but apparently it looks like it's different from your issue.
Could you please open a JIRA and submit your test code?
I want to see how your test case is different from the one I have.
I was able to reproduce my issue reliably and quickly by  using
sync_ack=true in the connection URL.

I also want to see how I could add your test case to the soak/stress
test suite I am building under the testkit module.

Regards,

Rajith

On Tue, Dec 15, 2009 at 4:32 PM, Sandy Pratt <pr...@adobe.com> wrote:
>> > -----Original Message-----
>> > From: Alan Conway [mailto:aconway@redhat.com]
>> >
>> > There is another issue that turned up with the same symptom
>> > https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874. Let
>> > me know if
>> > that resolves your problem, if not I want to look into it more.
>>
>> Thanks Alan.
>>
>> I don't think that's the issue, because this system should only be
>> receiving connections from clients who are using AUTO_ACKNOWLEDGE, but
>> I'm happy to try it.
>>
>> I will let you know how it goes.
>
> I made a new build of the java client from trunk today, and used that to spam a cluster of two brokers running qpidd-0.5.752581-34 and rhm-0.5.3206-27 and saw the error again:
>
> 2009-dec-15 12:10:34 trace 10.59.174.211:19320(READY) DLVR 27677: Frame[BEbe; channel=0; {SessionCompletedBody: commands={ [0,203] }; }] data 10.59.174.186:24880-40
> 2009-dec-15 12:10:34 debug prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: sender marked completed: { [0,203] }
> 2009-dec-15 12:10:34 debug Exception constructed: prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0) but only sent < (203+0) (qpid/SessionState.cpp:163)
> 2009-dec-15 12:10:34 error Execution exception: invalid-argument: prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0) but only sent < (203+0) (qpid/SessionState.cpp:163)
> 2009-dec-15 12:10:34 error 10.59.174.211:19320(READY/error) channel error 27677 on 10.59.174.186:24880-40(shadow): invalid-argument: prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0) but only sent < (203+0) (qpid/SessionState.cpp:163) (unresolved: 10.59.174.186:24880 10.59.174.211:19320 )
> 2009-dec-15 12:10:34 trace MCAST 10.59.174.211:19320-0: {ClusterErrorCheckBody: type=1; frame-seq=27677; }
>
> OS is an up to date installation of RHEL 5.4 32 bit running on VMWare.
>
> Qpidd.conf is:
>
> no-module-dir=true
> load-module=/usr/lib/qpid/daemon/cluster.so
> load-module=/usr/lib/qpid/daemon/msgstore.so
> cluster-mechanism=PLAIN
> cluster-username="***"
> cluster-password="***"
> cluster-name="na1m-dev1"
> log-to-file=/var/lib/qpidd/qpidd.log
> trace=1
> auth=yes
> wait=60
>
> JNDI connection string:
>
> connectionfactory.qpidConnectionFactory = amqp://***:***@test/?brokerlist='tcp://***.com:443?ssl='true',connectdelay='1000',connecttimeout='5000'',failover='roundrobin?cyclecount='999'',sync_publish='all'
>
> The SSL is terminated at the load balancer.
>
> I'd love to hear the community's ideas on where to look next.  I recently tested out a build of the 0.6beta1 release, and found that the bug (or at least the same symptoms) occurred there as well.  I don't have a good fast way to reproduce the problem.  This particular crash occurred after I spammed the cluster for about 7 minutes, during which time the spammer sent about 20k messages of between 1-1000 KB in size through the cluster, whilst periodically closing and reopening sending and receiving connections.
>
> Thanks!
>
> Sandy
>
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org
>
>



-- 
Regards,

Rajith Attapattu
Red Hat
http://rajith.2rlabs.com/

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Checking message's userid against authenticated SASL user (was Re: Qpid Cluster Errors)

Posted by Rajith Attapattu <ra...@gmail.com>.
On Thu, Dec 17, 2009 at 6:47 AM, Gordon Sim <gs...@redhat.com> wrote:
> On 12/17/2009 02:37 AM, Sandy Pratt wrote:
>>
>> Thanks Alan.  I made a new build from trunk and things are going well so
>> far, except that when I enable auth=yes, I get the following:
>>
>> 2009-12-16 18:33:18 error qpid/amqp_0_10/SessionHandler.cpp:97: Execution
>> exception: unauthorized-access: authorised user id : prattrs@QPID but user
>> id in message decla
>> red as prattrs (qpid/broker/SemanticState.cpp:435)
>>
>> I'm pretty sure the password is correct, because if I change it I get an
>> entirely different authentication error.
>
> I think this is due to a change in the way the broker checks the userid set
> on a message[1]. For e.g. GSSAPI based authentication it is important that
> this includes the domain[2]

The commit made in rev 819819 does not resolve the issue defined in QPID-2129
The problem described in QPID-2129 is due to the broker not using the
proper domain (even when using kerberos as the auth mechanism).
So the userID maintained in the connection seems to use the default
realm "QPID". Ex "qpidd/srv.ovirt-demo.com@QPID"
The ruby client seems to set the user id properly. Ex
"qpidd/srv.ovirt-demo.com@OVIRT-DEMO.COM"

The commit made in rev 819819 is covered by QPID-2175.

> However for PLAIN authentication I think the realm is added as a 'pseudo'
> domain by cyrus-sasl and I'm not sure the client can in general know
> this(?).
It seems the default realm "QPID" is added irrespective of the auth
mechanism used.
Check the above and QPID-2129.

I don't think the client needs to know the default realm.
And it does not need to supply a domain unless it uses something like
kerberos (QPID-2174)
The broker should add the default domain to the userID supplied in the
message (as it does to a user without a domain when authenticating the
connection) when authorizing a message transfer.

Also you cannot modify the Java client to add a domain by default as
the Java broker has no notion of a domain.
However I plan to add the userID and domain in the JMS client
(ignoring the one set in the connection) if  Kerberos is used as the
auth mechanism.
This will not be a problem as the Java broker currently doesn't
support Kerberos auth. When it does the Java broker would also need to
handle the domain.

> I'm not sure what the best fix is here. Perhaps the checking on the broker
> should just be a little cleverer and should not fail if the client has not
> specified a domain in the userid property of a message and the userid is in
> the 'default' domain (named after the realm)?
>
> Any thoughts?

This is exactly what I have done. Will check in my fix after running the tests.

> I've raised a JIRA for this[3]. As it is a regression for some cases I think
> it is a blocker for 0.6 and have marked it as such.
>
> [1]
> http://svn.apache.org/viewvc/qpid/trunk/qpid/cpp/src/qpid/broker/SemanticState.cpp?r1=819819&r2=819818&pathrev=819819
> [2] https://issues.apache.org/jira/browse/QPID-2129
> [3] https://issues.apache.org/jira/browse/QPID-2290
>
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org
>
>



-- 
Regards,

Rajith Attapattu
Red Hat
http://rajith.2rlabs.com/

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Carl Trieloff <cc...@redhat.com>.
Sandy Pratt wrote:
>   
>> -----Original Message-----
>> From: Carl Trieloff [mailto:cctrieloff@redhat.com]
>> Sent: Tuesday, December 15, 2009 1:49 PM
>> To: users@qpid.apache.org
>> Subject: Re: Qpid Cluster Errors
>>
>> Sandy,
>>
>> Sorry to ask the basic questions, but are you uptodate on AIS, or if
>> you
>> let us know what AIS rpm version you have installed?
>>     
>
>
> No worries, and thanks for the help.  Here you are:
>
> [prattrs@hsvrhm5 qpidd]$ rpm -qi openais
> Name        : openais                      Relocations: (not relocatable)
> Version     : 0.80.6                            Vendor: Red Hat, Inc.
> Release     : 8.el5_4.1                     Build Date: Fri 04 Sep 2009 02:11:24 PM PDT
> Install Date: Fri 04 Dec 2009 12:14:41 AM PST      Build Host: hs20-bc1-5.build.redhat.com
> Group       : System Environment/Base       Source RPM: openais-0.80.6-8.el5_4.1.src.rpm
> Size        : 909701                           License: BSD
> Signature   : DSA/SHA1, Thu 01 Oct 2009 12:01:50 PM PDT, Key ID 5326810137017186
> Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
> URL         : http://developer.osdl.org/dev/openais/
> Summary     : The openais Standards-Based Cluster Framework executive and APIs
> Description :
> This package contains the openais executive, openais service handlers,
> default configuration files and init script.
>
> And here is the log tail:
>
> [prattrs@hsvrhm6 qpidd]$ tail -f /var/log/openais.log
> Dec 15 12:11:05.745810 [CLM  ]  r(0) ip(10.59.174.211)
> Dec 15 12:11:05.745818 [CLM  ] Members Left:
> Dec 15 12:11:05.745826 [CLM  ] Members Joined:
> Dec 15 12:11:05.745835 [CLM  ]  r(0) ip(10.59.174.186)
> Dec 15 12:11:05.745847 [SYNC ] This node is within the primary component and will provide service.
> Dec 15 12:11:05.745864 [TOTEM] entering OPERATIONAL state.
> Dec 15 12:11:05.748010 [CLM  ] got nodejoin message 10.59.174.186
> Dec 15 12:11:05.748134 [CLM  ] got nodejoin message 10.59.174.211
> Dec 15 12:11:05.771582 [CPG  ] got joinlist message from node -1162986742
> Dec 15 13:45:00.577005 [ipc.c:0783] connection received from libais client 5.
>
> Thanks,
>
>   

That looks correct, I think we need thoughts from Alan

Carl.

RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.

> -----Original Message-----
> From: Carl Trieloff [mailto:cctrieloff@redhat.com]
> Sent: Tuesday, December 15, 2009 1:49 PM
> To: users@qpid.apache.org
> Subject: Re: Qpid Cluster Errors
> 
> Sandy,
> 
> Sorry to ask the basic questions, but are you uptodate on AIS, or if
> you
> let us know what AIS rpm version you have installed?


No worries, and thanks for the help.  Here you are:

[prattrs@hsvrhm5 qpidd]$ rpm -qi openais
Name        : openais                      Relocations: (not relocatable)
Version     : 0.80.6                            Vendor: Red Hat, Inc.
Release     : 8.el5_4.1                     Build Date: Fri 04 Sep 2009 02:11:24 PM PDT
Install Date: Fri 04 Dec 2009 12:14:41 AM PST      Build Host: hs20-bc1-5.build.redhat.com
Group       : System Environment/Base       Source RPM: openais-0.80.6-8.el5_4.1.src.rpm
Size        : 909701                           License: BSD
Signature   : DSA/SHA1, Thu 01 Oct 2009 12:01:50 PM PDT, Key ID 5326810137017186
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://developer.osdl.org/dev/openais/
Summary     : The openais Standards-Based Cluster Framework executive and APIs
Description :
This package contains the openais executive, openais service handlers,
default configuration files and init script.

And here is the log tail:

[prattrs@hsvrhm6 qpidd]$ tail -f /var/log/openais.log
Dec 15 12:11:05.745810 [CLM  ]  r(0) ip(10.59.174.211)
Dec 15 12:11:05.745818 [CLM  ] Members Left:
Dec 15 12:11:05.745826 [CLM  ] Members Joined:
Dec 15 12:11:05.745835 [CLM  ]  r(0) ip(10.59.174.186)
Dec 15 12:11:05.745847 [SYNC ] This node is within the primary component and will provide service.
Dec 15 12:11:05.745864 [TOTEM] entering OPERATIONAL state.
Dec 15 12:11:05.748010 [CLM  ] got nodejoin message 10.59.174.186
Dec 15 12:11:05.748134 [CLM  ] got nodejoin message 10.59.174.211
Dec 15 12:11:05.771582 [CPG  ] got joinlist message from node -1162986742
Dec 15 13:45:00.577005 [ipc.c:0783] connection received from libais client 5.

Thanks,

Sandy

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Carl Trieloff <cc...@redhat.com>.
Sandy Pratt wrote:
>>> -----Original Message-----
>>> From: Alan Conway [mailto:aconway@redhat.com]
>>>
>>> There is another issue that turned up with the same symptom
>>> https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874. Let
>>> me know if
>>> that resolves your problem, if not I want to look into it more.
>>>       
>> Thanks Alan.
>>
>> I don't think that's the issue, because this system should only be
>> receiving connections from clients who are using AUTO_ACKNOWLEDGE, but
>> I'm happy to try it.
>>
>> I will let you know how it goes.
>>     
>
> I made a new build of the java client from trunk today, and used that to spam a cluster of two brokers running qpidd-0.5.752581-34 and rhm-0.5.3206-27 and saw the error again:
>
> 2009-dec-15 12:10:34 trace 10.59.174.211:19320(READY) DLVR 27677: Frame[BEbe; channel=0; {SessionCompletedBody: commands={ [0,203] }; }] data 10.59.174.186:24880-40
> 2009-dec-15 12:10:34 debug prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: sender marked completed: { [0,203] }
> 2009-dec-15 12:10:34 debug Exception constructed: prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0) but only sent < (203+0) (qpid/SessionState.cpp:163)
> 2009-dec-15 12:10:34 error Execution exception: invalid-argument: prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0) but only sent < (203+0) (qpid/SessionState.cpp:163)
> 2009-dec-15 12:10:34 error 10.59.174.211:19320(READY/error) channel error 27677 on 10.59.174.186:24880-40(shadow): invalid-argument: prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0) but only sent < (203+0) (qpid/SessionState.cpp:163) (unresolved: 10.59.174.186:24880 10.59.174.211:19320 )
> 2009-dec-15 12:10:34 trace MCAST 10.59.174.211:19320-0: {ClusterErrorCheckBody: type=1; frame-seq=27677; }
>
> OS is an up to date installation of RHEL 5.4 32 bit running on VMWare.
>
> Qpidd.conf is:
>
> no-module-dir=true
> load-module=/usr/lib/qpid/daemon/cluster.so
> load-module=/usr/lib/qpid/daemon/msgstore.so
> cluster-mechanism=PLAIN
> cluster-username="***"
> cluster-password="***"
> cluster-name="na1m-dev1"
> log-to-file=/var/lib/qpidd/qpidd.log
> trace=1
> auth=yes
> wait=60
>
> JNDI connection string:
>
> connectionfactory.qpidConnectionFactory = amqp://***:***@test/?brokerlist='tcp://***.com:443?ssl='true',connectdelay='1000',connecttimeout='5000'',failover='roundrobin?cyclecount='999'',sync_publish='all'
>
> The SSL is terminated at the load balancer.
>
> I'd love to hear the community's ideas on where to look next.  I recently tested out a build of the 0.6beta1 release, and found that the bug (or at least the same symptoms) occurred there as well.  I don't have a good fast way to reproduce the problem.  This particular crash occurred after I spammed the cluster for about 7 minutes, during which time the spammer sent about 20k messages of between 1-1000 KB in size through the cluster, whilst periodically closing and reopening sending and receiving connections.
>
>   

Sandy,

Sorry to ask the basic questions, but are you uptodate on AIS, or if you 
let us know what AIS rpm version you have installed?

Carl.



RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
> > -----Original Message-----
> > From: Alan Conway [mailto:aconway@redhat.com]
> >
> > There is another issue that turned up with the same symptom
> > https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874. Let
> > me know if
> > that resolves your problem, if not I want to look into it more.
> 
> Thanks Alan.
> 
> I don't think that's the issue, because this system should only be
> receiving connections from clients who are using AUTO_ACKNOWLEDGE, but
> I'm happy to try it.
> 
> I will let you know how it goes.

I made a new build of the java client from trunk today, and used that to spam a cluster of two brokers running qpidd-0.5.752581-34 and rhm-0.5.3206-27 and saw the error again:

2009-dec-15 12:10:34 trace 10.59.174.211:19320(READY) DLVR 27677: Frame[BEbe; channel=0; {SessionCompletedBody: commands={ [0,203] }; }] data 10.59.174.186:24880-40
2009-dec-15 12:10:34 debug prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: sender marked completed: { [0,203] }
2009-dec-15 12:10:34 debug Exception constructed: prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0) but only sent < (203+0) (qpid/SessionState.cpp:163)
2009-dec-15 12:10:34 error Execution exception: invalid-argument: prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0) but only sent < (203+0) (qpid/SessionState.cpp:163)
2009-dec-15 12:10:34 error 10.59.174.211:19320(READY/error) channel error 27677 on 10.59.174.186:24880-40(shadow): invalid-argument: prattrs@QPID.79d87e4e-1d7c-4c1c-84f2-984aeb8c66c2: confirmed < (204+0) but only sent < (203+0) (qpid/SessionState.cpp:163) (unresolved: 10.59.174.186:24880 10.59.174.211:19320 )
2009-dec-15 12:10:34 trace MCAST 10.59.174.211:19320-0: {ClusterErrorCheckBody: type=1; frame-seq=27677; }

OS is an up to date installation of RHEL 5.4 32 bit running on VMWare.

Qpidd.conf is:

no-module-dir=true
load-module=/usr/lib/qpid/daemon/cluster.so
load-module=/usr/lib/qpid/daemon/msgstore.so
cluster-mechanism=PLAIN
cluster-username="***"
cluster-password="***"
cluster-name="na1m-dev1"
log-to-file=/var/lib/qpidd/qpidd.log
trace=1
auth=yes
wait=60

JNDI connection string:

connectionfactory.qpidConnectionFactory = amqp://***:***@test/?brokerlist='tcp://***.com:443?ssl='true',connectdelay='1000',connecttimeout='5000'',failover='roundrobin?cyclecount='999'',sync_publish='all'

The SSL is terminated at the load balancer.

I'd love to hear the community's ideas on where to look next.  I recently tested out a build of the 0.6beta1 release, and found that the bug (or at least the same symptoms) occurred there as well.  I don't have a good fast way to reproduce the problem.  This particular crash occurred after I spammed the cluster for about 7 minutes, during which time the spammer sent about 20k messages of between 1-1000 KB in size through the cluster, whilst periodically closing and reopening sending and receiving connections.

Thanks!

Sandy

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
The bug reappeared for me right after I took sync_publish and sync_ack off my connection string.

I have filed QPID-2295 on this issue.

https://issues.apache.org/jira/browse/QPID-2295

Thanks for the help.

> -----Original Message-----
> From: Alan Conway [mailto:aconway@redhat.com]
> Sent: Wednesday, December 16, 2009 6:52 AM
> To: users@qpid.apache.org
> Subject: Re: Qpid Cluster Errors
> 
> On 12/15/2009 01:53 PM, Sandy Pratt wrote:
> >
> >
> >> -----Original Message-----
> >> From: Alan Conway [mailto:aconway@redhat.com]
> >>
> >> There is another issue that turned up with the same symptom
> >> https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874.
> Let
> >> me know if
> >> that resolves your problem, if not I want to look into it more.
> 
> Sorry, typo by me, I meant:  QPID-2253. You'll need to re-build the
> broker from
> r888874
> 
> If you still see the problem post r888874 can you can you open a JIRA
> for this?
> It'll be easier to track.
> 
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Checking message's userid against authenticated SASL user (was Re: Qpid Cluster Errors)

Posted by Gordon Sim <gs...@redhat.com>.
On 12/17/2009 02:37 AM, Sandy Pratt wrote:
> Thanks Alan.  I made a new build from trunk and things are going well so far, except that when I enable auth=yes, I get the following:
>
> 2009-12-16 18:33:18 error qpid/amqp_0_10/SessionHandler.cpp:97: Execution exception: unauthorized-access: authorised user id : prattrs@QPID but user id in message decla
> red as prattrs (qpid/broker/SemanticState.cpp:435)
>
> I'm pretty sure the password is correct, because if I change it I get an entirely different authentication error.

I think this is due to a change in the way the broker checks the userid 
set on a message[1]. For e.g. GSSAPI based authentication it is 
important that this includes the domain[2]

However for PLAIN authentication I think the realm is added as a 
'pseudo' domain by cyrus-sasl and I'm not sure the client can in general 
know this(?).

I'm not sure what the best fix is here. Perhaps the checking on the 
broker should just be a little cleverer and should not fail if the 
client has not specified a domain in the userid property of a message 
and the userid is in the 'default' domain (named after the realm)?

Any thoughts?

I've raised a JIRA for this[3]. As it is a regression for some cases I 
think it is a blocker for 0.6 and have marked it as such.

[1] 
http://svn.apache.org/viewvc/qpid/trunk/qpid/cpp/src/qpid/broker/SemanticState.cpp?r1=819819&r2=819818&pathrev=819819
[2] https://issues.apache.org/jira/browse/QPID-2129
[3] https://issues.apache.org/jira/browse/QPID-2290

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
Thanks Alan.  I made a new build from trunk and things are going well so far, except that when I enable auth=yes, I get the following:

2009-12-16 18:33:18 error qpid/amqp_0_10/SessionHandler.cpp:97: Execution exception: unauthorized-access: authorised user id : prattrs@QPID but user id in message decla
red as prattrs (qpid/broker/SemanticState.cpp:435)

I'm pretty sure the password is correct, because if I change it I get an entirely different authentication error.

Thanks,

Sandy

> -----Original Message-----
> From: Alan Conway [mailto:aconway@redhat.com]
> Sent: Wednesday, December 16, 2009 6:52 AM
> To: users@qpid.apache.org
> Subject: Re: Qpid Cluster Errors
> 
> On 12/15/2009 01:53 PM, Sandy Pratt wrote:
> >
> >
> >> -----Original Message-----
> >> From: Alan Conway [mailto:aconway@redhat.com]
> >>
> >> There is another issue that turned up with the same symptom
> >> https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874.
> Let
> >> me know if
> >> that resolves your problem, if not I want to look into it more.
> 
> Sorry, typo by me, I meant:  QPID-2253. You'll need to re-build the
> broker from
> r888874
> 
> If you still see the problem post r888874 can you can you open a JIRA
> for this?
> It'll be easier to track.
> 
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Alan Conway <ac...@redhat.com>.
On 12/15/2009 01:53 PM, Sandy Pratt wrote:
>
>
>> -----Original Message-----
>> From: Alan Conway [mailto:aconway@redhat.com]
>>
>> There is another issue that turned up with the same symptom
>> https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874. Let
>> me know if
>> that resolves your problem, if not I want to look into it more.

Sorry, typo by me, I meant:  QPID-2253. You'll need to re-build the broker from 
r888874

If you still see the problem post r888874 can you can you open a JIRA for this? 
It'll be easier to track.

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.

> -----Original Message-----
> From: Alan Conway [mailto:aconway@redhat.com]
> 
> There is another issue that turned up with the same symptom
> https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874. Let
> me know if
> that resolves your problem, if not I want to look into it more.

Thanks Alan.

I don't think that's the issue, because this system should only be receiving connections from clients who are using AUTO_ACKNOWLEDGE, but I'm happy to try it.

I will let you know how it goes.

Sandy

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Alan Conway <ac...@redhat.com>.
On 12/08/2009 02:26 PM, Sandy Pratt wrote:
>
>> -----Original Message-----
>> From: Alan Conway [mailto:aconway@redhat.com]
>> Sent: Friday, November 27, 2009 7:52 AM
>> To: users@qpid.apache.org
>> Subject: Re: Qpid Cluster Errors
>>
>> I think this is an instance of one of the following two bugs, probably
>> the first
>> one:
>>
>> (1) https://bugzilla.redhat.com/show_bug.cgi?id=516501
>> (2) http://svn.apache.org/viewvc?view=revision&revision=834026
>>
>> Both are fixed for the next release.
>
>
> I upgraded my machines to qpidd 0.5.752581-34 built from qpidc-0.5.752581-34.src.rpm, and I was actually able to see this problem recur on at least two of my clusters.  Is it true that these builds contain the bug fix?  If so, then I must be suffering from some other problem.
>
> Anyone have any suggestions for mitigating the problem, or a good revision to build from trunk at?
>

There is another issue that turned up with the same symptom 
https://issues.apache.org/jira/browse/QPID-225. Fixed in r888874. Let me know if 
that resolves your problem, if not I want to look into it more.

> Debug data below.
>
> Thanks,
>
> Sandy
>
>
> --
>
> Qpidd.conf:
>
> cluster-name="na2m-dev1"
> cluster-username="***"
> cluster-password="***"
> cluster-mechanism="PLAIN"
> cluster-read-max=1
> log-to-file="/var/lib/qpidd/qpidd.log"
> auto-expand-max-jfiles=64
> num-jfiles=16
> jfile-size-pgs=512
> wcache-page-size=128
> tpl-num-jfiles=32
> tpl-jfile-size-pgs=48
> tpl-wcache-page-size=64
> wait=120
> ack=1
> worker-threads=1
>
> --
>
> Logs:
>
> Cluster 1:
>
> 2009-dec-07 15:11:47 error Execution exception: invalid-argument: na2mdev1@QPID.ffdd4af6-e864-4368-b8fb-86a35384a75f: confirmed<  (5+0) but only sent<  (4+0) (qpid/SessionState.cpp:163)
> 2009-dec-07 15:11:47 error 10.72.40.167:13351(READY/error) channel error 293369 on 10.72.40.166:31860-576(shadow): invalid-argument: na2mdev1@QPID.ffdd4af6-e864-4368-b8fb-86a35384a75f: confirmed<  (5+0) but only sent<  (4+0) (qpid/SessionState.cpp:163) (unresolved: 10.72.40.166:31860 10.72.40.167:13351 )
> 2009-dec-07 15:11:47 critical 10.72.40.167:13351(READY/error) error 293369 did not occur on 10.72.40.166:31860
> 2009-dec-07 15:11:47 error Error delivering frames: Error 293369 did not occur on all members (qpid/cluster/ErrorCheck.cpp:90)
> 2009-dec-07 15:11:47 notice 10.72.40.167:13351(LEFT/error) leaving cluster na2m-dev1
> 2009-dec-07 15:11:47 notice Shut down
>
> Cluster 2:
>
> 2009-dec-07 16:09:01 error Execution exception: invalid-argument: guest@QPID.d20f3b12-ebae-4afb-996e-7a80c1b265fe: confirmed<  (1+0) but only sent<  (0+0) (qpid/SessionState.cpp:163)
> 2009-dec-07 16:09:01 error 10.59.174.211:10625(READY/error) channel error 606 on 10.59.174.186:3014-3(shadow): invalid-argument: guest@QPID.d20f3b12-ebae-4afb-996e-7a80c1b265fe: confirmed<  (1+0) but only sent<  (0+0) (qpid/SessionState.cpp:163) (unresolved: 10.59.174.186:3014 10.59.174.211:10625 )
> 2009-dec-07 16:09:01 critical 10.59.174.211:10625(READY/error) error 606 did not occur on 10.59.174.186:3014
> 2009-dec-07 16:09:01 error Error delivering frames: Error 606 did not occur on all members (qpid/cluster/ErrorCheck.cpp:90)
> 2009-dec-07 16:09:01 notice 10.59.174.211:10625(LEFT/error) leaving cluster renga
> 2009-dec-07 16:09:01 notice Shut down
>
>
>
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:users-subscribe@qpid.apache.org
>

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
> -----Original Message-----
> From: Alan Conway [mailto:aconway@redhat.com]
> Sent: Friday, November 27, 2009 7:52 AM
> To: users@qpid.apache.org
> Subject: Re: Qpid Cluster Errors
> 
> I think this is an instance of one of the following two bugs, probably
> the first
> one:
> 
> (1) https://bugzilla.redhat.com/show_bug.cgi?id=516501
> (2) http://svn.apache.org/viewvc?view=revision&revision=834026
> 
> Both are fixed for the next release.


I upgraded my machines to qpidd 0.5.752581-34 built from qpidc-0.5.752581-34.src.rpm, and I was actually able to see this problem recur on at least two of my clusters.  Is it true that these builds contain the bug fix?  If so, then I must be suffering from some other problem.

Anyone have any suggestions for mitigating the problem, or a good revision to build from trunk at?

Debug data below.

Thanks,

Sandy


--

Qpidd.conf:

cluster-name="na2m-dev1"
cluster-username="***"
cluster-password="***"
cluster-mechanism="PLAIN"
cluster-read-max=1
log-to-file="/var/lib/qpidd/qpidd.log"
auto-expand-max-jfiles=64
num-jfiles=16
jfile-size-pgs=512
wcache-page-size=128
tpl-num-jfiles=32
tpl-jfile-size-pgs=48
tpl-wcache-page-size=64
wait=120
ack=1
worker-threads=1

--

Logs:

Cluster 1:

2009-dec-07 15:11:47 error Execution exception: invalid-argument: na2mdev1@QPID.ffdd4af6-e864-4368-b8fb-86a35384a75f: confirmed < (5+0) but only sent < (4+0) (qpid/SessionState.cpp:163)
2009-dec-07 15:11:47 error 10.72.40.167:13351(READY/error) channel error 293369 on 10.72.40.166:31860-576(shadow): invalid-argument: na2mdev1@QPID.ffdd4af6-e864-4368-b8fb-86a35384a75f: confirmed < (5+0) but only sent < (4+0) (qpid/SessionState.cpp:163) (unresolved: 10.72.40.166:31860 10.72.40.167:13351 )
2009-dec-07 15:11:47 critical 10.72.40.167:13351(READY/error) error 293369 did not occur on 10.72.40.166:31860
2009-dec-07 15:11:47 error Error delivering frames: Error 293369 did not occur on all members (qpid/cluster/ErrorCheck.cpp:90)
2009-dec-07 15:11:47 notice 10.72.40.167:13351(LEFT/error) leaving cluster na2m-dev1
2009-dec-07 15:11:47 notice Shut down

Cluster 2:

2009-dec-07 16:09:01 error Execution exception: invalid-argument: guest@QPID.d20f3b12-ebae-4afb-996e-7a80c1b265fe: confirmed < (1+0) but only sent < (0+0) (qpid/SessionState.cpp:163)
2009-dec-07 16:09:01 error 10.59.174.211:10625(READY/error) channel error 606 on 10.59.174.186:3014-3(shadow): invalid-argument: guest@QPID.d20f3b12-ebae-4afb-996e-7a80c1b265fe: confirmed < (1+0) but only sent < (0+0) (qpid/SessionState.cpp:163) (unresolved: 10.59.174.186:3014 10.59.174.211:10625 )
2009-dec-07 16:09:01 critical 10.59.174.211:10625(READY/error) error 606 did not occur on 10.59.174.186:3014
2009-dec-07 16:09:01 error Error delivering frames: Error 606 did not occur on all members (qpid/cluster/ErrorCheck.cpp:90)
2009-dec-07 16:09:01 notice 10.59.174.211:10625(LEFT/error) leaving cluster renga
2009-dec-07 16:09:01 notice Shut down



---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Alan Conway <ac...@redhat.com>.
On 11/17/2009 08:33 PM, Sandy Pratt wrote:
> I managed to capture a log today while trace was on.  Broker1 and broker2 are in a cluster, and broker1 is pulling messages from federated_broker1 when the error occurred:
>
> 2009-nov-17 17:22:29 debug DeliveryRecord::setEnded() id=840
> 2009-nov-17 17:22:29 debug Accepted 840
> 2009-nov-17 17:22:29 debug user1@QPID.5d051ed2-74fc-49bc-91ba-144d594b863c: receiver marked completed: 18 incomplete: { } unknown-completed: { [1,18] }
> 2009-nov-17 17:22:29 trace broker1:3193(READY) DLVR 838300: Frame[BEbe; channel=0; {SessionCompletedBody: commands={ [293,629] }; }] data broker2:21467-85
> 2009-nov-17 17:22:29 debug user1@QPID.8d28476a-f279-44b2-ae04-37410bc344b5: sender marked completed: { [293,629] }
> 2009-nov-17 17:22:29 debug Exception constructed: user1@QPID.8d28476a-f279-44b2-ae04-37410bc344b5: confirmed<  (630+0) but only sent<  (629+0) (qpid/SessionState.cpp:163)
> 2009-nov-17 17:22:29 error Execution exception: invalid-argument: user1@QPID.8d28476a-f279-44b2-ae04-37410bc344b5: confirmed<  (630+0) but only sent<  (629+0) (qpid/SessionState.cpp:163)
> 2009-nov-17 17:22:29 error broker1:3193(READY/error) channel error 838300 on broker2:21467-85(shadow): invalid-argument: user1@QPID.8d28476a-f279-44b2-ae04-37410bc344b5: confirmed<  (630+0) but only sent<  (629+0) (qpid/SessionState.cpp:163) (unresolved: broker1:3193 broker2:21467 )
> 2009-nov-17 17:22:29 trace MCAST broker2:21467-0: {ClusterErrorCheckBody: type=1; frame-seq=838300; }
> 2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker1:3193-0 Frame[BEbe; channel=0; {ClusterErrorCheckBody: type=1; frame-seq=838300; }]]
> 2009-nov-17 17:22:29 debug broker1:3193(READY/error) error 838300 outcome agrees with broker1:3193
> 2009-nov-17 17:22:29 debug broker1:3193(READY/error) Error 838300 still unresolved: broker2:21467
> 2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-90 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=64; }]]
> 2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-85 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=96; }]]
> 2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-84  data 26 bytes]
> 2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-88 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=512; }]]
> 2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-88  data 26 bytes]
> 2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-90  data 26 bytes]
> 2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-86 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=128; }]]
> 2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-83 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=128; }]]
> 2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-89 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=32; }]]
> 2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-87 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=32; }]]
> 2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-91  data 26 bytes]
> 2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-92 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=256; }]]
> 2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-91 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=62; }]]
> 2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-86  data 26 bytes]
> 2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-0 Frame[BEbe; channel=0; {ClusterErrorCheckBody: type=0; frame-seq=838300; }]]
> 2009-nov-17 17:22:30 critical broker1:3193(READY/error) error 838300 did not occur on broker2:21467
> 2009-nov-17 17:22:30 debug Exception constructed: Aborted by local failure that did not occur on all replicas
> 2009-nov-17 17:22:30 error Error delivering frames: Aborted by local failure that did not occur on all replicas
> 2009-nov-17 17:22:30 notice broker1:3193(LEFT/error) leaving cluster renga-qpid-dev
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-92(shadow)
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-91(shadow)
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-90(shadow)
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-89(shadow)
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-88(shadow)
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-87(shadow)
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-86(shadow)
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-85(shadow)
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-84(shadow)
> 2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-83(shadow)
> 2009-nov-17 17:22:30 debug Shutting down CPG
> 2009-nov-17 17:22:30 notice Shut down
> 2009-nov-17 17:22:30 error Connection federated_broker1:5672 closed by error: closed by management(320)
> 2009-nov-17 17:22:30 debug Journal "test": Destroyed
> 2009-nov-17 17:22:30 debug Journal "coreToNode": Destroyed
> 2009-nov-17 17:22:30 debug Journal "TplStore": Destroyed
>

I think this is an instance of one of the following two bugs, probably the first 
one:

(1) https://bugzilla.redhat.com/show_bug.cgi?id=516501
(2) http://svn.apache.org/viewvc?view=revision&revision=834026

Both are fixed for the next release.

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Gordon Sim <gs...@redhat.com>.
On 11/18/2009 05:08 PM, Sandy Pratt wrote:
>> On 11/17/2009 08:33 PM, Sandy Pratt wrote:
>>> I managed to capture a log today while trace was on.  Broker1 and
>> broker2 are in a cluster, and broker1 is pulling messages from
>> federated_broker1 when the error occurred:
>>>
>> Can you send me the rpm -q qpidd output so I can verify the version, or
>> if you
>> built it yourself where you got the sources from? Thanks for the trace,
>> I'll try
>> to figure out what's going on.
>
> Sure thing:
>
> qpidd-0.5.752581-26

There are two fixes that may be relevant (both can give the sorts of 
error message you describe):

(1) http://svn.apache.org/viewvc?view=revision&revision=794736, or
(2) http://svn.apache.org/viewvc?view=revision&revision=834026 (this one 
is relevant only if you are using non-acquiring subscribers i.e. browsers)


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
> On 11/17/2009 08:33 PM, Sandy Pratt wrote:
> > I managed to capture a log today while trace was on.  Broker1 and
> broker2 are in a cluster, and broker1 is pulling messages from
> federated_broker1 when the error occurred:
> >
> Can you send me the rpm -q qpidd output so I can verify the version, or
> if you
> built it yourself where you got the sources from? Thanks for the trace,
> I'll try
> to figure out what's going on.

Sure thing:

qpidd-0.5.752581-26

Thanks,

Sandy

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Alan Conway <ac...@redhat.com>.
On 11/17/2009 08:33 PM, Sandy Pratt wrote:
> I managed to capture a log today while trace was on.  Broker1 and broker2 are in a cluster, and broker1 is pulling messages from federated_broker1 when the error occurred:
>
Can you send me the rpm -q qpidd output so I can verify the version, or if you 
built it yourself where you got the sources from? Thanks for the trace, I'll try 
to figure out what's going on.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
I managed to capture a log today while trace was on.  Broker1 and broker2 are in a cluster, and broker1 is pulling messages from federated_broker1 when the error occurred:

2009-nov-17 17:22:29 debug DeliveryRecord::setEnded() id=840
2009-nov-17 17:22:29 debug Accepted 840
2009-nov-17 17:22:29 debug user1@QPID.5d051ed2-74fc-49bc-91ba-144d594b863c: receiver marked completed: 18 incomplete: { } unknown-completed: { [1,18] }
2009-nov-17 17:22:29 trace broker1:3193(READY) DLVR 838300: Frame[BEbe; channel=0; {SessionCompletedBody: commands={ [293,629] }; }] data broker2:21467-85
2009-nov-17 17:22:29 debug user1@QPID.8d28476a-f279-44b2-ae04-37410bc344b5: sender marked completed: { [293,629] }
2009-nov-17 17:22:29 debug Exception constructed: user1@QPID.8d28476a-f279-44b2-ae04-37410bc344b5: confirmed < (630+0) but only sent < (629+0) (qpid/SessionState.cpp:163)
2009-nov-17 17:22:29 error Execution exception: invalid-argument: user1@QPID.8d28476a-f279-44b2-ae04-37410bc344b5: confirmed < (630+0) but only sent < (629+0) (qpid/SessionState.cpp:163)
2009-nov-17 17:22:29 error broker1:3193(READY/error) channel error 838300 on broker2:21467-85(shadow): invalid-argument: user1@QPID.8d28476a-f279-44b2-ae04-37410bc344b5: confirmed < (630+0) but only sent < (629+0) (qpid/SessionState.cpp:163) (unresolved: broker1:3193 broker2:21467 )
2009-nov-17 17:22:29 trace MCAST broker2:21467-0: {ClusterErrorCheckBody: type=1; frame-seq=838300; }
2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker1:3193-0 Frame[BEbe; channel=0; {ClusterErrorCheckBody: type=1; frame-seq=838300; }]]
2009-nov-17 17:22:29 debug broker1:3193(READY/error) error 838300 outcome agrees with broker1:3193
2009-nov-17 17:22:29 debug broker1:3193(READY/error) Error 838300 still unresolved: broker2:21467
2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-90 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=64; }]]
2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-85 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=96; }]]
2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-84  data 26 bytes]
2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-88 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=512; }]]
2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-88  data 26 bytes]
2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-90  data 26 bytes]
2009-nov-17 17:22:29 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-86 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=128; }]]
2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-83 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=128; }]]
2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-89 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=32; }]]
2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-87 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=32; }]]
2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-91  data 26 bytes]
2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-92 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=256; }]]
2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-91 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=62; }]]
2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-86  data 26 bytes]
2009-nov-17 17:22:30 trace broker1:3193(READY/error) DLVR: Event[broker2:21467-0 Frame[BEbe; channel=0; {ClusterErrorCheckBody: type=0; frame-seq=838300; }]]
2009-nov-17 17:22:30 critical broker1:3193(READY/error) error 838300 did not occur on broker2:21467
2009-nov-17 17:22:30 debug Exception constructed: Aborted by local failure that did not occur on all replicas
2009-nov-17 17:22:30 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-nov-17 17:22:30 notice broker1:3193(LEFT/error) leaving cluster renga-qpid-dev
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-92(shadow)
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-91(shadow)
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-90(shadow)
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-89(shadow)
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-88(shadow)
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-87(shadow)
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-86(shadow)
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-85(shadow)
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-84(shadow)
2009-nov-17 17:22:30 debug broker1:3193(LEFT/error) deleted connection: broker2:21467-83(shadow)
2009-nov-17 17:22:30 debug Shutting down CPG
2009-nov-17 17:22:30 notice Shut down
2009-nov-17 17:22:30 error Connection federated_broker1:5672 closed by error: closed by management(320)
2009-nov-17 17:22:30 debug Journal "test": Destroyed
2009-nov-17 17:22:30 debug Journal "coreToNode": Destroyed
2009-nov-17 17:22:30 debug Journal "TplStore": Destroyed

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
> Sandy Pratt wrote:
> >> On 11/16/2009 07:26 PM, Sandy Pratt wrote:
> >>
> >>> Hello all,
> >>>
> >>> I've been working with a Qpid of two machines recently.  It's
> working
> >>>
> >> well for the most part, but I've been experiencing a few sporadic
> >> failures.  I've collected logs for them over time, and I was
> wondering
> >> if anyone else had seen anything similar and had any idea what might
> be
> >> happening.
> >>
> >> What version of qpid are you seeing these errors on?  They are
> similar
> >> to
> >> symptoms of bugs that have been fixed on trunk, but if you're seeing
> >> this on
> >> recent trunk then I want to look into it.
> >>
> >>
> >
> > I'm running the 0.5 release, built from the SRPMs in the Redhat
> depot:
> ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/RHEMRG/SRPM
> S/
> 
> These issues have been patched on the latest MRG distro release, if you
> update to that, or I can
> help you with that off list if needed. Pulling the latest from RHN will
> also work.
> 

Thanks Carl.  It looks like this might have been the issue: https://issues.apache.org/jira/browse/QPID-1711 ?

Sandy


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Carl Trieloff <cc...@redhat.com>.
Sandy Pratt wrote:
>> On 11/16/2009 07:26 PM, Sandy Pratt wrote:
>>     
>>> Hello all,
>>>
>>> I've been working with a Qpid of two machines recently.  It's working
>>>       
>> well for the most part, but I've been experiencing a few sporadic
>> failures.  I've collected logs for them over time, and I was wondering
>> if anyone else had seen anything similar and had any idea what might be
>> happening.
>>     
>> What version of qpid are you seeing these errors on?  They are similar
>> to
>> symptoms of bugs that have been fixed on trunk, but if you're seeing
>> this on
>> recent trunk then I want to look into it.
>>
>>     
>
> I'm running the 0.5 release, built from the SRPMs in the Redhat depot: ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/RHEMRG/SRPMS/

These issues have been patched on the latest MRG distro release, if you 
update to that, or I can
help you with that off list if needed. Pulling the latest from RHN will 
also work.

Carl.

RE: Qpid Cluster Errors

Posted by Sandy Pratt <pr...@adobe.com>.
> On 11/16/2009 07:26 PM, Sandy Pratt wrote:
> > Hello all,
> >
> > I've been working with a Qpid of two machines recently.  It's working
> well for the most part, but I've been experiencing a few sporadic
> failures.  I've collected logs for them over time, and I was wondering
> if anyone else had seen anything similar and had any idea what might be
> happening.
> >
> 
> What version of qpid are you seeing these errors on?  They are similar
> to
> symptoms of bugs that have been fixed on trunk, but if you're seeing
> this on
> recent trunk then I want to look into it.
> 

I'm running the 0.5 release, built from the SRPMs in the Redhat depot: ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/RHEMRG/SRPMS/

Thanks,

Sandy



---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Qpid Cluster Errors

Posted by Alan Conway <ac...@redhat.com>.
On 11/16/2009 07:26 PM, Sandy Pratt wrote:
> Hello all,
>
> I've been working with a Qpid of two machines recently.  It's working well for the most part, but I've been experiencing a few sporadic failures.  I've collected logs for them over time, and I was wondering if anyone else had seen anything similar and had any idea what might be happening.
>

What version of qpid are you seeing these errors on?  They are similar to 
symptoms of bugs that have been fixed on trunk, but if you're seeing this on 
recent trunk then I want to look into it.

> Thanks in advance!
>
> Logs:
>
> 2009-nov-06 12:35:55 notice Recovering from cluster, no recovery from local journal
> 2009-nov-06 12:35:55 notice Listening on TCP port 5672
> 2009-nov-06 12:35:55 notice broker1:9646(INIT) joining cluster renga-qpid-dev with url=amqp:tcp:broker1:5672
> 2009-nov-06 12:35:55 notice Broker running
> 2009-nov-06 12:35:55 notice Journal "test3": Created
> 2009-nov-06 12:35:58 notice broker1:9646(READY) caught up, active cluster member
> 2009-nov-06 13:56:17 error Execution exception: invalid-argument: user1@QPID.02c092ed-0256-4f70-b8d6-41e67666c8cc: confirmed<  (654+0) but only sent<  (653+0) (qpid/SessionState.cpp:163)
> 2009-nov-06 13:56:17 error broker1:9646(READY/error) channel error 1125458 on broker2:9011-347(shadow): invalid-argument: user1@QPID.02c092ed-0256-4f70-b8d6-41e67666c8cc: confirmed<  (654+0) but only sent<  (653+0) (qpid/SessionState.cpp:163) (unresolved: broker1:9646 broker2:9011 )
> 2009-nov-06 13:56:17 critical broker1:9646(READY/error) error 1125458 did not occur on broker2:9011
> 2009-nov-06 13:56:17 error Error delivering frames: Aborted by local failure that did not occur on all replicas
> 2009-nov-06 13:56:17 notice broker1:9646(LEFT/error) leaving cluster renga-qpid-dev
> 2009-nov-06 13:56:17 notice Shut down
>
> 2009-nov-06 15:40:22 notice Journal "TplStore": Created
> 2009-nov-06 15:40:22 notice Store module initialized; dir=/var/lib/qpidd
> 2009-nov-06 15:40:22 notice Recovering from cluster, no recovery from local journal
> 2009-nov-06 15:40:22 notice Listening on TCP port 5672
> 2009-nov-06 15:40:22 notice broker1:10496(INIT) joining cluster renga-qpid-dev with url=amqp:tcp:broker1:5672
> 2009-nov-06 15:40:22 notice Broker running
> 2009-nov-06 15:40:22 notice Journal "test3": Created
> 2009-nov-06 15:40:26 notice broker1:10496(READY) caught up, active cluster member
> 2009-nov-06 15:45:18 error Execution exception: invalid-argument: user1@QPID.8bc14c39-926e-4416-b6fb-2495b1cf9c8e: confirmed<  (6502+0) but only sent<  (6498+0) (qpid/SessionState.cpp:163)
> 2009-nov-06 15:45:18 error broker1:10496(READY/error) channel error 895882 on broker2:10019-131(shadow): invalid-argument: user1@QPID.8bc14c39-926e-4416-b6fb-2495b1cf9c8e: confirmed<  (6502+0) but only sent<  (6498+0) (qpid/SessionState.cpp:163) (unresolved: broker1:10496 broker2:10019 )
> 2009-nov-06 15:45:18 critical broker1:10496(READY/error) error 895882 did not occur on broker2:10019
> 2009-nov-06 15:45:18 error Error delivering frames: Aborted by local failure that did not occur on all replicas
> 2009-nov-06 15:45:18 notice broker1:10496(LEFT/error) leaving cluster renga-qpid-dev
> 2009-nov-06 15:45:18 notice Shut down
>
> 2009-nov-06 16:02:38 notice broker1:10590(INIT) joining cluster renga-qpid-dev with url=amqp:tcp:broker1:5672
> 2009-nov-06 16:02:38 notice Broker running
> 2009-nov-06 16:02:38 notice Journal "test3": Created
> 2009-nov-06 16:02:42 notice broker1:10590(READY) caught up, active cluster member
> 2009-nov-06 16:08:19 error Execution exception: invalid-argument: user1@QPID.6884577f-39ca-46af-8981-ad6043292ba8: confirmed<  (8757+0) but only sent<  (8756+0) (qpid/SessionState.cpp:163)
> 2009-nov-06 16:08:19 error broker1:10590(READY/error) channel error 2436085 on broker2:10019-430(shadow): invalid-argument: user1@QPID.6884577f-39ca-46af-8981-ad6043292ba8: confirmed<  (8757+0) but only sent<  (8756+0) (qpid/SessionState.cpp:163) (unresolved: broker1:10590 broker2:10019 )
> 2009-nov-06 16:08:19 critical broker1:10590(READY/error) error 2436085 did not occur on broker2:10019
> 2009-nov-06 16:08:19 error Error delivering frames: Aborted by local failure that did not occur on all replicas
> 2009-nov-06 16:08:19 notice broker1:10590(LEFT/error) leaving cluster renga-qpid-dev
> 2009-nov-06 16:08:20 notice Shut down
>
> 2009-nov-16 16:15:39 error Execution exception: invalid-argument: user1@QPID.941b62b1-55d9-443a-8ef2-dfc3abf81227: confirmed<  (864+0) but only sent<  (863+0) (qpid/SessionState.cpp:163)
> 2009-nov-16 16:15:39 error broker1:31842(READY/error) channel error 4208088 on broker2:31018-476(shadow): invalid-argument: user1@QPID.941b62b1-55d9-443a-8ef2-dfc3abf81227: confirmed<  (864+0) but only sent<  (863+0) (qpid/SessionState.cpp:163) (unresolved: broker1:31842 broker2:31018 )
> 2009-nov-16 16:15:39 critical broker1:31842(READY/error) error 4208088 did not occur on broker2:31018
> 2009-nov-16 16:15:39 error Error delivering frames: Aborted by local failure that did not occur on all replicas
> 2009-nov-16 16:15:39 notice broker1:31842(LEFT/error) leaving cluster renga-qpid-dev
> 2009-nov-16 16:15:39 notice Shut down
> 2009-nov-16 16:15:39 error Connection 10.59.174.186:5672 closed by error: closed by management(320)
>


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org