Posted to users@qpid.apache.org by Joshua Braegger <rc...@gmail.com> on 2009/04/30 21:14:11 UTC
Clustered broker fails to correctly rejoin cluster
I've been using Qpid for the last couple of days to test failure scenarios, and
I've run into a side effect I'd like to ask about. I'm wondering whether it's a
bug or whether I'm doing something wrong.
First, let me explain what I'm doing. I start a cluster of two brokers, brokerA
and brokerB, create an exchange and a queue, and have a consumer poll brokerA
for messages. Then I kill brokerB and start it back up again. It rejoins the
cluster fine. However, when I then send messages to brokerA and brokerB from my
producer, brokerB dies and I get the following output on brokerB (startup to
shutdown):
2009-apr-30 15:09:47 notice Recovering from cluster, no recovery from local journal
2009-apr-30 15:09:47 notice SASL disabled: No Authentication Performed
2009-apr-30 15:09:47 notice Listening on TCP port 5678
2009-apr-30 15:09:47 notice 10.251.43.162:30337(INIT) joining cluster foo with url=amqp:tcp:10.251.43.162:5678
2009-apr-30 15:09:47 notice Broker running
2009-apr-30 15:09:47 notice 10.251.43.162:30337(READY) caught up, active cluster member
2009-apr-30 15:10:14 error Channel exception: not-attached: session.completed: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:232)
2009-apr-30 15:10:14 critical 10.251.43.162:30337(READY/error) Error 57 did not occur on 10.251.43.162:30273
2009-apr-30 15:10:14 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-apr-30 15:10:14 notice 10.251.43.162:30337(LEFT/error) leaving cluster foo
2009-apr-30 15:10:14 notice Shut down
brokerA remains running fine.
I'm using CentOS 5.3. Let me know if any more information would be helpful.
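To make the failure sequence in the brokerB log easier to follow, here is a minimal Python sketch that extracts only the cluster-member state transitions (INIT, READY, READY/error, LEFT/error). The regular expression and the helper name member_states are hypothetical: the line format is inferred from the sample log lines in this message, not from any documented qpidd log format.

```python
import re

# Pattern inferred from the brokerB log lines in this thread (an assumption,
# not a documented qpidd format): "<date> <time> <level> <ip:port>(<STATE>) <text>"
MEMBER_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<level>\w+) "
    r"(?P<member>\d{1,3}(?:\.\d{1,3}){3}:\d+)\((?P<state>[A-Za-z/]+)\) "
    r"(?P<text>.*)$"
)

# brokerB log copied from this thread, with wrapped lines rejoined.
LOG = """\
2009-apr-30 15:09:47 notice Recovering from cluster, no recovery from local journal
2009-apr-30 15:09:47 notice SASL disabled: No Authentication Performed
2009-apr-30 15:09:47 notice Listening on TCP port 5678
2009-apr-30 15:09:47 notice 10.251.43.162:30337(INIT) joining cluster foo with url=amqp:tcp:10.251.43.162:5678
2009-apr-30 15:09:47 notice Broker running
2009-apr-30 15:09:47 notice 10.251.43.162:30337(READY) caught up, active cluster member
2009-apr-30 15:10:14 error Channel exception: not-attached: session.completed: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:232)
2009-apr-30 15:10:14 critical 10.251.43.162:30337(READY/error) Error 57 did not occur on 10.251.43.162:30273
2009-apr-30 15:10:14 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-apr-30 15:10:14 notice 10.251.43.162:30337(LEFT/error) leaving cluster foo
2009-apr-30 15:10:14 notice Shut down
"""


def member_states(lines):
    """Return (time, member, state, text) for each cluster-member line; skip all others."""
    events = []
    for line in lines:
        match = MEMBER_RE.match(line.strip())
        if match:
            events.append(
                (match["time"], match["member"], match["state"], match["text"])
            )
    return events


if __name__ == "__main__":
    # Print one line per state transition of the cluster member.
    for time, member, state, text in member_states(LOG.splitlines()):
        print(time, member, state, text)
```

Applied to this log, it shows the member going from INIT to READY at 15:09:47, then to READY/error and LEFT/error at 15:10:14, the same second as the "channel 0 is not attached" exception.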
Re: Clustered broker fails to correctly rejoin cluster
Posted by Alan Conway <ac...@redhat.com>.
Thanks for catching this. The Python client starts numbering channels from 0,
which used to be illegal, and the cluster didn't handle channel 0 properly. I
fixed it at both ends in revision 771452: the C++ code can now handle channel 0,
and the Python client no longer uses it.
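The client-side half of the fix (the Python client no longer uses channel 0) can be pictured with a small channel allocator. This is a minimal sketch, not the actual Qpid Python client code: the class name ChannelAllocator is hypothetical, and the 65535 ceiling assumes the 16-bit channel field of the AMQP 0-10 frame header.

```python
class ChannelAllocator:
    """Hypothetical allocator that hands out channel numbers but never channel 0."""

    # Assumption: AMQP 0-10 frame headers carry the channel as an unsigned 16-bit field.
    MAX_CHANNEL = 65535

    def __init__(self):
        self._in_use = set()

    def allocate(self):
        # Scan from 1, not 0, so channel 0 is never given to a session.
        for channel in range(1, self.MAX_CHANNEL + 1):
            if channel not in self._in_use:
                self._in_use.add(channel)
                return channel
        raise RuntimeError("no free channels")

    def release(self, channel):
        # Make a channel number available again once its session is detached.
        self._in_use.discard(channel)


if __name__ == "__main__":
    allocator = ChannelAllocator()
    print(allocator.allocate(), allocator.allocate())
```

Starting the scan at 1 keeps channel 0 out of use entirely, so a peer that mishandles channel 0 (as the cluster code did before the fix) never sees it. The broker-side half of the fix, accepting channel 0, is not shown.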
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org
Re: Clustered broker fails to correctly rejoin cluster
Posted by Joshua Braegger <rc...@gmail.com>.
Alan,
Sure thing. I've attached the two logs as well as the producer/consumer test
scripts I've been using.
brokerA.log is from the broker I've been producing and consuming messages on,
and brokerB.log is from the broker that I killed and attempted to have rejoin
the cluster.
To reiterate, these are the steps I used to reproduce the problem:
1. Start brokerA.
2. Start brokerB.
3. Start the consumer (run consumer.py). This uses brokerA.
4. Produce 1 message (run producer.py). This uses brokerA.
5. Kill brokerB.
6. Start brokerB again.
7. Produce 1 more message (run producer.py). This uses brokerA again.
8. Observe that brokerB is down.
I'm running SVN revision r769550.
-Josh
On Fri, May 1, 2009 at 6:39 AM, Alan Conway <ac...@redhat.com> wrote:
>
> Try running with --trace argument to brokers and send the logs from brokers
> A & B (they'll be quite big)
>
> If you have a simple test client that demonstrates the problem attach that
> too.
>
> What SVN revision or what release of Qpid are you using?
>
> I'll look into this...
>
> Cheers,
> Alan.
>
Re: Clustered broker fails to correctly rejoin cluster
Posted by Alan Conway <ac...@redhat.com>.
Try running the brokers with the --trace argument and send the logs from
brokers A and B (they'll be quite big).
If you have a simple test client that demonstrates the problem, attach that too.
What SVN revision or release of Qpid are you using?
I'll look into this...
Cheers,
Alan.