Posted to users@qpid.apache.org by Joshua Braegger <rc...@gmail.com> on 2009/04/30 21:14:11 UTC

Clustered broker fails to correctly rejoin cluster

I've been using qpid for the last couple of days to test failure scenarios,
and I have a question about a side effect I've been seeing -- I'm wondering
if it's a bug or if I'm doing something wrong.

First, let me explain what I am doing. I start a cluster of two brokers,
brokerA and brokerB, create an exchange/queue, and have a consumer poll
brokerA for messages. Then I kill brokerB and start it back up. It rejoins
the cluster fine. However, when I then send messages to brokerA and brokerB
from my producer, brokerB dies and I get the following output on brokerB
(startup to shutdown):

2009-apr-30 15:09:47 notice Recovering from cluster, no recovery from local journal
2009-apr-30 15:09:47 notice SASL disabled: No Authentication Performed
2009-apr-30 15:09:47 notice Listening on TCP port 5678
2009-apr-30 15:09:47 notice 10.251.43.162:30337(INIT) joining cluster foo with url=amqp:tcp:10.251.43.162:5678
2009-apr-30 15:09:47 notice Broker running
2009-apr-30 15:09:47 notice 10.251.43.162:30337(READY) caught up, active cluster member
2009-apr-30 15:10:14 error Channel exception: not-attached: session.completed: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:232)
2009-apr-30 15:10:14 critical 10.251.43.162:30337(READY/error) Error 57 did not occur on 10.251.43.162:30273
2009-apr-30 15:10:14 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-apr-30 15:10:14 notice 10.251.43.162:30337(LEFT/error) leaving cluster foo
2009-apr-30 15:10:14 notice Shut down

brokerA remains running fine.

I'm using CentOS 5.3. Let me know if any more information would be helpful.

Re: Clustered broker fails to correctly rejoin cluster

Posted by Alan Conway <ac...@redhat.com>.
Joshua Braegger wrote:
> Alan,
> 
> Sure thing, I've attached the two logs as well as the producer/consumer 
> test scripts I've been using.
> 
> brokerA.log is the broker I've been producing and consuming messages on, 
> and brokerB.log is the broker that I killed and attempted to have rejoin 
> the cluster.
> 
> To reiterate, these are the steps I used to reproduce the problem:
> 
> 1. Start brokerA.
> 2. Start brokerB.
> 3. Start the consumer (run consumer.py). This uses brokerA.
> 4. Produce 1 message (run producer.py). This uses brokerA.
> 5. Kill brokerB.
> 6. Start brokerB again.
> 7. Produce 1 more message (run producer.py). This uses brokerA again.
> 8. Observe that brokerB is down.
> 

Thanks for catching this. The Python client numbers channels starting from 0,
which used to be illegal, and the cluster didn't handle channel 0 properly. I
fixed it at both ends in revision 771452: the C++ broker now handles channel 0,
and the Python client no longer uses it.
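The client-side half of the fix (never using channel 0) can be sketched in Python, the language of the test scripts. This is a minimal sketch under stated assumptions, not the code committed in revision 771452: `ChannelAllocator` is a hypothetical helper, and the 65535 upper bound assumes a 16-bit channel number.

```python
class ChannelAllocator:
    """Hypothetical helper (not part of the qpid Python client) that hands out
    session channel numbers starting at 1, so channel 0 is never used."""

    def __init__(self, max_channel=65535):
        # Assumed upper bound: a 16-bit channel number.
        self.max_channel = max_channel
        self.in_use = set()
        self.next_candidate = 1  # start at 1 rather than 0

    def allocate(self):
        # Try each channel in 1..max_channel at most once, wrapping from
        # max_channel back to 1 (never to 0).
        for _ in range(self.max_channel):
            ch = self.next_candidate
            self.next_candidate = ch + 1 if ch < self.max_channel else 1
            if ch not in self.in_use:
                self.in_use.add(ch)
                return ch
        raise RuntimeError("no free channels")

    def release(self, ch):
        # Return a channel to the pool once its session is detached.
        self.in_use.discard(ch)


if __name__ == "__main__":
    alloc = ChannelAllocator()
    print(alloc.allocate(), alloc.allocate())  # prints: 1 2
```

Wrapping back to 1 instead of 0 means the allocator never hands out channel 0, even after a long-running client has cycled through many sessions.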


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Clustered broker fails to correctly rejoin cluster

Posted by Joshua Braegger <rc...@gmail.com>.
Alan,

Sure thing, I've attached the two logs as well as the producer/consumer test
scripts I've been using.

brokerA.log is the broker I've been producing and consuming messages on, and
brokerB.log is the broker that I killed and attempted to have rejoin the
cluster.

To reiterate, these are the steps I used to reproduce the problem:

1. Start brokerA.
2. Start brokerB.
3. Start the consumer (run consumer.py). This uses brokerA.
4. Produce 1 message (run producer.py). This uses brokerA.
5. Kill brokerB.
6. Start brokerB again.
7. Produce 1 more message (run producer.py). This uses brokerA again.
8. Observe that brokerB is down.

I'm running SVN revision r769550.

-Josh


On Fri, May 1, 2009 at 6:39 AM, Alan Conway <ac...@redhat.com> wrote:

>
> Try running the brokers with the --trace argument and send the logs from
> brokers A & B (they'll be quite big).
>
> If you have a simple test client that demonstrates the problem attach that
> too.
>
> What SVN revision or what release of Qpid are you using?
>
> I'll look into this...
>
> Cheers,
> Alan.
>

Re: Clustered broker fails to correctly rejoin cluster

Posted by Alan Conway <ac...@redhat.com>.
Joshua Braegger wrote:
> I've been using qpid for the last couple of days to test failure scenarios,
> and I have a question about a side effect I've been seeing -- I'm wondering
> if it's a bug or if I'm doing something wrong.
> 
> First, let me explain what I am doing. I start a cluster of two brokers,
> brokerA and brokerB, create an exchange/queue, and have a consumer poll
> brokerA for messages. Then I kill brokerB and start it back up. It rejoins
> the cluster fine. However, when I then send messages to brokerA and brokerB
> from my producer, brokerB dies and I get the following output on brokerB
> (startup to shutdown):
> 
> 2009-apr-30 15:09:47 notice Recovering from cluster, no recovery from local journal
> 2009-apr-30 15:09:47 notice SASL disabled: No Authentication Performed
> 2009-apr-30 15:09:47 notice Listening on TCP port 5678
> 2009-apr-30 15:09:47 notice 10.251.43.162:30337(INIT) joining cluster foo with url=amqp:tcp:10.251.43.162:5678
> 2009-apr-30 15:09:47 notice Broker running
> 2009-apr-30 15:09:47 notice 10.251.43.162:30337(READY) caught up, active cluster member
> 2009-apr-30 15:10:14 error Channel exception: not-attached: session.completed: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:232)
> 2009-apr-30 15:10:14 critical 10.251.43.162:30337(READY/error) Error 57 did not occur on 10.251.43.162:30273
> 2009-apr-30 15:10:14 error Error delivering frames: Aborted by local failure that did not occur on all replicas
> 2009-apr-30 15:10:14 notice 10.251.43.162:30337(LEFT/error) leaving cluster foo
> 2009-apr-30 15:10:14 notice Shut down
> 
> brokerA remains running fine.
> 
> I'm using CentOS 5.3. Let me know if any more information would be helpful.
> 

Try running the brokers with the --trace argument and send the logs from
brokers A & B (they'll be quite big).

If you have a simple test client that demonstrates the problem attach that too.

What SVN revision or what release of Qpid are you using?

I'll look into this...

Cheers,
Alan.
