Posted to users@qpid.apache.org by stoyan <st...@hotmail.com> on 2011/05/18 12:57:13 UTC
cannot restart failed cluster node
hello
i seem to have had a network failure in my cluster of two nodes - main node
A lived on, while on node B qpid quit.
now, there are two queues (Q1, Q2 with same routing key) and after this
incident broker A kept receiving messages to these queues.
after some time i tried to restart node B and couldn't - first i tried with
its data-dir untouched, then i removed the data dir contents altogether.
judging by the qpid logs, the B broker joined the cluster and started
receiving state updates; it read all the messages for queue Q1 and then died
when reading the first message for Q2, the last log message is
'qpid.cluster-update: recv cmd 28: content (267 bytes) <?xml version="1.0"
encoding="ut...'
i managed to start B only when i 'drain'ed the contents of Q2
any hints of what i might be doing wrong when starting up the failed node?
thanks!
stoyan
btw: on node A corosync-cpgtool wrongly showed A and B as still being in a
cluster the whole time, while on B it correctly showed B as the lone node in
its group, but that's a different matter
c++ qpid 0.8
corosync 1.3.1
rhel5
the initial network error indicator in corosync.log was
corosync[8458]: [TOTEM ] A processor failed, forming new configuration
later followed by
qpidd[8474]: 2011-05-17 21:44:32 critical Multicast error: Cannot mcast to CPG group QpidCluster: not exist (12)
--
View this message in context: http://apache-qpid-users.2158936.n2.nabble.com/cannot-restart-failed-cluster-node-tp6377307p6377307.html
Sent from the Apache Qpid users mailing list archive at Nabble.com.
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org
Re: cannot restart failed cluster node
Posted by Alan Conway <ac...@redhat.com>.
On 05/18/2011 06:57 AM, stoyan wrote:
> hello
> i seem to have had a network failure in my cluster of two nodes - main node
> A lived on, while on node B qpid quit.
> now, there are two queues (Q1, Q2 with same routing key) and after this
> incident broker A kept receiving messages to these queues.
> after some time i tried to restart node B and couldn't - first i tried with
> its data-dir untouched, then i removed the data dir contents altogether.
> judging by the qpid logs, the B broker joined the cluster and started
> receiving state updates; it read all the messages for queue Q1 and then died
> when reading the first message for Q2, the last log message is
> 'qpid.cluster-update: recv cmd 28: content (267 bytes) <?xml version="1.0"
> encoding="ut...'
>
> i managed to start B only when i 'drain'ed the contents of Q2
>
> any hints of what i might be doing wrong when starting up the failed node?
Were there any core files generated? Send me the logs (of both nodes) and I'll
take a look.