Posted to users@qpid.apache.org by "3.listas@adminlinux.com.br" <li...@adminlinux.com.br> on 2012/11/28 18:06:57 UTC

All cluster members die simultaneously with store state "dirty"

Hi,

I have a Qpid cluster (3 members) with this configuration:
   Ubuntu12.04
   qpidd-0.14-2
   qpidd-msgstore-0.14-1
   cman-3.1.7
   corosync-1.4.2-2
   libtotem-pg4-1.4.2-2

My cluster worked perfectly for months, but sometimes all members die 
simultaneously. The cluster store state becomes "dirty" on all members, 
and the daemon will only start again after this procedure:
   cd <data-dir>
   mv rhm rhm.bak
   cp -a _cluster.bak.<nnnn>/rhm .
   qpid-cluster-store -c <data-dir>

As a result, messages in my queues are lost.

I added "ulimit -c unlimited" to /etc/init.d/qpid, but no core file has 
been generated.
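
One quick, Qpid-independent sanity check that core dumps are possible at all (standard Linux settings; nothing here is specific to qpidd, and the paths are assumptions about a typical setup):

```shell
# Raise the soft core-size limit in this shell and confirm it stuck.
ulimit -c unlimited
ulimit -c           # prints "unlimited" if the hard limit allows it
# Where the kernel writes core files: a bare pattern such as "core" means the
# dumping process's working directory, which a daemon may not be able to write.
cat /proc/sys/kernel/core_pattern
```
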
Here are the logs from the time of the crash: 
http://www.adminlinux.com.br/logs

Can anyone help me?
Thanks.

-- 
Thiago Henrique
adminlinux.com.br




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: All cluster members die simultaneously with store state "dirty"

Posted by Gordon Sim <gs...@redhat.com>.
On 11/29/2012 08:40 AM, Pavel Moravec wrote:
> Hi,
> The qpidd brokers stopped because they were running on a cluster that had lost quorum:
>
> Nov 23 23:14:09 server_name corosync[27510]:   [CMAN  ] quorum lost, blocking activity
> Nov 23 23:14:09 server_name corosync[27510]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
> Nov 23 23:14:09 server_name corosync[27510]:   [QUORUM] Members[1]: 1
> Nov 23 23:14:09 server_name qpidd[29703]: 2012-11-23 23:14:09 critical Lost contact with cluster quorum.
>
> That is the proper reaction to a network split-brain (i.e. something qpid cannot affect).
>
> When a clustered broker with a dirty store starts as the first broker in the cluster, it:
>
> - stops itself with the "Cannot recover, no clean store." error, expecting some other peer to start up with a clean (i.e. up-to-date) store
> - backs up its store
>
> The second step is redundant and can lead to the loss of store content.

Note though that you can manually use the qpid-cluster tool to mark a 
given store clean. (You can also retrieve the backed-up store if needed 
by moving it back up a directory.)
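
Those two options might look something like this as shell steps. This is 
a sketch against a scratch directory so it is safe to run; the real data 
directory (e.g. /var/lib/qpidd) and backup suffix will differ, and the 
qpid-cluster-store invocation is the one from the original post:

```shell
#!/bin/sh
set -e
# Scratch stand-in for the broker's data directory; rhm and the automatic
# _cluster.bak.<nnnn> backups live inside it (0001 is a made-up suffix).
DATA_DIR=/tmp/qpid-store-demo
mkdir -p "$DATA_DIR/_cluster.bak.0001/rhm"
cd "$DATA_DIR"
# Retrieve the backed-up store, moving it back up a directory:
latest=$(ls -d _cluster.bak.* | sort | tail -n 1)
cp -a "$latest/rhm" .
# Mark the restored store clean so this broker may start first
# (commented out here: it requires the store plugin to be installed):
# qpid-cluster-store -c "$DATA_DIR"
```
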




Re: All cluster members die simultaneously with store state "dirty"

Posted by Pavel Moravec <pm...@redhat.com>.
Hi,
The qpidd brokers stopped because they were running on a cluster that had lost quorum:

Nov 23 23:14:09 server_name corosync[27510]:   [CMAN  ] quorum lost, blocking activity
Nov 23 23:14:09 server_name corosync[27510]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Nov 23 23:14:09 server_name corosync[27510]:   [QUORUM] Members[1]: 1
Nov 23 23:14:09 server_name qpidd[29703]: 2012-11-23 23:14:09 critical Lost contact with cluster quorum.

That is the proper reaction to a network split-brain (i.e. something qpid cannot affect).
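
(For reference: quorum is a strict majority of the expected votes, so a 
3-node cluster needs 2; the "Members[1]: 1" line shows this node counted 
only its own vote. A sketch of the arithmetic:)

```shell
# Majority quorum for N voting members is floor(N/2) + 1.
votes_expected=3                       # the 3-member cluster from the post
quorum=$(( votes_expected / 2 + 1 ))   # = 2
members_seen=1                         # the "Members[1]: 1" line in the log
if [ "$members_seen" -lt "$quorum" ]; then
    echo "inquorate: $members_seen of $quorum required votes; brokers stop"
fi
# prints: inquorate: 1 of 2 required votes; brokers stop
```
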

When a clustered broker with a dirty store starts as the first broker in the cluster, it:

- stops itself with the "Cannot recover, no clean store." error, expecting some other peer to start up with a clean (i.e. up-to-date) store
- backs up its store

The second step is redundant and can lead to the loss of store content.

That is unwanted behavior, but there is no plan to fix it, as the current cluster solution is being replaced by a new one upstream (https://issues.apache.org/jira/browse/QPID-3603).

Kind regards,
Pavel Moravec


