Posted to dev@qpid.apache.org by "Pavel Moravec (JIRA)" <ji...@apache.org> on 2012/08/08 09:19:09 UTC

[jira] [Closed] (QPID-3796) QMF errors ignored by cluster, causing cluster de-sync

     [ https://issues.apache.org/jira/browse/QPID-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Moravec closed QPID-3796.
-------------------------------

    Resolution: Not A Problem

Quoting Ken Giusti:

The result, while not ideal, cannot be prevented because the cluster cannot be guaranteed to operate correctly in this configuration.

The host environment differs between the clustered brokers: one host has more available disk space than the other. This contradicts the prescribed deployment guidelines for clustering, which require that the environments provide equivalent resources. If that requirement is not met, discrepancies will eventually be introduced.
                
> QMF errors ignored by cluster, causing cluster de-sync
> ------------------------------------------------------
>
>                 Key: QPID-3796
>                 URL: https://issues.apache.org/jira/browse/QPID-3796
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.12
>            Reporter: Pavel Moravec
>         Attachments: create_queue.cpp
>
>
> Cluster error handling ignores errors on QMF. As a result, a node affected by an error that the other nodes did not see is left running, i.e. the cluster de-syncs.
> Particular example: Via QMF, create a huge durable queue on a 2-node cluster, such that node1 of the cluster does not have sufficient free disk space for the queue journals, while node2 has enough free disk space. The cluster won't detect that node1 failed to create the queue, leaving the cluster running with one node that has the queue and one node that does not.
> Reproduction scenario:
> 1) 2 node cluster running
> 2) Leave less than 13M of free disk space on node1 (with enough free space on node2)
> 3) On node1, run the attached simple program that will create queue HugeDurableQueue with qpid.file_count=64 and qpid.file_size=16384.
> 4) The QMF response will be negative (correct), but both nodes will keep running, with node1 not having the queue provisioned while node2 has it.
> 5) Repeating the test with the QMF command sent to node2 (with enough free disk space) will produce a _positive_ QMF response - the user is _not_ made aware of any problem on the cluster.
> Both problems should be fixed: node1 needs to be shut down, and the QMF response has to be a NACK every time.
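
To put the reproduction steps in perspective, the following is a rough sketch of how much disk space the requested journal would need. It is not taken from the attached create_queue.cpp. It assumes that qpid.file_size is expressed in 64 KiB pages and that the journal preallocates file_count files of that size; the names kAssumedPageBytes, journalBytes and journalFits are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstdint>

// ASSUMPTION: one journal page is 64 KiB. This unit is not stated in the
// issue itself; adjust the constant if the store in use differs.
constexpr std::uint64_t kAssumedPageBytes = 64ULL * 1024ULL;

// Approximate number of bytes preallocated for a queue journal made of
// fileCount files, each fileSizePages pages long (hypothetical helper).
constexpr std::uint64_t journalBytes(std::uint64_t fileCount,
                                     std::uint64_t fileSizePages) {
    return fileCount * fileSizePages * kAssumedPageBytes;
}

// True if a node with freeBytes of free disk space could hold the journal.
constexpr bool journalFits(std::uint64_t freeBytes,
                           std::uint64_t fileCount,
                           std::uint64_t fileSizePages) {
    return journalBytes(fileCount, fileSizePages) <= freeBytes;
}
```

Under that assumption, journalBytes(64, 16384) evaluates to 64 GiB, so journalFits(13 * 1024 * 1024, 64, 16384) is false for a node like node1 with less than 13M free, while a node with ample space accepts the same request. That asymmetry is what makes the two brokers diverge in steps 4 and 5.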

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org