Posted to dev@qpid.apache.org by "Alan Conway (JIRA)" <ji...@apache.org> on 2012/10/05 18:00:03 UTC

[jira] [Created] (QPID-4360) Non-ready HA broker can be incorrectly promoted

Alan Conway created QPID-4360:
---------------------------------

             Summary:  Non-ready HA broker can be incorrectly promoted
                 Key: QPID-4360
                 URL: https://issues.apache.org/jira/browse/QPID-4360
             Project: Qpid
          Issue Type: Bug
          Components: C++ Clustering
    Affects Versions: 0.18
            Reporter: Alan Conway
            Assignee: Alan Conway


Description of problem:
rgmanager can promote a non-ready backup HA broker to primary when other backup brokers are available in the ready state.  This can result in loss of messages and broker configuration.  Additionally, this can cause the previously ready backups to throw exceptions when connecting to the new primary:

Sep 20 10:17:18 itcm12 qpidd[10871]: 2012-09-20 10:17:18 [HA] critical Backup queue Queue1: Replication failed: Invalid position move, preceeds messages
Sep 20 10:17:18 itcm12 qpidd[10871]: 2012-09-20 10:17:18 [Protocol] error Unexpected exception: Invalid position move, preceeds messages
Sep 20 10:17:18 itcm12 qpidd[10871]: 2012-09-20 10:17:18 [Broker] error Connection 10.3.100.12:43837-10.3.100.105:9006 closed by error: Invalid position move, preceeds messages(501)

Version-Release number of selected component (if applicable):
Qpid 0.18

How reproducible:
100%

Steps to Reproduce:
1. Start a primary and backup broker
2. Inject messages into the primary and ensure messages replicate to backup
3. Restart the primary broker and manually re-promote it to primary
  
Actual results:
Restarted broker becomes primary

Expected results:
Restarted broker refuses to become primary, because at least one ready backup was discovered within some timeout

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Resolved] (QPID-4360) Non-ready HA broker can be incorrectly promoted

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway resolved QPID-4360.
-------------------------------

    Resolution: Fixed

Committed on trunk:

------------------------------------------------------------------------
r1394706 | aconway | 2012-10-05 14:21:45 -0400 (Fri, 05 Oct 2012) | 10 lines

QPID-4360: Non-ready HA broker can be incorrectly promoted to primary

A joining broker now attempts to contact all known members of the cluster and
check their status. If any broker is in a state other than "joining", the
joining broker will refuse to promote. This allows rgmanager to continue trying
addresses until it finds a ready broker.

Note this requires ha-brokers-url to be a list of all known brokers, not a
virtual IP.  ha-public-url can still be a VIP.

------------------------------------------------------------------------
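
For readers who want the promotion rule from the commit message in concrete form, here is a minimal sketch in Python. It is not the broker's C++ code. The function name check_promotable, the query_status callback and the node addresses are hypothetical, and the choice to skip unreachable peers is an assumption that the commit message does not confirm. Only the "joining" status label and the "cluster active, cannot promote" wording come from this thread.

```python
from typing import Callable, Iterable, Optional, Tuple

JOINING = "joining"  # status label taken from the commit message


def check_promotable(
    self_url: str,
    brokers_url: Iterable[str],
    query_status: Callable[[str], Optional[str]],
) -> Tuple[bool, str]:
    """Decide whether a joining broker may accept promotion.

    query_status is a hypothetical callback that returns a peer's HA
    status string, or None if the peer cannot be reached.
    """
    for url in brokers_url:
        if url == self_url:
            continue  # do not query ourselves
        status = query_status(url)
        if status is None:
            continue  # assumption: unreachable peers do not block promotion
        if status != JOINING:
            # Some peer is further along than "joining": refuse.
            return False, f"cluster active, cannot promote ({url} is {status})"
    return True, "no active cluster members found"


if __name__ == "__main__":
    # Hypothetical three-node cluster; node3 is unreachable.
    peers = {"node1:5672": "joining", "node2:5672": "ready", "node3:5672": None}
    ok, reason = check_promotable("node1:5672", peers, peers.get)
    print(ok, reason)
```

The sketch also shows why ha-brokers-url has to list every broker: behind a virtual IP the joining broker would reach only one peer and could miss a ready backup.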

                


[jira] [Updated] (QPID-4360) Non-ready HA broker can be incorrectly promoted

Posted by "Justin Ross (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Ross updated QPID-4360:
------------------------------

    Fix Version/s: 0.19
    


[jira] [Commented] (QPID-4360) Non-ready HA broker can be incorrectly promoted

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472689#comment-13472689 ] 

Alan Conway commented on QPID-4360:
-----------------------------------

There was a test bug in the initial checkin, fixed on trunk:

------------------------------------------------------------------------
r1396244 | aconway | 2012-10-09 15:52:24 -0400 (Tue, 09 Oct 2012) | 7 lines

QPID-4360: Fix test bug: Non-ready HA broker can be incorrectly promoted to primary.

Test test_delete_missing_response was failing with "cluster active, cannot promote".
- Fixed test bug: "fake" primary triggered "cannot promote".
- Backup: always create QueueReplicator if not already existing.
- Terminology change: "initial" queues -> "catch-up" queues.

------------------------------------------------------------------------
                