You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by "Jason Dillaman (JIRA)" <ji...@apache.org> on 2012/09/05 00:07:08 UTC

[jira] [Created] (QPID-4285) HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue

Jason Dillaman created QPID-4285:
------------------------------------

             Summary: HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue
                 Key: QPID-4285
                 URL: https://issues.apache.org/jira/browse/QPID-4285
             Project: Qpid
          Issue Type: Bug
          Components: C++ Broker
    Affects Versions: 0.18
            Reporter: Jason Dillaman


Running qmf-stat on the primary broker shows that auto-delete queue XYZ exists, but running drain against the queue indicates that the queue does not really exist.  QMF is out-of-sync with the true state of the queue and as a result, the deleted queue was replicated to the backup broker.  When the backup attempted to subscribe to the queue, it received an error that the queue was deleted which results in the backup disconnecting/reconnecting to the primary and re-attempting the state replication.

Sample log output from backup:

Sep  4 14:51:26 itcm13 qpidd[10392]: 2012-09-04 14:51:26 [System] error resource-deleted: Queue XYZ has been deleted. (qpid/broker/Queue.cpp:1787)
Sep  4 14:51:26 itcm13 qpidd[10392]: 2012-09-04 14:51:26 [Broker] info Inter-broker link disconnected from 10.3.100.105:9006 Closed by peer
Sep  4 14:51:28 itcm13 qpidd[10392]: 2012-09-04 14:51:28 [System] info Connecting: 10.3.100.105:9006


Sample log output from the primary:

Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] debug Primary: Known backup connected: host3:9006(ready)
Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] debug Broker: Membership add: host3:9006(ready)
Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] info Broker: Membership changed: host3:9006(ready) system-node1a-cluster:9006(recovering)
Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [System] debug DISCONNECTED [10.3.100.105:9006-10.3.100.13:19841]
Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] debug Primary: Backup disconnected: host3:9006(ready)
Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] debug Broker: Membership remove: 4ddb3222-e66d-4e6a-8c87-14e1a37332cf
Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] info Broker: Membership changed: system-node1a-cluster:9006(recovering)


Backup bracktrace:

#0  qpid::broker::Queue::checkNotDeleted (this=0x4d97f70, c=<value optimized out>) at qpid/broker/Queue.cpp:1787
#1  0x0000003e1dbf67e4 in qpid::broker::Queue::getNextMessage (this=0x4d97f70, m=..., c=...) at qpid/broker/Queue.cpp:385
#2  0x0000003e1dbf688e in qpid::broker::Queue::dispatch (this=<value optimized out>, c=...) at qpid/broker/Queue.cpp:510
#3  0x00007f15139e62ba in qpid::ha::ReplicatingSubscription::getNext (q=..., from=..., result=...) at qpid/ha/ReplicatingSubscription.cpp:116
#4  0x00007f15139e4ebe in qpid::ha::QueueReplicator::initializeBridge (this=0x3868af0, bridge=..., sessionHandler=<value optimized out>) at qpid/ha/QueueReplicator.cpp:121




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-4285) HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468105#comment-13468105 ] 

Alan Conway commented on QPID-4285:
-----------------------------------

Comitted to trunK:

 ------------------------------------------------------------------------
r1393201 | aconway | 2012-10-02 17:46:50 -0400 (Tue, 02 Oct 2012) | 9 lines

QPID-4285: HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue

Fixes queues getting into a partially deleted state: previously when a broker was
promoted, it did not clean up it's QueueReplicators. The QueueReplicators keep a
shared_ptr to the Queue so this kept Queues in memory after they were destroyed.  It also
kept them in QMF, since the management object is unregistered in the destructor.

This patch cleans up properly on promotion.

------------------------------------------------------------------------
                
> HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue
> ------------------------------------------------------------------------------------------
>
>                 Key: QPID-4285
>                 URL: https://issues.apache.org/jira/browse/QPID-4285
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.18
>            Reporter: Jason Dillaman
>
> Running qmf-stat on the primary broker shows that auto-delete queue XYZ exists, but running drain against the queue indicates that the queue does not really exist.  QMF is out-of-sync with the true state of the queue and as a result, the deleted queue was replicated to the backup broker.  When the backup attempted to subscribe to the queue, it received an error that the queue was deleted which results in the backup disconnecting/reconnecting to the primary and re-attempting the state replication.
> Sample log output from backup:
> Sep  4 14:51:26 itcm13 qpidd[10392]: 2012-09-04 14:51:26 [System] error resource-deleted: Queue XYZ has been deleted. (qpid/broker/Queue.cpp:1787)
> Sep  4 14:51:26 itcm13 qpidd[10392]: 2012-09-04 14:51:26 [Broker] info Inter-broker link disconnected from 10.3.100.105:9006 Closed by peer
> Sep  4 14:51:28 itcm13 qpidd[10392]: 2012-09-04 14:51:28 [System] info Connecting: 10.3.100.105:9006
> Sample log output from the primary:
> Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] debug Primary: Known backup connected: host3:9006(ready)
> Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] debug Broker: Membership add: host3:9006(ready)
> Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] info Broker: Membership changed: host3:9006(ready) system-node1a-cluster:9006(recovering)
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [System] debug DISCONNECTED [10.3.100.105:9006-10.3.100.13:19841]
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] debug Primary: Backup disconnected: host3:9006(ready)
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] debug Broker: Membership remove: 4ddb3222-e66d-4e6a-8c87-14e1a37332cf
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] info Broker: Membership changed: system-node1a-cluster:9006(recovering)
> Backup bracktrace:
> #0  qpid::broker::Queue::checkNotDeleted (this=0x4d97f70, c=<value optimized out>) at qpid/broker/Queue.cpp:1787
> #1  0x0000003e1dbf67e4 in qpid::broker::Queue::getNextMessage (this=0x4d97f70, m=..., c=...) at qpid/broker/Queue.cpp:385
> #2  0x0000003e1dbf688e in qpid::broker::Queue::dispatch (this=<value optimized out>, c=...) at qpid/broker/Queue.cpp:510
> #3  0x00007f15139e62ba in qpid::ha::ReplicatingSubscription::getNext (q=..., from=..., result=...) at qpid/ha/ReplicatingSubscription.cpp:116
> #4  0x00007f15139e4ebe in qpid::ha::QueueReplicator::initializeBridge (this=0x3868af0, bridge=..., sessionHandler=<value optimized out>) at qpid/ha/QueueReplicator.cpp:121

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-4285) HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467954#comment-13467954 ] 

Alan Conway commented on QPID-4285:
-----------------------------------

Committed to trunk
------------------------------------------------------------------------
r1393089 | aconway | 2012-10-02 14:41:41 -0400 (Tue, 02 Oct 2012) | 12 lines

QPID-4285: HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue. (Based on patch by Jason Dillama)

This does not directly tackle the origin of the problem but extends Jasons's patch since
it addresses something we had to fix anyway: "leaking" queues and exchanges. It does 2 things.

1. enabled hideDeletedError on all subscription objects used by HA
This suppress the troublesome exception with a harmless no-op

2. Delete queues/exchanges missing from responses (based on Jasons patch)
Fix the "leak" of queues and exchanges possible when an object replicated
to a backup is deleted from the newn primary before the backup connects.

------------------------------------------------------------------------

                
> HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue
> ------------------------------------------------------------------------------------------
>
>                 Key: QPID-4285
>                 URL: https://issues.apache.org/jira/browse/QPID-4285
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.18
>            Reporter: Jason Dillaman
>
> Running qmf-stat on the primary broker shows that auto-delete queue XYZ exists, but running drain against the queue indicates that the queue does not really exist.  QMF is out-of-sync with the true state of the queue and as a result, the deleted queue was replicated to the backup broker.  When the backup attempted to subscribe to the queue, it received an error that the queue was deleted which results in the backup disconnecting/reconnecting to the primary and re-attempting the state replication.
> Sample log output from backup:
> Sep  4 14:51:26 itcm13 qpidd[10392]: 2012-09-04 14:51:26 [System] error resource-deleted: Queue XYZ has been deleted. (qpid/broker/Queue.cpp:1787)
> Sep  4 14:51:26 itcm13 qpidd[10392]: 2012-09-04 14:51:26 [Broker] info Inter-broker link disconnected from 10.3.100.105:9006 Closed by peer
> Sep  4 14:51:28 itcm13 qpidd[10392]: 2012-09-04 14:51:28 [System] info Connecting: 10.3.100.105:9006
> Sample log output from the primary:
> Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] debug Primary: Known backup connected: host3:9006(ready)
> Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] debug Broker: Membership add: host3:9006(ready)
> Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] info Broker: Membership changed: host3:9006(ready) system-node1a-cluster:9006(recovering)
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [System] debug DISCONNECTED [10.3.100.105:9006-10.3.100.13:19841]
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] debug Primary: Backup disconnected: host3:9006(ready)
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] debug Broker: Membership remove: 4ddb3222-e66d-4e6a-8c87-14e1a37332cf
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] info Broker: Membership changed: system-node1a-cluster:9006(recovering)
> Backup bracktrace:
> #0  qpid::broker::Queue::checkNotDeleted (this=0x4d97f70, c=<value optimized out>) at qpid/broker/Queue.cpp:1787
> #1  0x0000003e1dbf67e4 in qpid::broker::Queue::getNextMessage (this=0x4d97f70, m=..., c=...) at qpid/broker/Queue.cpp:385
> #2  0x0000003e1dbf688e in qpid::broker::Queue::dispatch (this=<value optimized out>, c=...) at qpid/broker/Queue.cpp:510
> #3  0x00007f15139e62ba in qpid::ha::ReplicatingSubscription::getNext (q=..., from=..., result=...) at qpid/ha/ReplicatingSubscription.cpp:116
> #4  0x00007f15139e4ebe in qpid::ha::QueueReplicator::initializeBridge (this=0x3868af0, bridge=..., sessionHandler=<value optimized out>) at qpid/ha/QueueReplicator.cpp:121

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Resolved] (QPID-4285) HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway resolved QPID-4285.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 0.19
         Assignee: Alan Conway
    
> HA backups continuously disconnect / re-sync after attempting to replicate a deleted queue
> ------------------------------------------------------------------------------------------
>
>                 Key: QPID-4285
>                 URL: https://issues.apache.org/jira/browse/QPID-4285
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.18
>            Reporter: Jason Dillaman
>            Assignee: Alan Conway
>             Fix For: 0.19
>
>
> Running qmf-stat on the primary broker shows that auto-delete queue XYZ exists, but running drain against the queue indicates that the queue does not really exist.  QMF is out-of-sync with the true state of the queue and as a result, the deleted queue was replicated to the backup broker.  When the backup attempted to subscribe to the queue, it received an error that the queue was deleted which results in the backup disconnecting/reconnecting to the primary and re-attempting the state replication.
> Sample log output from backup:
> Sep  4 14:51:26 itcm13 qpidd[10392]: 2012-09-04 14:51:26 [System] error resource-deleted: Queue XYZ has been deleted. (qpid/broker/Queue.cpp:1787)
> Sep  4 14:51:26 itcm13 qpidd[10392]: 2012-09-04 14:51:26 [Broker] info Inter-broker link disconnected from 10.3.100.105:9006 Closed by peer
> Sep  4 14:51:28 itcm13 qpidd[10392]: 2012-09-04 14:51:28 [System] info Connecting: 10.3.100.105:9006
> Sample log output from the primary:
> Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] debug Primary: Known backup connected: host3:9006(ready)
> Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] debug Broker: Membership add: host3:9006(ready)
> Sep  4 14:18:15 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:15 [HA] info Broker: Membership changed: host3:9006(ready) system-node1a-cluster:9006(recovering)
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [System] debug DISCONNECTED [10.3.100.105:9006-10.3.100.13:19841]
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] debug Primary: Backup disconnected: host3:9006(ready)
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] debug Broker: Membership remove: 4ddb3222-e66d-4e6a-8c87-14e1a37332cf
> Sep  4 14:18:16 system-node1a-cluster qpidd[8397]: 2012-09-04 14:18:16 [HA] info Broker: Membership changed: system-node1a-cluster:9006(recovering)
> Backup bracktrace:
> #0  qpid::broker::Queue::checkNotDeleted (this=0x4d97f70, c=<value optimized out>) at qpid/broker/Queue.cpp:1787
> #1  0x0000003e1dbf67e4 in qpid::broker::Queue::getNextMessage (this=0x4d97f70, m=..., c=...) at qpid/broker/Queue.cpp:385
> #2  0x0000003e1dbf688e in qpid::broker::Queue::dispatch (this=<value optimized out>, c=...) at qpid/broker/Queue.cpp:510
> #3  0x00007f15139e62ba in qpid::ha::ReplicatingSubscription::getNext (q=..., from=..., result=...) at qpid/ha/ReplicatingSubscription.cpp:116
> #4  0x00007f15139e4ebe in qpid::ha::QueueReplicator::initializeBridge (this=0x3868af0, bridge=..., sessionHandler=<value optimized out>) at qpid/ha/QueueReplicator.cpp:121

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org