Posted to dev@qpid.apache.org by "Alan Conway (JIRA)" <ji...@apache.org> on 2012/10/29 21:32:12 UTC

[jira] [Created] (QPID-4402) HA QMF events can be out of order.

Alan Conway created QPID-4402:
---------------------------------

             Summary: HA QMF events can be out of order.
                 Key: QPID-4402
                 URL: https://issues.apache.org/jira/browse/QPID-4402
             Project: Qpid
          Issue Type: Bug
          Components: C++ Clustering
    Affects Versions: 0.18
            Reporter: Alan Conway
            Assignee: Alan Conway


With the new replication-based clustering in 0.18 MRG-M, replication can hang if QMF events arrive in the wrong order.  I am running the following test, which reproduces the hang:

- Start a client with 2 threads
- Each thread creates its own Connection, Session, and a Receiver using the address "someQueue; {create:always, node: {x-declare: {auto-delete:True}}}"
- Run a loop like this (pseudocode):

while(receiver.get(message)) {
  // do stuff

  if at least 5 seconds have passed {
    connection.close();
    reconnectAndRecreateReceiver();
    receiver.setCapacity(1000);
  }
}

During this loop, both threads disconnect and reconnect every 5 seconds.  On connecting, one of them creates the queue.  On disconnecting, the queue is deleted.  Because no lock governs when queue creation/deletion events are emitted, a queue creation event can arrive at the backup broker before the queue deletion event that preceded it on the primary (i.e. in the wrong order).  When this happens, the backup broker doesn't subscribe to the primary to replicate the queue in question, and replication hangs.
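To make the effect of the reordering concrete, here is a minimal sketch of a backup that simply applies queue events in arrival order. Everything in it (EventType, Event, BackupModel, replicatesAfter) is a hypothetical toy model invented for illustration; it is not the QMF event schema or the actual HA backup code, and it assumes the backup tracks nothing more than a set of queue names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical event model for illustration only; not the real QMF schema.
enum class EventType { QueueDeclare, QueueDelete };

struct Event {
    EventType type;
    std::string queue;
};

// Toy backup: tracks which queues it believes exist on the primary and
// therefore replicates. Assumed behaviour, not the actual qpid HA code.
class BackupModel {
  public:
    void apply(const Event& e) {
        if (e.type == EventType::QueueDeclare)
            queues_.insert(e.queue);  // no-op if the queue is already known
        else
            queues_.erase(e.queue);
    }
    bool replicates(const std::string& queue) const {
        return queues_.count(queue) != 0;
    }
  private:
    std::set<std::string> queues_;
};

// Replays events on a backup that already knows the queue, then reports
// whether the backup still replicates it afterwards.
bool replicatesAfter(const std::vector<Event>& events, const std::string& queue) {
    BackupModel backup;
    backup.apply({EventType::QueueDeclare, queue});  // state before the reconnect
    for (const Event& e : events) backup.apply(e);
    return backup.replicates(queue);
}
```

On the primary the sequence is delete followed by create, so the queue exists at the end. Calling replicatesAfter with the events in that order leaves the toy backup replicating the queue. With the same two events swapped, the create is a no-op and the late delete removes the queue, so the backup ends up with no queue although the primary has one.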

This is not strictly an HA problem: any QMF client may receive incorrectly ordered events. It comes up in the HA context because HA relies heavily on QMF events for replication.
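Since every QMF client is affected, one possible direction is to enforce ordering where the events are emitted. The sketch below shows the general idea of making the state change and the event emission a single step under one lock. It is an assumption-laden illustration, not the fix that was applied in qpid: OrderedRegistry, QueueEvent and churn are hypothetical names, and an in-memory vector stands in for the event stream.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is not the qpid broker's registry.
struct QueueEvent {
    bool created;  // true = declare, false = delete
    std::string queue;
};

class OrderedRegistry {
  public:
    // The state change and the event emission happen under the same lock,
    // so the event log order always matches the order of the changes.
    void declare(const std::string& q) {
        std::lock_guard<std::mutex> guard(lock_);
        if (queues_.insert(q).second) log_.push_back({true, q});
    }
    void destroy(const std::string& q) {
        std::lock_guard<std::mutex> guard(lock_);
        if (queues_.erase(q) != 0) log_.push_back({false, q});
    }
    // Replays the emitted events and checks they reproduce the final state.
    bool logMatchesState() const {
        std::lock_guard<std::mutex> guard(lock_);
        std::set<std::string> replayed;
        for (const QueueEvent& e : log_) {
            if (e.created) replayed.insert(e.queue);
            else replayed.erase(e.queue);
        }
        return replayed == queues_;
    }
  private:
    mutable std::mutex lock_;
    std::set<std::string> queues_;
    std::vector<QueueEvent> log_;  // stand-in for the emitted event stream
};

// Several threads repeatedly create and delete the same queue, roughly
// like the reconnecting clients in the test described above.
bool churn(int threads, int iterations) {
    OrderedRegistry registry;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&registry, iterations] {
            for (int i = 0; i < iterations; ++i) {
                registry.declare("someQueue");
                registry.destroy("someQueue");
            }
        });
    for (std::thread& w : workers) w.join();
    return registry.logMatchesState();
}
```

A call such as churn(4, 1000) should return true, because a consumer replaying the log in order always arrives at the registry's final state. If the event were appended after releasing the lock instead, two threads could interleave and the log order could diverge from the order of the changes, which is the situation described in this report.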

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Resolved] (QPID-4402) HA QMF events can be out of order.

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway resolved QPID-4402.
-------------------------------

    Resolution: Duplicate

Duplicate of QPID-4394
                
