Posted to dev@qpid.apache.org by "Jason Dillaman (JIRA)" <ji...@apache.org> on 2012/09/05 22:27:08 UTC

[jira] [Created] (QPID-4286) QMF queries for HA replication take too long to process

Jason Dillaman created QPID-4286:
------------------------------------

             Summary: QMF queries for HA replication take too long to process
                 Key: QPID-4286
                 URL: https://issues.apache.org/jira/browse/QPID-4286
             Project: Qpid
          Issue Type: Bug
          Components: C++ Broker
    Affects Versions: 0.18
            Reporter: Jason Dillaman


In an HA broker with approximately 12,000 queues, it takes roughly 10-14 seconds for the first QMF response fragment to arrive.  While the QMF management agent is collecting the response, all other QMF-related functionality is blocked, which in turn blocks any thread that raises a QMF event.

This causes clients to be disconnected from the broker because worker threads are blocked by QMF (through the 2 second handshake timeout or, in an extreme case, through missed heartbeats).  It also causes the HA backup's federated link to be disconnected due to missed heartbeats when the link heartbeat interval is set to a low value.

If the HA backup loses its connection, it only exacerbates the issue since it will reconnect and re-query the QMF data that made it lose its connection in the first place.  

Recommend that QMF events not be blocked by a global management agent lock and also recommend that potentially long-running QMF queries be separated from the worker thread that initiated them to prevent a heartbeat timeout.
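
The first recommendation can be illustrated with a minimal C++ sketch. It is not Qpid code: Agent, queryLock, eventLock, raiseEvent and takeEvents are hypothetical names, and the real management agent is considerably more involved. The only point is that a thread raising an event takes a small dedicated lock, so it does not wait behind a long-running query that holds a different lock.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a management agent; not the Qpid class.
class Agent {
public:
    // Held for the whole duration of a (possibly slow) query.
    std::mutex queryLock;

    // Called by worker threads. Takes only eventLock, never queryLock,
    // so it returns promptly even while a query is in progress.
    void raiseEvent(std::string event) {
        assert(!event.empty());  // sketch-only precondition
        std::lock_guard<std::mutex> guard(eventLock);
        pending.push_back(std::move(event));
    }

    // Called by whichever thread publishes events. Swaps the pending
    // list out under the lock so that publishing happens outside it.
    std::vector<std::string> takeEvents() {
        std::vector<std::string> out;
        std::lock_guard<std::mutex> guard(eventLock);
        out.swap(pending);
        return out;
    }

private:
    std::mutex eventLock;              // protects only 'pending'
    std::vector<std::string> pending;  // events not yet published
};
```

With a single global lock, raiseEvent() would have to wait for the query to finish; here the two paths share no lock, so the event path stays short regardless of how long a query takes.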


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Updated] (QPID-4286) QMF queries for HA replication take too long to process

Posted by "Jason Dillaman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dillaman updated QPID-4286:
---------------------------------

    Attachment: qpid-4286.patch

Quick patch against the 0.18 branch. It uses a dedicated lock for v1 QMF events instead of the standard 'userLock', and enqueues v2 QMF commands for asynchronous processing so that they cannot block all available worker threads.
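
For readers without the attachment, the asynchronous half of that idea might look roughly like the sketch below. It is an assumption-laden illustration, not the contents of qpid-4286.patch: CommandQueue, Command and enqueue are hypothetical names, and it uses C++11 standard library threading rather than Qpid's own primitives. A worker thread hands a command to the queue and returns immediately; a single dedicated thread runs the commands.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper, not part of Qpid: runs queued commands on one
// dedicated thread so that callers never execute them inline.
class CommandQueue {
public:
    typedef std::function<void()> Command;

    CommandQueue() : stopping(false), worker(&CommandQueue::run, this) {}

    // Finishes every command already queued, then joins the thread.
    ~CommandQueue() {
        {
            std::lock_guard<std::mutex> guard(lock);
            stopping = true;
        }
        ready.notify_one();
        worker.join();
    }

    CommandQueue(const CommandQueue&) = delete;
    CommandQueue& operator=(const CommandQueue&) = delete;

    // Called from a connection's worker thread: the cost is one push_back.
    void enqueue(Command cmd) {
        assert(cmd);  // an empty std::function cannot be invoked
        {
            std::lock_guard<std::mutex> guard(lock);
            commands.push_back(std::move(cmd));
        }
        ready.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> guard(lock);
        for (;;) {
            ready.wait(guard, [this] { return stopping || !commands.empty(); });
            if (commands.empty()) return;  // only reached when stopping
            Command cmd = std::move(commands.front());
            commands.pop_front();
            guard.unlock();  // run the slow command without holding the lock
            cmd();
            guard.lock();
        }
    }

    std::mutex lock;
    std::condition_variable ready;
    std::deque<Command> commands;
    bool stopping;
    std::thread worker;  // declared last so the members above exist first
};
```

A broker thread would call enqueue() with a closure that performs the query and sends the response, then go back to servicing its connection, heartbeats included.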
                


[jira] [Updated] (QPID-4286) QMF queries for HA replication take too long to process

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway updated QPID-4286:
------------------------------

    Attachment: qpid-4286-fixes.patch

This is Jason's patch with some additional fixes.
                


[jira] [Resolved] (QPID-4286) QMF queries for HA replication take too long to process

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway resolved QPID-4286.
-------------------------------

    Resolution: Fixed

Committed Jason's patch on trunk:
------------------------------------------------------------------------   
r1398530 | aconway | 2012-10-15 17:35:38 -0400 (Mon, 15 Oct 2012) | 6 lines

QPID-4286: QMF queries for HA replication take too long to process (Jason Dillaman)

Rework ManagementAgent locks, get rid of shared buffers that were points of contention.

Minor log message improvements in ha code.

------------------------------------------------------------------------
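
To illustrate why a shared buffer becomes a point of contention, the sketch below contrasts a shared encode buffer with a per-call one. It is not taken from r1398530: SharedEncoder and encodeLocal are hypothetical names and the "encoding" is a toy. A buffer that is a member of a shared object must be protected by a lock for as long as it is in use, so every caller serializes on it; a buffer that lives on the caller's stack needs no lock at all.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Shared-buffer style: one buffer for every caller, so the lock must be
// held for the whole encode-and-copy sequence.
class SharedEncoder {
public:
    std::string encode(const std::string& name, int depth) {
        std::lock_guard<std::mutex> guard(lock);
        buffer.clear();
        buffer += name;
        buffer += '=';
        buffer += std::to_string(depth);
        return buffer;  // copied out while the lock is still held
    }

private:
    std::mutex lock;
    std::string buffer;  // shared by all callers
};

// Local-buffer style: each call builds its result in its own buffer,
// so there is no shared state and nothing to lock.
inline std::string encodeLocal(const std::string& name, int depth) {
    std::string buffer;
    buffer += name;
    buffer += '=';
    buffer += std::to_string(depth);
    return buffer;
}
```

Both produce the same bytes; the difference is only that concurrent calls to encodeLocal() never wait on each other.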

                


[jira] [Assigned] (QPID-4286) QMF queries for HA replication take too long to process

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway reassigned QPID-4286:
---------------------------------

    Assignee: Alan Conway
    