Posted to dev@qpid.apache.org by "Jason Dillaman (JIRA)" <ji...@apache.org> on 2012/09/05 22:27:08 UTC
[jira] [Created] (QPID-4286) QMF queries for HA replication take too long to process
Jason Dillaman created QPID-4286:
------------------------------------
Summary: QMF queries for HA replication take too long to process
Key: QPID-4286
URL: https://issues.apache.org/jira/browse/QPID-4286
Project: Qpid
Issue Type: Bug
Components: C++ Broker
Affects Versions: 0.18
Reporter: Jason Dillaman
In an HA broker with approximately 12,000 queues, it takes roughly 10-14 seconds for the first QMF response fragment to arrive. While the QMF management agent is collecting the response, all other QMF-related functionality is blocked, which in turn blocks any thread that raises a QMF event.
This causes clients to be disconnected from the broker because worker threads are blocked by QMF (either from missed heartbeats in an extreme case or from the 2-second handshake timeout). It also causes the HA backup's federated link to be disconnected due to missed heartbeats when the link heartbeat interval is set to a low value.
If the HA backup loses its connection, the problem only gets worse: the backup reconnects and re-issues the same QMF query that made it lose its connection in the first place.
Recommend that QMF events not be blocked by a global management agent lock, and that potentially long-running QMF queries be separated from the worker thread that initiated them to prevent a heartbeat timeout.
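
The first recommendation can be illustrated with a minimal sketch, assuming a simplified agent that keeps one lock for its managed objects and a separate lock for raised events. The class AgentSketch and all of its members are hypothetical names for illustration only; this is not the Qpid ManagementAgent code.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a management agent; not Qpid's ManagementAgent.
class AgentSketch {
public:
    // Called from broker worker threads. Takes only the event lock, so it
    // never waits behind a long-running query that holds the object lock.
    void raiseEvent(const std::string& event) {
        assert(!event.empty());  // illustrative precondition: events are named
        std::lock_guard<std::mutex> guard(eventLock_);
        pendingEvents_.push_back(event);
    }

    // Hands back all queued events; again only the event lock is held.
    std::vector<std::string> drainEvents() {
        std::vector<std::string> out;
        std::lock_guard<std::mutex> guard(eventLock_);
        out.swap(pendingEvents_);
        return out;
    }

    void addObject(const std::string& name) {
        std::lock_guard<std::mutex> guard(objectLock_);
        objects_.push_back(name);
    }

    // A potentially slow walk over every managed object. It holds the
    // object lock for the whole walk, but never the event lock.
    std::size_t runQuery() {
        std::lock_guard<std::mutex> guard(objectLock_);
        std::size_t matched = 0;
        for (const std::string& name : objects_) {
            if (!name.empty()) ++matched;
        }
        return matched;
    }

private:
    std::mutex objectLock_;                   // guards objects_
    std::vector<std::string> objects_;
    std::mutex eventLock_;                    // guards pendingEvents_ only
    std::vector<std::string> pendingEvents_;
};
```

With this split, a thread calling raiseEvent() contends only with other event producers and the drain step, not with a query that is iterating over thousands of objects.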
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org
[jira] [Updated] (QPID-4286) QMF queries for HA replication take too long to process
Posted by "Jason Dillaman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/QPID-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Dillaman updated QPID-4286:
---------------------------------
Attachment: qpid-4286.patch
Quick patch against the 0.18 branch. It uses a dedicated lock for v1 QMF events instead of the standard 'userLock', and it enqueues v2 QMF commands for asynchronous processing so that they cannot block all available worker threads.
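
The asynchronous hand-off described above might look roughly like the following sketch. It assumes a single dedicated thread that drains a command queue; CommandQueueSketch and its members are hypothetical names and are not taken from the attached patch.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper, not part of Qpid: runs queued commands on one
// dedicated thread so the calling worker thread can return immediately.
class CommandQueueSketch {
public:
    CommandQueueSketch() : stopping_(false), worker_([this] { run(); }) {}

    ~CommandQueueSketch() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        worker_.join();  // commands still queued are run before the thread exits
    }

    CommandQueueSketch(const CommandQueueSketch&) = delete;
    CommandQueueSketch& operator=(const CommandQueueSketch&) = delete;

    // Called from a worker thread: hand the command off and return.
    void enqueue(std::function<void()> command) {
        assert(command);  // an empty std::function cannot be invoked
        {
            std::lock_guard<std::mutex> guard(mutex_);
            commands_.push_back(std::move(command));
        }
        ready_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            ready_.wait(lock, [this] { return stopping_ || !commands_.empty(); });
            if (commands_.empty()) return;  // only reached once stopping_ is set
            std::function<void()> command = std::move(commands_.front());
            commands_.pop_front();
            lock.unlock();
            command();  // slow work runs without holding the queue lock
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> commands_;
    bool stopping_;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

A worker thread that receives a long-running request would call enqueue() with the work to be done and go back to servicing its connection, which keeps heartbeats flowing while the command runs elsewhere.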
[jira] [Updated] (QPID-4286) QMF queries for HA replication take too long to process
Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/QPID-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Conway updated QPID-4286:
------------------------------
Attachment: qpid-4286-fixes.patch
This is Jason's patch with some additional fixes.
[jira] [Resolved] (QPID-4286) QMF queries for HA replication take too long to process
Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/QPID-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Conway resolved QPID-4286.
-------------------------------
Resolution: Fixed
Committed Jason's patch on trunk:
------------------------------------------------------------------------
r1398530 | aconway | 2012-10-15 17:35:38 -0400 (Mon, 15 Oct 2012) | 6 lines
MQPID-4286: QMF queries for HA replication take too long to process (Jason Dillaman)
Rework ManagementAgent locks, get rid of shared buffers that were points of contention.
Minor log message improvements in ha code.
------------------------------------------------------------------------
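
One common way to remove a shared buffer as a point of contention is to copy the needed data under a short lock and then encode into a buffer owned by the calling thread. The sketch below assumes that approach; ResponseEncoderSketch and its members are hypothetical names and do not reflect what r1398530 actually changed.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical encoder, not Qpid's ManagementAgent.
class ResponseEncoderSketch {
public:
    void addQueue(const std::string& name) {
        assert(!name.empty());  // illustrative precondition: queues are named
        std::lock_guard<std::mutex> guard(lock_);
        queueNames_.push_back(name);
    }

    // Copies the names under the lock, then encodes into a buffer that
    // belongs to this call, so encoding itself needs no lock at all.
    std::string encodeResponse() const {
        std::vector<std::string> snapshot;
        {
            std::lock_guard<std::mutex> guard(lock_);
            snapshot = queueNames_;
        }
        std::string buffer;  // per-call buffer; no other thread can touch it
        for (const std::string& name : snapshot) {
            buffer += name;
            buffer += '\n';
        }
        return buffer;
    }

private:
    mutable std::mutex lock_;               // guards queueNames_ only
    std::vector<std::string> queueNames_;
};
```

The trade-off is an extra copy of the data per response in exchange for holding the lock only for the duration of that copy.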
[jira] [Assigned] (QPID-4286) QMF queries for HA replication take too long to process
Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/QPID-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Conway reassigned QPID-4286:
---------------------------------
Assignee: Alan Conway