Posted to issues@activemq.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/11 07:12:00 UTC
[jira] [Work logged] (AMQ-9107) Closing many consumers causes CPU to spike to 100%
[ https://issues.apache.org/jira/browse/AMQ-9107?focusedWorklogId=815473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-815473 ]
ASF GitHub Bot logged work on AMQ-9107:
---------------------------------------
Author: ASF GitHub Bot
Created on: 11/Oct/22 07:11
Start Date: 11/Oct/22 07:11
Worklog Time Spent: 10m
Work Description: lucastetreault opened a new pull request, #908:
URL: https://github.com/apache/activemq/pull/908
Running a profiler while executing the sample code attached to [AMQ-9107](https://issues.apache.org/jira/browse/AMQ-9107) identified ManagedRegionBroker.removeConsumer as the bottleneck. The existing implementation loops over all the subscriptions to find the one belonging to the consumer being closed, so each removal is O(n) in the number of subscriptions. Closing all n consumers is therefore O(n^2), which becomes a serious performance problem once n is large enough. With 188,000 consumers we observe the CPU at 100% for ~40 minutes while all the connections are closed:
<img width="1217" alt="image" src="https://user-images.githubusercontent.com/7095337/195011857-a6971abb-b73c-41fd-bd88-9ab376388949.png">
After this PR, the same test case shows a CPU spike lasting one minute or less, similar to the time it took to create the consumers:
<img width="968" alt="image" src="https://user-images.githubusercontent.com/7095337/195017869-c17c8b4a-fabc-4c2c-a909-6073955613a1.png">
I ran the full suite of tests and everything is passing.
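The complexity argument in the description above can be illustrated with a small, self-contained sketch. It is not the ActiveMQ code and not necessarily the approach taken in this PR: `ScanRegistry`, `IndexedRegistry`, `Sub` and the string consumer ids are hypothetical stand-ins, assuming only that the slow path searches a collection per removal and that a faster path can look the subscription up by a key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these types are ActiveMQ classes.
class ConsumerRemovalSketch {

    /** Hypothetical subscription record, identified by its consumer id. */
    static final class Sub {
        final String consumerId;
        Sub(String consumerId) { this.consumerId = consumerId; }
    }

    /** Linear-scan removal: every remove walks the list until it finds a match. */
    static final class ScanRegistry {
        private final List<Sub> subs = new ArrayList<>();
        long comparisons = 0; // counts id comparisons so the cost is visible

        void add(Sub s) { subs.add(s); }

        boolean remove(String consumerId) {
            for (int i = 0; i < subs.size(); i++) {
                comparisons++;
                if (subs.get(i).consumerId.equals(consumerId)) {
                    subs.remove(i);
                    return true;
                }
            }
            return false;
        }

        int size() { return subs.size(); }
    }

    /** Indexed removal: one hash lookup per remove, independent of the total count. */
    static final class IndexedRegistry {
        private final Map<String, Sub> byConsumerId = new HashMap<>();

        void add(Sub s) { byConsumerId.put(s.consumerId, s); }

        boolean remove(String consumerId) { return byConsumerId.remove(consumerId) != null; }

        int size() { return byConsumerId.size(); }
    }

    public static void main(String[] args) {
        int n = 2_000;
        ScanRegistry scan = new ScanRegistry();
        IndexedRegistry indexed = new IndexedRegistry();
        for (int i = 0; i < n; i++) {
            scan.add(new Sub("consumer-" + i));
            indexed.add(new Sub("consumer-" + i));
        }
        // Close in reverse creation order, which is the worst case for the scan:
        // the target is always the last element still in the list.
        for (int i = n - 1; i >= 0; i--) {
            scan.remove("consumer-" + i);
            indexed.remove("consumer-" + i);
        }
        System.out.println("scan comparisons for n=" + n + ": " + scan.comparisons);
    }
}
```

In this sketch's worst case the scan performs n(n+1)/2 comparisons in total, which for n = 188,000 is roughly 1.8 × 10^10, while the indexed version performs n lookups. These figures describe the toy model only, not measurements of the broker.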
Issue Time Tracking
-------------------
Worklog Id: (was: 815473)
Remaining Estimate: 0h
Time Spent: 10m
> Closing many consumers causes CPU to spike to 100%
> --------------------------------------------------
>
> Key: AMQ-9107
> URL: https://issues.apache.org/jira/browse/AMQ-9107
> Project: ActiveMQ
> Issue Type: Bug
> Affects Versions: 5.17.1, 5.16.5
> Reporter: Lucas Tétreault
> Assignee: Jean-Baptiste Onofré
> Priority: Major
> Attachments: example.zip, image-2022-10-07-00-12-39-657.png, image-2022-10-07-00-17-30-657.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When there are many consumers (~188k) on a queue, closing them is very expensive and causes the CPU to spike to 100% while the consumers are being closed. Tested on an Amazon MQ mq.m5.large instance (2 vCPU, 8 GB memory).
> I have attached a minimal recreation of the issue where the following happens:
> 1/ Open 100 connections.
> 2/ Create consumers as fast as we can on all of those connections until we hit at least 188k consumers.
> 3/ Sleep for 5 minutes so we can observe the CPU come back down after opening all those connections.
> 4/ Start closing consumers as fast as we can.
> 5/ After all consumers are closed, sleep for 5 minutes to observe the CPU come back down after closing all the connections.
>
> In this example it seems 5 minutes wasn't actually sufficient time for the CPU to come back down and the consumer and connection counts seem to hit 0 at the same time:
> !image-2022-10-07-00-12-39-657.png|width=757,height=353!
>
> In a previous test with more time sleeping after closing all the consumers we can see the CPU come back down before we close the connections.
> !image-2022-10-07-00-17-30-657.png|width=764,height=348!