You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Lucas Bradstreet (Jira)" <ji...@apache.org> on 2022/01/13 15:59:00 UTC

[jira] [Comment Edited] (KAFKA-13593) ThrottledChannelReaper slows broker shutdown by multiple seconds

    [ https://issues.apache.org/jira/browse/KAFKA-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475474#comment-17475474 ] 

Lucas Bradstreet edited comment on KAFKA-13593 at 1/13/22, 3:58 PM:
--------------------------------------------------------------------

Thanks for reporting [~stevenschlansker] . Funnily enough, I've been optimizing our test runtimes and recently merged a fix for this exact issue for the ClientQuotaManager and purgatory threads, see [https://github.com/apache/kafka/commit/0b9a8bac36f16b5397e9ec3a0441758e4b60a384.] 


was (Author: lucasbradstreet):
Thanks for reporting [~stevenschlansker] . Funnily enough I have been optimizing our test runtimes and recently merged a fix for this exact issue for the ClientQuotaManager and purgatory threads, see [https://github.com/apache/kafka/commit/0b9a8bac36f16b5397e9ec3a0441758e4b60a384.] 

> ThrottledChannelReaper slows broker shutdown by multiple seconds
> ----------------------------------------------------------------
>
>                 Key: KAFKA-13593
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13593
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 3.0.0
>            Reporter: Steven Schlansker
>            Priority: Minor
>
> We run an embedded KRaft broker in integration tests, to test that our Producer / Consumers are all hooked up and verify our overall pipeline.
> While trying to deliver CI speed improvements, we noticed that the majority of time for a small test is actually spent in Kafka broker shutdown. From the logs, it looks like the ClientQuotaManager has multiple ThrottledChannelReaper threads and each of them takes up to a second to shutdown.
> {code:java}
> 2022-01-12T15:26:31.932Z [main] INFO  kafka.log.LogManager - Shutdown complete.
> 2022-01-12T15:26:31.934Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Shutting down
> 2022-01-12T15:26:32.311Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Shutdown completed
> 2022-01-12T15:26:32.311Z [ThrottledChannelReaper-Fetch] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Stopped
> 2022-01-12T15:26:32.311Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Shutting down
> 2022-01-12T15:26:33.312Z [ThrottledChannelReaper-Produce] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Stopped
> 2022-01-12T15:26:33.312Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Shutdown completed
> 2022-01-12T15:26:33.312Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Shutting down
> 2022-01-12T15:26:34.315Z [ThrottledChannelReaper-Request] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Stopped
> 2022-01-12T15:26:34.315Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Shutdown completed
> 2022-01-12T15:26:34.315Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Shutting down
> 2022-01-12T15:26:35.317Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Shutdown completed
> 2022-01-12T15:26:35.317Z [ThrottledChannelReaper-ControllerMutation] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Stopped{code}
> Inspecting the code, the ThrottledChannelReaper threads are marked as not interruptible, so ShutdownableThread does not interrupt the worker on shutdown. Unfortunately, the doWork method polls with a 1 second timeout. So you wait up to 1s for every type of quota, in this case for a total of almost 4s.
>  
> While this is not a problem for production Kafka brokers, where instances are expected to stay up for months, in tests that expect to spin up and down it is easily noticed and adds real overhead to CI.
>  
> Suggested possible remediations:
>  * Allow ThrottledChannelReaper to be interrupted by ShutdownableThread
>  * ThrottledChannelReaper overrides initiateShutdown and after calling {{super.initiateShutdown}} then enqueues a {{null}} element on the delayQueue to force worker thread to notice shutdown state
>  * Reduce 1s poll timeout to something small (carries overhead penalty for all users though, so this is less desirable), or make it configurable so we can set it to e.g. 10ms in unit tests



--
This message was sent by Atlassian Jira
(v8.20.1#820001)