Posted to jira@kafka.apache.org by "A. Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2022/11/24 06:28:00 UTC
[jira] [Commented] (KAFKA-14415) ThreadCache is getting slower with every additional state store
[ https://issues.apache.org/jira/browse/KAFKA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638119#comment-17638119 ]
A. Sophie Blee-Goldman commented on KAFKA-14415:
------------------------------------------------
Nice find, I bet the assumption when this was first implemented was that the number of named caches/actual state stores would be pretty low, but the reality is many apps can easily grow to the point of this many sizeBytes() invocations having nontrivial overhead...so yeah, good catch :)
> ThreadCache is getting slower with every additional state store
> ---------------------------------------------------------------
>
> Key: KAFKA-14415
> URL: https://issues.apache.org/jira/browse/KAFKA-14415
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: Lucas Brutschy
> Assignee: Lucas Brutschy
> Priority: Major
>
> There are a few lines in `ThreadCache` that I think should be optimized. `sizeBytes` is called at least once, and potentially many times, in every `put`, and it is linear in the number of caches (= number of state stores, so typically proportional to the number of tasks). That means that with every additional task, every `put` gets a little slower. Compare the throughput of TIME_ROCKS on trunk (green graph):
> [http://kstreams-benchmark-results.s3-website-us-west-2.amazonaws.com/experiments/stateheavy-3-5-3-4-0-51b7eb7937-jenkins-20221113214104-streamsbench/]
> The throughput of TIME_ROCKS is 20% higher when a constant-time `sizeBytes` implementation is used:
> [http://kstreams-benchmark-results.s3-website-us-west-2.amazonaws.com/experiments/stateheavy-3-5-LUCASCOMPARE-lucas-20221122140846-streamsbench/]
> The same seems to apply to the MEM backend (initial throughput >8000 instead of 6000); however, I cannot run the same benchmark here because the memory fills up too quickly.
> [http://kstreams-benchmark-results.s3-website-us-west-2.amazonaws.com/experiments/stateheavy-3-5-LUCASSTATE-lucas-20221121231632-streamsbench/]
>
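For illustration, the difference between a linear and a constant-time `sizeBytes` described in the issue can be sketched roughly as follows. This is a minimal sketch, not the actual Kafka Streams code or the fix that was merged: the class names `SketchNamedCache`, `LinearThreadCache`, and `ConstantTimeThreadCache` are hypothetical, and entries are reduced to a byte count. The only assumption taken from the issue is that the total size is the sum over all per-store caches and is queried on every `put`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-store cache; not the real NamedCache.
class SketchNamedCache {
    private long bytes = 0;

    long sizeInBytes() {
        return bytes;
    }

    // Adds an entry and returns the size delta so the owner can track a total.
    long add(long entryBytes) {
        bytes += entryBytes;
        return entryBytes;
    }
}

// Linear variant: every sizeBytes() call iterates over all caches,
// so its cost grows with the number of state stores.
class LinearThreadCache {
    private final Map<String, SketchNamedCache> caches = new HashMap<>();

    void put(String namespace, long entryBytes) {
        caches.computeIfAbsent(namespace, k -> new SketchNamedCache()).add(entryBytes);
    }

    long sizeBytes() {
        long total = 0;
        for (SketchNamedCache cache : caches.values()) {
            total += cache.sizeInBytes();
        }
        return total;
    }
}

// Constant-time variant: a running total is updated whenever a cache changes,
// so sizeBytes() does not depend on the number of caches.
class ConstantTimeThreadCache {
    private final Map<String, SketchNamedCache> caches = new HashMap<>();
    private long totalBytes = 0;

    void put(String namespace, long entryBytes) {
        totalBytes += caches.computeIfAbsent(namespace, k -> new SketchNamedCache()).add(entryBytes);
    }

    long sizeBytes() {
        return totalBytes;
    }
}
```

Both variants report the same total; the second simply moves the bookkeeping to the point where sizes change. A real implementation would also have to adjust the running total on eviction, deletion, and overwrite, which this sketch omits.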
--
This message was sent by Atlassian Jira
(v8.20.10#820010)