Posted to jira@kafka.apache.org by "Guozhang Wang (Jira)" <ji...@apache.org> on 2020/01/07 23:36:00 UTC

[jira] [Assigned] (KAFKA-7821) Streams: default cache size can lose session windows in high-throughput deployment

     [ https://issues.apache.org/jira/browse/KAFKA-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang reassigned KAFKA-7821:
------------------------------------

    Assignee:     (was: Guozhang Wang)

> Streams: default cache size can lose session windows in high-throughput deployment
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-7821
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7821
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1, 2.1.0
>            Reporter: Matthew Jarvie
>            Priority: Major
>
> We have observed that, with the default cache size, a Streams aggregator sometimes fails to find existing, open session windows while handling records. As a result, it starts a new session that overwrites the old one, and events that belong together are not aggregated.
> Our topology is fairly simple: we consume from a Kafka topic, group by key, aggregate, and then produce to another topic. The aggregator uses session windows with an inactivity gap of 10 minutes and a retention period of 10 minutes. The system is deployed in production and handles about 250k messages per thread per minute (4 threads per application). The cache size is left at the default (10 MB).
> We worked around the issue by enlarging the cache (raising the cache.max.bytes.buffering configuration parameter from 10 MB to 100 MB), after which we no longer observe the issue. While troubleshooting, we noticed that the lost sessions were the older ones, so it seems that the cache is an LRU cache and is evicting windows before their inactivity gap has elapsed.
> This was originally observed in 0.10.2.1. We completed an upgrade to 2.1.0 and still observed the issue.
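
For anyone wanting to try the workaround from the description, the following is a minimal sketch of the configuration change, not the reporter's actual code. It uses only java.util.Properties and the literal key quoted in the report; the class and method names (CacheConfigSketch, withCacheBytes, perThreadBytes) are hypothetical. The per-thread split is an assumption based on the Streams documentation describing this setting as a total across all threads, divided evenly.

```java
import java.util.Properties;

public class CacheConfigSketch {
    // Key quoted in the report. Kafka Streams also exposes it as
    // StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG; the literal is used
    // here so the sketch compiles without the Kafka jars.
    static final String CACHE_KEY = "cache.max.bytes.buffering";

    // Builds a Properties object carrying only the cache size override.
    // A real application would add application.id, bootstrap.servers, etc.
    static Properties withCacheBytes(long bytes) {
        Properties props = new Properties();
        props.put(CACHE_KEY, Long.toString(bytes));
        return props;
    }

    // Assumption: the configured total is shared evenly by all stream threads.
    static long perThreadBytes(long totalBytes, int threads) {
        return totalBytes / threads;
    }

    public static void main(String[] args) {
        long defaultBytes = 10L * 1024 * 1024;     // 10 MB default from the report
        long workaroundBytes = 100L * 1024 * 1024; // 100 MB workaround from the report

        Properties props = withCacheBytes(workaroundBytes);
        System.out.println(CACHE_KEY + "=" + props.getProperty(CACHE_KEY));

        // With the 4 threads mentioned in the report:
        System.out.println("default per thread:    " + perThreadBytes(defaultBytes, 4));
        System.out.println("workaround per thread: " + perThreadBytes(workaroundBytes, 4));
    }
}
```

The resulting Properties would be passed to the KafkaStreams constructor together with the rest of the application's settings. If the even split holds, the default leaves roughly 2.5 MB of cache per thread in the reported deployment.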

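The LRU hypothesis in the description can be illustrated with a toy model. This is only a sketch of the reporter's suspicion, not Kafka Streams' internal cache: it bounds the cache by entry count rather than bytes, has no backing state store, and all names (LruSessionSketch, record, simulate) are hypothetical. It shows how a key that is quiet for a while can be evicted by other keys' traffic, so that its next event starts a fresh session instead of joining the open one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruSessionSketch {
    // Toy LRU cache bounded by entry count (an assumption; the real
    // setting is a byte budget).
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true yields LRU ordering
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries; // evict least recently used entry
        }
    }

    // Adds one event to the key's session and returns the session's event
    // count. A cache miss starts a new session with count 1, which models
    // the "overwrites the old one" symptom.
    static int record(Map<String, Integer> cache, String key) {
        return cache.merge(key, 1, Integer::sum);
    }

    // One event for "slow-key", then one event for each of otherKeys
    // distinct keys, then a second event for "slow-key". Returns the
    // count seen by that second event.
    static int simulate(int cacheEntries, int otherKeys) {
        Map<String, Integer> cache = new LruCache<>(cacheEntries);
        record(cache, "slow-key");
        for (int i = 0; i < otherKeys; i++) {
            record(cache, "key-" + i);
        }
        return record(cache, "slow-key");
    }

    public static void main(String[] args) {
        // Small cache: the open session was evicted, so the count restarts.
        System.out.println(simulate(100, 1000));  // prints 1
        // Larger cache: the open session is still found and extended.
        System.out.println(simulate(2000, 1000)); // prints 2
    }
}
```

In this model, enlarging the cache makes the symptom disappear for the same traffic, which matches the reported behaviour. It does not establish that eviction is the actual root cause in Streams.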


--
This message was sent by Atlassian Jira
(v8.3.4#803005)