Posted to jira@kafka.apache.org by "Zhongda Zhao (Jira)" <ji...@apache.org> on 2022/02/24 07:58:00 UTC

[jira] [Commented] (KAFKA-4212) Add a key-value store that is a TTL persistent cache

    [ https://issues.apache.org/jira/browse/KAFKA-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497236#comment-17497236 ] 

Zhongda Zhao commented on KAFKA-4212:
-------------------------------------

We had assumed we could use {{Materialized.withRetention}} on a plain {{KeyValueStore}} to get a compacted changelog topic with a matching {{retention.ms}} and a RocksDB store with a matching TTL (in seconds).

Our use case: we want to use Kafka Streams to count a huge number of records, grouped by various key combinations, on a monthly basis. Because months differ in length, we did not use time windows, which have a fixed size (we would also have had to handle out-of-order records without a deterministic grace period). Our current workaround is to make the year-month of the event time part of the grouping key. For interactive queries we simply do a prefix scan. This served our purpose until we found out that {{withRetention}} is not applicable to a plain {{KeyValueStore}}, and that the changelog topic does not get a matching retention either (the latter might be possible with the Processor API).
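
A minimal sketch of that workaround, in plain Java so it runs without a broker. It assumes event times are interpreted in UTC and uses a hypothetical {{|}} separator between the month and the original grouping key. A {{TreeMap}} stands in for the state store; in Kafka Streams that role would be played by the store's {{prefixScan}} (added by KIP-614). Because keys are compared lexicographically, putting the month first keeps each month's counts contiguous.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MonthlyKeySketch {

    // Hypothetical separator between the month prefix and the original grouping key.
    static final String SEP = "|";

    // Builds a grouping key such as "2022-02|keyA" from the record's event time (UTC assumed).
    static String monthlyKey(long eventTimeMs, String groupKey) {
        YearMonth month = YearMonth.from(Instant.ofEpochMilli(eventTimeMs).atZone(ZoneOffset.UTC));
        return month + SEP + groupKey;
    }

    // Stand-in for a store prefix scan: every key that starts with the prefix, in key order.
    // (Keys containing '\uffff' right after the prefix would be missed; fine for this sketch.)
    static NavigableMap<String, Long> prefixScan(NavigableMap<String, Long> store, String prefix) {
        return store.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> counts = new TreeMap<>();
        long[] eventTimes = {
            Instant.parse("2022-01-31T23:59:59Z").toEpochMilli(),
            Instant.parse("2022-02-01T00:00:00Z").toEpochMilli(),
            Instant.parse("2022-02-24T07:58:00Z").toEpochMilli()
        };
        // Count one record per event time under the month-prefixed key.
        for (long t : eventTimes) {
            counts.merge(monthlyKey(t, "keyA"), 1L, Long::sum);
        }
        // Interactive-query style lookup of all February counts.
        System.out.println(prefixScan(counts, "2022-02" + SEP));
    }
}
```

Records at 23:59:59 on January 31 and 00:00:00 on February 1 land under different keys, which a fixed-size window cannot express for calendar months.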

That said, we would prefer a Kafka-level "retention" setting that keeps the changelog topic and the underlying store consistent. Implementation details such as RocksDB's TTL in seconds could then be hidden.
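
A single "retention" knob like this could derive both settings from one value. The sketch below is hypothetical (no such helper exists in Kafka Streams): it builds a changelog topic config of the kind that could be passed to {{Materialized.withLoggingEnabled(Map<String, String>)}}, plus the matching RocksDB TTL in whole seconds. Note that on a compacted topic {{retention.ms}} only takes effect when {{cleanup.policy}} is {{compact,delete}}; with plain {{compact}} it is ignored. Kafka Streams' built-in RocksDB store does not apply a TTL at all, so the second value only illustrates what a hidden implementation would need to compute.

```java
import java.time.Duration;
import java.util.Map;

public class RetentionSettingsSketch {

    // Hypothetical helper: changelog topic config derived from a single retention value.
    static Map<String, String> changelogConfig(Duration retention) {
        return Map.of(
            // "compact,delete" keeps compaction but also lets retention.ms delete old segments.
            "cleanup.policy", "compact,delete",
            "retention.ms", Long.toString(retention.toMillis()));
    }

    // Hypothetical helper: RocksDB's TTL is given in whole seconds, so the millisecond
    // retention is rounded up to keep the TTL from being shorter than retention.ms.
    static int rocksDbTtlSeconds(Duration retention) {
        long ms = retention.toMillis();
        return Math.toIntExact((ms + 999) / 1000);
    }

    public static void main(String[] args) {
        // Example sizing: keep roughly two months of monthly counts.
        Duration retention = Duration.ofDays(62);
        System.out.println("changelog: " + changelogConfig(retention));
        System.out.println("rocksdb ttl seconds: " + rocksDbTtlSeconds(retention));
    }
}
```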

Any alternative solution suggestions for our use case are more than welcome.

> Add a key-value store that is a TTL persistent cache
> ----------------------------------------------------
>
>                 Key: KAFKA-4212
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4212
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.0.1
>            Reporter: Elias Levy
>            Priority: Major
>              Labels: api
>
> Some jobs need to maintain as state a large set of key-values for some period of time, i.e. they need to maintain a TTL cache of values potentially larger than memory.
> Currently Kafka Streams provides non-windowed and windowed key-value stores. Neither is an exact fit for this use case.
> The {{RocksDBStore}}, a {{KeyValueStore}}, stores one value per key as required, but does not support expiration.  The TTL option of RocksDB is explicitly not used.
> The {{RocksDBWindowStore}}, a {{WindowStore}}, can expire items via segment dropping, but it stores multiple items per key, based on their timestamp. However, this store can be repurposed as a cache by fetching a key's items in reverse chronological order and returning the first item found.
> KAFKA-2594 introduced a fixed-capacity in-memory LRU caching store, but here we want a variable-capacity TTL caching store that can overflow memory.
> Although {{RocksDBWindowStore}} can be repurposed as a cache, it would be useful to have an official, proper TTL cache API and implementation.
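
The window-store workaround described in the issue can be illustrated with a small in-memory model: entries go into time segments, whole segments are dropped once they fall out of the retention window, and a lookup scans segments newest-first and returns the first hit. This is a sketch under stated assumptions (hypothetical class and parameter names, non-negative timestamps, expiry driven by the largest timestamp seen rather than wall-clock time, null values not supported), not the {{RocksDBWindowStore}} API or the TTL store the issue asks for.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

public class SegmentedTtlCacheSketch<K, V> {

    private final long segmentIntervalMs;
    private final long retentionMs;
    // Segment id -> entries whose timestamps fall into that segment.
    private final NavigableMap<Long, Map<K, V>> segments = new TreeMap<>();
    // Largest timestamp seen so far ("stream time"); expiry is driven by it, not by wall-clock time.
    private long streamTimeMs = 0L;

    public SegmentedTtlCacheSketch(long segmentIntervalMs, long retentionMs) {
        if (segmentIntervalMs <= 0 || retentionMs < 0) {
            throw new IllegalArgumentException("segment interval must be > 0 and retention >= 0");
        }
        this.segmentIntervalMs = segmentIntervalMs;
        this.retentionMs = retentionMs;
    }

    // Segments with a smaller id than this lie entirely outside the retention window.
    private long oldestLiveSegment() {
        return Math.floorDiv(streamTimeMs - retentionMs, segmentIntervalMs);
    }

    /** Stores value under key at timestampMs (assumed non-negative), then drops expired segments. */
    public void put(K key, V value, long timestampMs) {
        Objects.requireNonNull(value, "null values are not supported in this sketch");
        streamTimeMs = Math.max(streamTimeMs, timestampMs);
        long segmentId = Math.floorDiv(timestampMs, segmentIntervalMs);
        // Records that are already older than the retention window are silently skipped.
        if (segmentId >= oldestLiveSegment()) {
            segments.computeIfAbsent(segmentId, id -> new HashMap<>()).put(key, value);
        }
        // Expiration by segment dropping: whole expired segments are discarded at once.
        segments.headMap(oldestLiveSegment(), false).clear();
    }

    /** Returns the value from the newest segment that contains key, or null if absent or expired. */
    public V get(K key) {
        for (Map<K, V> segment : segments.descendingMap().values()) {
            V value = segment.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        // Example sizing: 1-hour segments, 3-hour retention.
        long hour = 3_600_000L;
        SegmentedTtlCacheSketch<String, String> cache = new SegmentedTtlCacheSketch<>(hour, 3 * hour);
        cache.put("user-1", "v1", 0L);
        cache.put("user-1", "v2", 2 * hour);   // newer value shadows the older one
        cache.put("user-2", "x", 2 * hour);
        System.out.println(cache.get("user-1"));
        cache.put("user-3", "y", 6 * hour);    // stream time advances; older segments expire
        System.out.println(cache.get("user-1") + " " + cache.get("user-3"));
    }
}
```

With this scheme an entry survives for at least the retention period of stream time and at most about one segment interval longer (expiry is only checked when a put advances stream time). That coarser precision is the price paid for cheap bulk deletion, and every lookup costs one probe per live segment.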



--
This message was sent by Atlassian Jira
(v8.20.1#820001)