Posted to jira@kafka.apache.org by "Zhongda Zhao (Jira)" <ji...@apache.org> on 2022/02/24 09:27:00 UTC

[jira] [Comment Edited] (KAFKA-4212) Add a key-value store that is a TTL persistent cache

    [ https://issues.apache.org/jira/browse/KAFKA-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497236#comment-17497236 ] 

Zhongda Zhao edited comment on KAFKA-4212 at 2/24/22, 9:26 AM:
---------------------------------------------------------------

Found via: https://users.kafka.apache.narkive.com/DLplc75h/keyvaluestore-implementation-that-allows-retention-policy

We had assumed we could use {{Materialized.withRetention}} on a regular {{KeyValueStore}} to get a compacted changelog topic with a matching {{retention.ms}} and a RocksDB instance with a matching TTL in seconds.

Our use case: we want to use Kafka Streams to count a huge number of records, grouped by different key combinations, on a monthly basis. Because months differ in length, we could not use time windows, which have a fixed size (we would also have to handle out-of-order records without a deterministic grace period). Our current workaround is to make the year-month of the event time part of the group key. For interactive queries we just do a prefix scan. This worked for our purposes until we found out that {{withRetention}} does not apply to a regular {{KeyValueStore}} and that the changelog topic does not get a matching retention either (the latter might be possible with the Processor API).
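
To make the workaround concrete, here is a minimal sketch in plain Java. It is not Kafka Streams code: a {{TreeMap}} stands in for the sorted state store, and the class name {{MonthlyCounts}}, the {{yyyy-MM|groupKey}} key layout, and the use of UTC are assumptions made only for this illustration.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the workaround; a sorted in-memory map plays the role of the state store.
final class MonthlyCounts {
    // Sorted by key, so all entries sharing a "yyyy-MM|" prefix are contiguous.
    private final TreeMap<String, Long> store = new TreeMap<>();

    // Builds the composite key "yyyy-MM|groupKey" from the record's event time (UTC is an assumption).
    static String compositeKey(long eventTimeMs, String groupKey) {
        YearMonth ym = YearMonth.from(Instant.ofEpochMilli(eventTimeMs).atZone(ZoneOffset.UTC));
        return ym + "|" + groupKey;
    }

    // Increments the counter for the month the event belongs to, regardless of arrival order.
    void count(long eventTimeMs, String groupKey) {
        store.merge(compositeKey(eventTimeMs, groupKey), 1L, Long::sum);
    }

    // Prefix scan: returns every entry whose key starts with "yyyy-MM|".
    SortedMap<String, Long> month(YearMonth ym) {
        String prefix = ym + "|";
        return store.subMap(prefix, prefix + Character.MAX_VALUE);
    }
}
```

Because the month is derived from the event time, a late record is counted in the month it belongs to without any grace period. What the sketch cannot show is the missing piece described above: nothing ever removes old months from the store or from the changelog topic.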

That said, we would prefer a Kafka-level "retention" configuration that keeps the changelog topic and the underlying store settings consistent. Implementation details such as the RocksDB TTL in seconds can stay hidden.

Any suggestions for alternative solutions to our use case are more than welcome.


> Add a key-value store that is a TTL persistent cache
> ----------------------------------------------------
>
>                 Key: KAFKA-4212
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4212
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.0.1
>            Reporter: Elias Levy
>            Priority: Major
>              Labels: api
>
> Some jobs need to maintain a large set of key-values as state for some period of time, i.e. they need to maintain a TTL cache of values potentially larger than memory.
> Currently Kafka Streams provides non-windowed and windowed key-value stores.  Neither is an exact fit to this use case.  
> The {{RocksDBStore}}, a {{KeyValueStore}}, stores one value per key as required, but does not support expiration.  The TTL option of RocksDB is explicitly not used.
> The {{RocksDBWindowStore}}, a {{WindowStore}}, can expire items via segment dropping, but it stores multiple items per key, based on their timestamp.  However, this store can be repurposed as a cache by fetching the items in reverse chronological order and returning the first item found.
> KAFKA-2594 introduced a fixed-capacity in-memory LRU caching store, but here we desire a variable-capacity memory-overflowing TTL caching store.
> Although {{RocksDBWindowStore}} can be repurposed as a cache, it would be useful to have an official and proper TTL cache API and implementation.
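
The repurposing described in the issue can be sketched in plain Java as follows. This is an in-memory illustration only, not the Kafka Streams implementation: the class {{SegmentedTtlCache}}, its parameters, and the expiry rule (whole segments are dropped relative to the highest timestamp seen so far) are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical TTL cache that keeps timestamped values in time segments and expires whole segments.
final class SegmentedTtlCache<K, V> {
    private final long segmentMs;
    private final long retentionMs;
    private long streamTimeMs = Long.MIN_VALUE; // highest timestamp seen so far
    // segment id -> key -> timestamp -> value
    private final TreeMap<Long, Map<K, TreeMap<Long, V>>> segments = new TreeMap<>();

    SegmentedTtlCache(long segmentMs, long retentionMs) {
        this.segmentMs = segmentMs;
        this.retentionMs = retentionMs;
    }

    void put(K key, V value, long timestampMs) {
        streamTimeMs = Math.max(streamTimeMs, timestampMs);
        long oldestLive = Math.floorDiv(streamTimeMs - retentionMs, segmentMs);
        // Expiry by segment dropping: remove every segment older than the oldest live one.
        segments.headMap(oldestLive, false).clear();
        long segmentId = Math.floorDiv(timestampMs, segmentMs);
        if (segmentId < oldestLive) {
            return; // record is older than the retention period, ignore it
        }
        segments.computeIfAbsent(segmentId, id -> new HashMap<>())
                .computeIfAbsent(key, k -> new TreeMap<>())
                .put(timestampMs, value);
    }

    // Walks segments in reverse chronological order and returns the first value found for the key.
    V get(K key) {
        for (Map<K, TreeMap<Long, V>> segment : segments.descendingMap().values()) {
            TreeMap<Long, V> versions = segment.get(key);
            if (versions != null) {
                return versions.lastEntry().getValue();
            }
        }
        return null; // no live value for this key
    }
}
```

Since expiry happens one segment at a time, an entry in this sketch can outlive the retention period by up to one segment length, and the store still holds several values per key until their segments are dropped. Both points illustrate why a dedicated TTL store would be a better fit than the repurposed window store.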


