Posted to jira@kafka.apache.org by "Zhongda Zhao (Jira)" <ji...@apache.org> on 2022/02/24 07:58:00 UTC
[jira] [Commented] (KAFKA-4212) Add a key-value store that is a TTL persistent cache
[ https://issues.apache.org/jira/browse/KAFKA-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497236#comment-17497236 ]
Zhongda Zhao commented on KAFKA-4212:
-------------------------------------
We had hoped to use {{Materialized.withRetention}} on a regular {{KeyValueStore}} to get a compacted changelog topic with a matching {{retention.ms}} and a RocksDB instance with a matching TTL in seconds.
Our use case: we want to use Kafka Streams to count a huge number of records by different key combinations, on a monthly basis. Because months differ in length, we did not use time windows, which have a fixed size (we would also have to handle out-of-order records without a deterministic grace period). Our current workaround is to make the year-month of the event time part of the grouping key. For interactive queries we simply do a prefix scan. This worked for our purposes until we found out that {{withRetention}} does not apply to a regular {{KeyValueStore}}, and that the changelog topic does not get a matching retention either (the latter might be possible with the Processor API).
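To make the workaround concrete, here is a minimal sketch of the key layout in plain Java. A {{TreeMap}} stands in for the sorted state store, and the class name, the {{|}} separator, and the use of UTC are assumptions made for illustration; none of this is Kafka Streams API.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative stand-in only: a TreeMap plays the role of the sorted key-value store.
final class MonthlyKeySketch {
    // Hypothetical separator between the year-month prefix and the business key.
    static final char SEP = '|';

    // Derives the "yyyy-MM|" prefix from the record's event time (UTC is an assumption).
    static String monthPrefix(long eventTimeMs) {
        YearMonth ym = YearMonth.from(Instant.ofEpochMilli(eventTimeMs).atZone(ZoneOffset.UTC));
        return ym.toString() + SEP;
    }

    // Grouping key = year-month of the event time + business key.
    static String groupKey(long eventTimeMs, String key) {
        return monthPrefix(eventTimeMs) + key;
    }

    // Increments the monthly counter; a late record still lands in its own month.
    static void count(NavigableMap<String, Long> store, long eventTimeMs, String key) {
        store.merge(groupKey(eventTimeMs, key), 1L, Long::sum);
    }

    // Prefix scan: every key k with prefix <= k < (prefix with its last char incremented).
    static SortedMap<String, Long> prefixScan(NavigableMap<String, Long> store, String prefix) {
        int last = prefix.length() - 1;
        String upper = prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
        return store.subMap(prefix, true, upper, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> store = new TreeMap<>();
        long feb = Instant.parse("2022-02-24T07:58:00Z").toEpochMilli();
        long jan = Instant.parse("2022-01-31T23:59:59Z").toEpochMilli();
        count(store, feb, "a");
        count(store, feb, "b");
        count(store, jan, "a"); // out-of-order record, counted under January
        System.out.println(prefixScan(store, monthPrefix(feb)));
    }
}
```

Because the month is part of the key, an out-of-order record needs no grace period. The cost is that old months are never expired from the store, which is the gap described here.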
That said, we would prefer a Kafka-level "retention" configuration that keeps the changelog topic and the underlying store settings consistent. Implementation details such as RocksDB's TTL in seconds can stay hidden.
Any suggestions for alternative solutions to our use case are more than welcome.
> Add a key-value store that is a TTL persistent cache
> ----------------------------------------------------
>
> Key: KAFKA-4212
> URL: https://issues.apache.org/jira/browse/KAFKA-4212
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 0.10.0.1
> Reporter: Elias Levy
> Priority: Major
> Labels: api
>
> Some jobs need to maintain a large set of key-values as state for some period of time. That is, they need to maintain a TTL cache of values potentially larger than memory.
> Currently Kafka Streams provides non-windowed and windowed key-value stores. Neither is an exact fit to this use case.
> The {{RocksDBStore}}, a {{KeyValueStore}}, stores one value per key as required, but does not support expiration. The TTL option of RocksDB is explicitly not used.
> The {{RocksDBWindowsStore}}, a {{WindowsStore}}, can expire items via segment dropping, but it stores multiple items per key, based on their timestamp. Still, this store can be repurposed as a cache by fetching the items in reverse chronological order and returning the first item found.
> KAFKA-2594 introduced a fixed-capacity in-memory LRU caching store, but here we desire a variable-capacity memory-overflowing TTL caching store.
> Although {{RocksDBWindowsStore}} can be repurposed as a cache, it would be useful to have an official and proper TTL cache API and implementation.
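The repurposing idea in the quoted description (expire by dropping whole segments, read by scanning newest-first and returning the first hit) can be sketched in plain Java as follows. This is an in-memory illustration under stated assumptions, not the Kafka Streams or RocksDB implementation: the class name, the segment interval and retention parameters, and the use of the highest timestamp seen as the clock are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a time-segmented store used as a TTL cache.
final class SegmentedTtlCacheSketch {
    private final long segmentMs;   // width of one segment (assumed parameter)
    private final long retentionMs; // how long entries stay readable (assumed parameter)
    // segment id -> key -> (timestamp -> value); several values per key may coexist.
    private final TreeMap<Long, Map<String, TreeMap<Long, String>>> segments = new TreeMap<>();
    private long observedTime = Long.MIN_VALUE; // highest timestamp seen so far

    SegmentedTtlCacheSketch(long segmentMs, long retentionMs) {
        this.segmentMs = segmentMs;
        this.retentionMs = retentionMs;
    }

    void put(String key, String value, long timestampMs) {
        observedTime = Math.max(observedTime, timestampMs);
        segments.computeIfAbsent(Math.floorDiv(timestampMs, segmentMs), id -> new HashMap<>())
                .computeIfAbsent(key, k -> new TreeMap<>())
                .put(timestampMs, value);
        // Expiration: drop every whole segment older than the retention horizon.
        long minLiveSegment = Math.floorDiv(observedTime - retentionMs, segmentMs);
        segments.headMap(minLiveSegment, false).clear();
    }

    // Cache read: walk segments newest-first and return the first value found for the key.
    String get(String key) {
        for (Map<String, TreeMap<Long, String>> segment : segments.descendingMap().values()) {
            TreeMap<Long, String> versions = segment.get(key);
            if (versions != null) {
                return versions.lastEntry().getValue();
            }
        }
        return null; // never written, or already expired
    }
}
```

Expiry here is coarse: an entry disappears only when its whole segment falls behind the retention horizon, so it can outlive its nominal TTL by up to one segment interval.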
--
This message was sent by Atlassian Jira
(v8.20.1#820001)