Posted to dev@kafka.apache.org by "Matthias J. Sax (JIRA)" <ji...@apache.org> on 2019/02/16 20:36:00 UTC
[jira] [Created] (KAFKA-7934) Optimize restore for windowed and session stores
Matthias J. Sax created KAFKA-7934:
--------------------------------------
Summary: Optimize restore for windowed and session stores
Key: KAFKA-7934
URL: https://issues.apache.org/jira/browse/KAFKA-7934
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: Matthias J. Sax
During state restore of window/session stores, the changelog topic is scanned from the oldest entry to the newest. This happens either on a record-by-record basis or in record batches.
During this process, new segments are created as time advances (based on the timestamps of the records being restored). However, depending on the retention time, some of these segments may expire again later during the restore, which is wasteful. Because retention is based on the largest timestamp per partition, it is possible to compute a bound between live and expired segments upfront (assuming that we know the largest timestamp). This way, during restore, we could avoid creating segments that would be expired later anyway and skip over all corresponding records.
The problem is that we don't know the largest timestamp per partition. Maybe the broker-side timestamp index could provide an approximation of this value.
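The segment bound described above could be sketched as follows. This is a minimal illustration, not Kafka Streams code: the class and method names are hypothetical, and the id scheme (segment start timestamp divided by the segment interval) merely mirrors how segmented stores commonly assign segment ids.

```java
// Hypothetical sketch of the proposed optimization: given the largest record
// timestamp observed in a changelog partition, compute the oldest segment that
// can still be live, so restore can skip records that belong to segments which
// would be expired again anyway.
public class SegmentBoundSketch {

    // A segment id derived from a timestamp: the segment's start time divided
    // by the segment interval (assumed id scheme for this sketch).
    static long segmentId(long timestampMs, long segmentIntervalMs) {
        return timestampMs / segmentIntervalMs;
    }

    // The smallest segment id that is still within retention, given the
    // largest timestamp seen in the partition. Segments with a smaller id
    // are already expired.
    static long minLiveSegmentId(long largestTimestampMs,
                                 long retentionPeriodMs,
                                 long segmentIntervalMs) {
        long expiryBoundaryMs = Math.max(largestTimestampMs - retentionPeriodMs, 0L);
        return segmentId(expiryBoundaryMs, segmentIntervalMs);
    }

    // During restore, a record can be skipped entirely if its segment falls
    // below the live bound computed from the largest timestamp.
    static boolean shouldSkipDuringRestore(long recordTimestampMs,
                                           long largestTimestampMs,
                                           long retentionPeriodMs,
                                           long segmentIntervalMs) {
        return segmentId(recordTimestampMs, segmentIntervalMs)
                < minLiveSegmentId(largestTimestampMs, retentionPeriodMs, segmentIntervalMs);
    }
}
```

For example, with a retention period of 3000 ms, a segment interval of 1000 ms, and a largest timestamp of 10000 ms, every record with a timestamp below 7000 ms maps to an expired segment and could be skipped without ever creating that segment.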
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)