Posted to dev@kafka.apache.org by "Matthias J. Sax (Jira)" <ji...@apache.org> on 2021/10/06 00:49:00 UTC
[jira] [Created] (KAFKA-13350) Handle task corrupted exception on a per state store basis
Matthias J. Sax created KAFKA-13350:
---------------------------------------
Summary: Handle task corrupted exception on a per state store basis
Key: KAFKA-13350
URL: https://issues.apache.org/jira/browse/KAFKA-13350
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: Matthias J. Sax
When we hit an `OffsetOutOfRangeException` during restore, we close the task as dirty and retry the restore process from scratch. In this case, we wipe out all of the task's state stores.
If a task has multiple state stores, we also wipe out state that is actually clean and thus have to redo work for no reason. Instead of wiping out all state stores, we should wipe out only the single state store that corresponds to the changelog topic partition that hit the `OffsetOutOfRangeException`, and preserve the restore progress of all other state stores.
We need to consider both persistent and in-memory stores. For persistent stores, it would be fine to close the unaffected stores cleanly and also write the checkpoint file. For in-memory stores, however, we should not close the store, to avoid dropping the in-memory data.
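
A rough sketch of the proposed per-store handling, to make the three cases concrete. This is not the Kafka Streams implementation: `PerStoreRecoverySketch`, `Store`, `handleCorruption`, and all field names are hypothetical stand-ins invented for illustration, and the real task and changelog-reader internals are considerably more involved. The sketch assumes each store maps to exactly one changelog partition and that wiping a store simply resets its restore progress.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these types are Kafka Streams internals.
public class PerStoreRecoverySketch {

    /** Hypothetical stand-in for a state store backed by one changelog partition. */
    static final class Store {
        final String name;
        final String changelogPartition;
        final boolean persistent;
        long restoredOffset;          // restore progress so far
        boolean open = true;
        boolean checkpointed = false;

        Store(String name, String changelogPartition, boolean persistent, long restoredOffset) {
            this.name = name;
            this.changelogPartition = changelogPartition;
            this.persistent = persistent;
            this.restoredOffset = restoredOffset;
        }

        /** Assumption: wiping discards local state, so restore restarts from offset 0. */
        void wipe() {
            restoredOffset = 0L;
            checkpointed = false;
        }

        /** Assumption: a clean close of a persistent store also writes its checkpoint. */
        void closeCleanlyWithCheckpoint() {
            checkpointed = true;
            open = false;
        }
    }

    /**
     * Wipes only the store whose changelog partition hit the exception.
     * Unaffected persistent stores are closed cleanly with a checkpoint;
     * unaffected in-memory stores are left open so their data survives.
     * Returns the names of the stores that were wiped.
     */
    static List<String> handleCorruption(Map<String, Store> stores, String corruptedPartition) {
        List<String> wiped = new ArrayList<>();
        for (Store store : stores.values()) {
            if (store.changelogPartition.equals(corruptedPartition)) {
                store.wipe();
                wiped.add(store.name);
            } else if (store.persistent) {
                store.closeCleanlyWithCheckpoint();
            }
            // Unaffected in-memory store: deliberately untouched.
        }
        return wiped;
    }

    public static void main(String[] args) {
        Map<String, Store> stores = new LinkedHashMap<>();
        stores.put("counts", new Store("counts", "app-counts-changelog-0", true, 500L));
        stores.put("windows", new Store("windows", "app-windows-changelog-0", true, 300L));
        stores.put("cache", new Store("cache", "app-cache-changelog-0", false, 120L));

        List<String> wiped = handleCorruption(stores, "app-windows-changelog-0");
        System.out.println("wiped stores: " + wiped);
        for (Store s : stores.values()) {
            System.out.println(s.name + " restoredOffset=" + s.restoredOffset
                    + " open=" + s.open + " checkpointed=" + s.checkpointed);
        }
    }
}
```

In this sketch only `windows` loses its restore progress. `counts` keeps its offset and is checkpointed on close, and `cache` stays open with its in-memory data intact.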
--
This message was sent by Atlassian Jira
(v8.3.4#803005)