Posted to dev@kafka.apache.org by "Gardner Vickers (Jira)" <ji...@apache.org> on 2021/06/17 13:10:00 UTC
[jira] [Created] (KAFKA-12964) Corrupt segment recovery can delete new producer state snapshots
Gardner Vickers created KAFKA-12964:
---------------------------------------
Summary: Corrupt segment recovery can delete new producer state snapshots
Key: KAFKA-12964
URL: https://issues.apache.org/jira/browse/KAFKA-12964
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.8.0
Reporter: Gardner Vickers
Assignee: Gardner Vickers
During log recovery, we may schedule asynchronous deletion in deleteSegmentFiles.
[https://github.com/apache/kafka/blob/fc5245d8c37a6c9d585c5792940a8f9501bedbe1/core/src/main/scala/kafka/log/Log.scala#L2382]
If we're truncating the log, the segments scheduled for deletion may have the same base offsets as segments that will be written in the future. To avoid asynchronously deleting those future segments, we rename the segment and index files, but we do not do this for producer state snapshot files.
This leaves us vulnerable to a race condition: when the async deletion runs, it could delete snapshot files that belong to segments written after log recovery.
To fix this, we should first remove the `SnapshotFile` from the `ProducerStateManager` and rename the file so that it carries the `Log.DeletedFileSuffix`. Then we can asynchronously delete the snapshot file later.
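
A minimal sketch of the proposed ordering, written in plain Java against `java.nio.file` rather than the actual Kafka classes. The class `SnapshotDeletionSketch`, its map of snapshots, and the method names are hypothetical stand-ins for `ProducerStateManager` and are not Kafka APIs; the `.deleted` value is assumed to match `Log.DeletedFileSuffix`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for the snapshot bookkeeping in ProducerStateManager.
class SnapshotDeletionSketch {
    // Assumed to mirror Log.DeletedFileSuffix.
    static final String DELETED_SUFFIX = ".deleted";

    // Snapshot offset -> snapshot file currently considered live.
    private final ConcurrentSkipListMap<Long, Path> snapshots = new ConcurrentSkipListMap<>();

    void register(long offset, Path file) {
        snapshots.put(offset, file);
    }

    boolean isTracked(long offset) {
        return snapshots.containsKey(offset);
    }

    // Step 1 (synchronous): stop tracking the snapshot and rename its file,
    // so a later snapshot with the same offset gets a distinct path.
    Optional<Path> removeAndMarkForDeletion(long offset) throws IOException {
        Path file = snapshots.remove(offset);
        if (file == null) {
            return Optional.empty();
        }
        Path renamed = file.resolveSibling(file.getFileName().toString() + DELETED_SUFFIX);
        Files.move(file, renamed, StandardCopyOption.ATOMIC_MOVE);
        return Optional.of(renamed);
    }

    // Step 2 (may run later on a scheduler thread): delete only the renamed file.
    static void deleteMarked(Path renamed) throws IOException {
        Files.deleteIfExists(renamed);
    }
}
```

Because the delayed task only holds the renamed path, a snapshot written after recovery under the original file name is not touched when the deletion eventually runs.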
--
This message was sent by Atlassian Jira
(v8.3.4#803005)