Posted to jira@kafka.apache.org by "A. Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2021/02/01 23:09:00 UTC

[jira] [Updated] (KAFKA-10391) Streams should overwrite checkpoint excluding corrupted partitions

     [ https://issues.apache.org/jira/browse/KAFKA-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

A. Sophie Blee-Goldman updated KAFKA-10391:
-------------------------------------------
    Affects Version/s: 2.7.0

> Streams should overwrite checkpoint excluding corrupted partitions
> ------------------------------------------------------------------
>
>                 Key: KAFKA-10391
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10391
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.7.0
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Major
>             Fix For: 2.7.0
>
>
> While working on https://issues.apache.org/jira/browse/KAFKA-9450 I discovered another bug in Streams. When some partitions are corrupted because their offsets are out of range, we treat the task as corrupted, close it as dirty, and then revive it. However, we forget to overwrite the checkpoint file to exclude those out-of-range partitions, which would let them be re-bootstrapped from the new log-start offset. As a result, the revived task still loads the old offset, resumes from there, and hits the out-of-range exception again. This may cause {{StreamsUpgradeTest.test_app_upgrade}} to be flaky.
> We do not see this often for two reasons. In the past we always deleted the checkpoint file after loading it. We also usually hit the out-of-range exception only at the beginning of restoration, not in the middle of it.
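The following is a minimal Java sketch of the checkpoint handling the issue asks for. It makes several simplifying assumptions. Partitions are plain "topic-partition" strings. The checkpoint is an in-memory map of changelog offsets. A partition with no checkpointed offset restores from the log-start offset. The class and method names (`CheckpointSketch`, `checkpointExcludingCorrupted`, `restoreStartOffset`) and the topic name are hypothetical. They are not part of the Kafka Streams API, and this is not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of KAFKA-10391, NOT the Kafka Streams API.
// Assumption: the checkpoint is a map from "topic-partition" strings to changelog offsets.
public class CheckpointSketch {

    // Returns the offsets to write back to the checkpoint before a corrupted task
    // is revived: every entry except the corrupted (out-of-range) partitions.
    // The input map is copied, so the caller's map is left unchanged.
    public static Map<String, Long> checkpointExcludingCorrupted(
            Map<String, Long> checkpointedOffsets,
            Set<String> corruptedPartitions) {
        Map<String, Long> result = new HashMap<>(checkpointedOffsets);
        result.keySet().removeAll(corruptedPartitions);
        return result;
    }

    // Simplified assumption: a revived task resumes from the checkpointed offset if one
    // exists, otherwise it re-bootstraps from the (possibly advanced) log-start offset.
    public static long restoreStartOffset(
            Map<String, Long> checkpoint, String partition, long logStartOffset) {
        return checkpoint.getOrDefault(partition, logStartOffset);
    }

    public static void main(String[] args) {
        Map<String, Long> checkpoint = new HashMap<>();
        checkpoint.put("app-store-changelog-0", 120L);
        checkpoint.put("app-store-changelog-1", 45L);

        // Assume partition 1 threw an out-of-range exception during restoration
        // because its log-start offset has moved past 45.
        Map<String, Long> rewritten =
            checkpointExcludingCorrupted(checkpoint, Set.of("app-store-changelog-1"));

        // Without the overwrite, the revived task would resume from the stale offset 45.
        System.out.println(restoreStartOffset(checkpoint, "app-store-changelog-1", 300L));
        // With the overwrite, it re-bootstraps from the log-start offset 300.
        System.out.println(restoreStartOffset(rewritten, "app-store-changelog-1", 300L));
        System.out.println(rewritten);
    }
}
```

Running `main` prints `45`, then `300`, then `{app-store-changelog-0=120}`. Partition 0 keeps its checkpointed offset, and only the corrupted partition falls back to the log-start offset.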



--
This message was sent by Atlassian Jira
(v8.3.4#803005)