Posted to dev@kafka.apache.org by "Stanislav Kozlovski (JIRA)" <ji...@apache.org> on 2019/02/21 08:13:00 UTC

[jira] [Created] (KAFKA-7968) Delete leader epoch cache files with old message format versions

Stanislav Kozlovski created KAFKA-7968:
------------------------------------------

             Summary: Delete leader epoch cache files with old message format versions
                 Key: KAFKA-7968
                 URL: https://issues.apache.org/jira/browse/KAFKA-7968
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.0.1
            Reporter: Stanislav Kozlovski
            Assignee: Stanislav Kozlovski


[KAFKA-7897 (Invalid use of epoch cache with old message format versions)|https://issues.apache.org/jira/browse/KAFKA-7897] fixed a critical bug where replicas would incorrectly use their leader epoch cache to truncate their logs upon becoming followers. [The root of the issue|https://issues.apache.org/jira/browse/KAFKA-7897?focusedCommentId=16761049&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16761049] was a regression in KAFKA-7415 that caused the leader epoch cache to be populated upon becoming a follower, even when the message format was too old to support leader epochs.

KAFKA-7897 fixed that problem by not updating the leader epoch cache if the message format does not support it. The fix was merged all the way back to 1.1, but due to significant branch divergence, the patches for 2.0 and below were simplified. As stated in the commit message:
"Note this is a simplified fix than what was merged to trunk in #6232 since the branches have diverged significantly. Rather than removing the epoch cache file, we guard usage of the cache with the record version."
As a result, the same bug is hit, just at a different time. When the message format is upgraded to a version that supports the leader epoch cache, brokers start using it. Because of the original problem, the sparsely populated epoch cache file is still present, and it causes the same large truncations we saw in KAFKA-7897.
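A toy model makes this failure mode concrete. It is a hypothetical sketch, not Kafka's code: `EpochCache`, `GuardedLog`, `usable_cache`, and `MAGIC_V2` are illustrative names, and it assumes that leader epochs are only carried by message format (record magic) v2 and later. The guard hides the stale entry only while the old format is in effect; as soon as the format is upgraded, the entry written earlier is consulted again.

```python
from dataclasses import dataclass, field

# Hypothetical model, not Kafka's implementation. Assumption: leader epochs
# are only recorded by message format (record magic) v2 and later.
MAGIC_V2 = 2


@dataclass
class EpochCache:
    # (epoch, start_offset) pairs, as they would be persisted on disk
    entries: list = field(default_factory=list)


@dataclass
class GuardedLog:
    """Simplified 2.0-style fix: keep the cache file, guard its use by format."""
    record_magic: int
    cache: EpochCache

    def usable_cache(self):
        # The guard: ignore the cache while the format cannot carry epochs,
        # but leave its contents untouched.
        if self.record_magic < MAGIC_V2:
            return None
        return self.cache


# A sparsely populated cache left behind by the KAFKA-7415 regression
# (the epoch and offset values are made up for illustration).
log = GuardedLog(record_magic=1, cache=EpochCache([(5, 100)]))
hidden = log.usable_cache()  # None while on the old format

log.record_magic = MAGIC_V2  # message format upgrade
stale = log.usable_cache()
print(stale.entries)  # [(5, 100)]: the stale entry is now in use
```

Because the guard only checks the format at the moment of use and never cleans up state, it has no way to tell entries written under the old format apart from valid ones once the check starts passing.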

The key difference is that the patches for 2.1 and trunk *deleted* non-empty leader epoch cache files if the log's message format did not support leader epochs.
We should update the earlier versions to do the same. That way, users who have upgraded to 2.0.1 but are still using old message formats/protocol versions will have their leader epoch cache files cleaned up on the first rolling restart that upgrades the `inter.broker.protocol.version`.
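A minimal sketch of the deletion approach, written as a hypothetical model rather than Kafka's code: `load_epoch_cache` and `MAGIC_V2` are illustrative names, the cache file uses a simplified one-pair-per-line layout rather than Kafka's real checkpoint format, and leader epochs are assumed to require message format (record magic) v2. When a log is loaded with an older format, a non-empty cache file is removed, so after the upgrade the cache starts empty instead of being seeded with stale entries.

```python
import os
import tempfile

# Hypothetical sketch, not Kafka's code. Assumption: message format
# (record magic) v2 is the first version that carries leader epochs.
MAGIC_V2 = 2


def load_epoch_cache(checkpoint_path: str, record_magic: int) -> list:
    """Return the leader epoch entries to use when a log is loaded.

    If the log's message format cannot carry leader epochs, a non-empty
    cache file is deleted rather than merely ignored, so no stale entries
    survive a later message format upgrade.
    """
    if record_magic < MAGIC_V2:
        if os.path.exists(checkpoint_path) and os.path.getsize(checkpoint_path) > 0:
            os.remove(checkpoint_path)
        return []
    if not os.path.exists(checkpoint_path):
        return []
    with open(checkpoint_path) as f:
        # Simplified file layout: one "epoch start_offset" pair per line.
        return [tuple(int(x) for x in line.split()) for line in f if line.strip()]


# Usage: a stale, sparsely populated file on a log still using an old format.
with tempfile.TemporaryDirectory() as log_dir:
    path = os.path.join(log_dir, "epoch-checkpoint")  # hypothetical file name
    with open(path, "w") as f:
        f.write("5 100\n")  # made-up epoch and offset

    before_upgrade = load_epoch_cache(path, record_magic=1)
    file_survived = os.path.exists(path)
    after_upgrade = load_epoch_cache(path, record_magic=2)

print(before_upgrade, file_survived, after_upgrade)  # [] False []
```

Deleting the file at load time, rather than checking the format on every access, means the upgrade path never sees entries that were written while the format could not carry epochs.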



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)