Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/02/21 09:27:00 UTC
[jira] [Commented] (KAFKA-7968) Delete leader epoch cache files with old message format versions

    [ https://issues.apache.org/jira/browse/KAFKA-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773868#comment-16773868 ] 

ASF GitHub Bot commented on KAFKA-7968:
---------------------------------------

stanislavkozlovski commented on pull request #6298: KAFKA-7968: Delete leader epoch cache files with old message format versions
URL: https://github.com/apache/kafka/pull/6298
 
 
   KAFKA-7897 fixed a critical bug where replica followers would incorrectly use the leader epoch cache to truncate their logs upon becoming a follower. The root cause was a regression in KAFKA-7415 that populated the leader epoch cache whenever a broker became leader, even if the message format was older and did not support leader epochs. This left very sparsely populated caches, which brokers then used when becoming a follower, resulting in huge log truncations.
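
To make the failure mode concrete, here is a minimal sketch of epoch-based truncation against a sparse cache. It is a simplified model, not Kafka's code: the function names, the `(epoch, start_offset)` list representation, and every offset below are assumptions made up for illustration.

```python
from bisect import bisect_right

def end_offset_for_epoch(leader_cache, leader_log_end, requested_epoch):
    """Leader side: end offset of the largest cached epoch <= requested_epoch.

    leader_cache is a list of (epoch, start_offset) pairs sorted by epoch.
    Returns None when no cached epoch is <= requested_epoch.
    """
    epochs = [epoch for epoch, _ in leader_cache]
    i = bisect_right(epochs, requested_epoch)
    if i == 0:
        return None                  # epoch is older than anything cached
    if i == len(leader_cache):
        return leader_log_end        # matched the leader's current epoch
    return leader_cache[i][1]        # start offset of the next cached epoch

def follower_truncation_offset(follower_cache, follower_log_end,
                               leader_cache, leader_log_end, high_watermark):
    """Follower side: offset the follower truncates its log to."""
    if not follower_cache:
        return high_watermark        # no epochs known: fall back to the high watermark
    latest_epoch = follower_cache[-1][0]
    leader_end = end_offset_for_epoch(leader_cache, leader_log_end, latest_epoch)
    if leader_end is None:
        return high_watermark
    return min(leader_end, follower_log_end)

# Hypothetical sparse caches: entries exist only for the moments each broker
# happened to become leader, although both logs hold about a million messages.
sparse_follower = [(3, 100)]
sparse_leader = [(1, 0), (5, 120), (9, 1_000_000)]
offset = follower_truncation_offset(sparse_follower, 1_000_000,
                                    sparse_leader, 1_000_050, 999_990)
print(offset)
```

In this model the follower's latest cached epoch (3) is answered with the start offset of the leader's next cached epoch (120), so the follower would cut its log back to offset 120 even though it was nearly caught up.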
   
   KAFKA-7897 fixed that problem by not updating the leader epoch cache if the message format does not support it. The fix was merged all the way back to 1.1, but because the branches had diverged significantly, the patches for 2.0 and below were simplified. As the commit message says:
   > Note this is a simplified fix than what was merged to trunk in #6232 since the branches have diverged significantly. Rather than removing the epoch cache file, we guard usage of the cache with the record version.
   Because of the earlier regression, the sparsely populated epoch cache file is still present on those branches, so the same bug is hit at a different time: when the message format is upgraded to one that supports the leader epoch cache, brokers start to make use of the file. This results in the same large truncations we saw in KAFKA-7897.
   
   The key difference is that the patches for 2.1 and trunk deleted any non-empty leader epoch cache file if the log message format did not support it.
   We should update the earlier versions to do the same thing. That way, users who have upgraded to 2.0.1 but are still using the old message formats/protocol will have their epochs cleaned up on the first roll that upgrades the `inter.broker.protocol.version`.
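
The cleanup described above could look roughly like the following sketch. It is illustrative only and not the actual patch: the function name is hypothetical, and it assumes the cache lives in a `leader-epoch-checkpoint` file inside each partition directory and that record format version 2 is the first one that carries leader epochs.

```python
from pathlib import Path

# Assumption: per-partition cache file name and the first record format
# version that supports leader epochs.
LEADER_EPOCH_FILE = "leader-epoch-checkpoint"
MIN_EPOCH_RECORD_VERSION = 2

def cleanup_epoch_cache(partition_dir, record_version):
    """Delete the epoch cache file if the record format cannot carry epochs.

    Returns True if a file was deleted, False otherwise.
    """
    cache_file = Path(partition_dir) / LEADER_EPOCH_FILE
    if record_version < MIN_EPOCH_RECORD_VERSION and cache_file.exists():
        cache_file.unlink()          # drop the stale, sparsely populated cache
        return True
    return False                     # format supports epochs, or nothing to delete
```

Running a check like this while the log is still on the old format means that no stale entries remain by the time the format is upgraded and brokers begin consulting the cache.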
 


> Delete leader epoch cache files with old message format versions
> ----------------------------------------------------------------
>
>                 Key: KAFKA-7968
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7968
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.0.1
>            Reporter: Stanislav Kozlovski
>            Assignee: Stanislav Kozlovski
>            Priority: Major
>
> [KAFKA-7897 (Invalid use of epoch cache with old message format versions)|https://issues.apache.org/jira/browse/KAFKA-7897] fixed a critical bug where replica followers would incorrectly use their leader epoch cache to truncate their logs upon becoming a follower. [The root of the issue|https://issues.apache.org/jira/browse/KAFKA-7897?focusedCommentId=16761049&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16761049] was that a regression in KAFKA-7415 caused the leader epoch cache to be populated upon becoming a leader, even if the message format was older and did not support leader epochs.
> KAFKA-7897 fixed that problem by not updating the leader epoch cache if the message format does not support it. It was merged all the way back to 1.1 but due to significant branch divergence, the patches for 2.0 and below were simplified. As said in the commit:
> Note this is a simplified fix than what was merged to trunk in #6232 since the branches have diverged significantly. Rather than removing the epoch cache file, we guard usage of the cache with the record version.
> This results in the same bug being hit at a different time. When the message format gets upgraded to support the leader epoch cache, brokers start to make use of it. Due to the previous problem, we still have the sparsely populated epoch cache file present. This results in the same large truncations we saw in KAFKA-7897.
> The key difference is that the patches for 2.1 and trunk *deleted* any non-empty leader epoch cache file if the log message format did not support it.
> We should update the earlier versions to do the same thing. That way, users who have upgraded to 2.0.1 but are still using old message formats/protocol will have their epochs cleaned up on the first roll that upgrades the `inter.broker.protocol.version`.


