Posted to dev@kafka.apache.org by "Stanislav Kozlovski (JIRA)" <ji...@apache.org> on 2019/02/22 18:28:00 UTC
[jira] [Created] (KAFKA-7984) Do not rebuild leader epochs on segments that do not support it
Stanislav Kozlovski created KAFKA-7984:
------------------------------------------
Summary: Do not rebuild leader epochs on segments that do not support it
Key: KAFKA-7984
URL: https://issues.apache.org/jira/browse/KAFKA-7984
Project: Kafka
Issue Type: Bug
Reporter: Stanislav Kozlovski
Assignee: Stanislav Kozlovski
h3. Preface
https://issues.apache.org/jira/browse/KAFKA-7897 (logs would store some leader epochs even if they did not support them - this is essentially a regression from https://issues.apache.org/jira/browse/KAFKA-7415)
https://issues.apache.org/jira/browse/KAFKA-7959
If users are running Kafka with https://issues.apache.org/jira/browse/KAFKA-7415 merged in, chances are they have sparsely populated leader epoch cache files.
KAFKA-7897's implementation unintentionally takes care of deleting those leader epoch cache files in versions 2.1+. For earlier versions, KAFKA-7959 fixes that.
In any case, as it currently stands, a broker started up with a message format of `0.10.0` will have those leader epoch cache files deleted.
h3. Problem
We have logic [that rebuilds these leader epoch cache files|https://github.com/apache/kafka/blob/217f45ed554b34d5221e1dd3db76e4be892661cf/core/src/main/scala/kafka/log/Log.scala#L614] when recovering segments in the absence of a clean shutdown file. It iterates over the record batches and rebuilds the leader epoch cache from the epochs they carry.
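To make the problem concrete, here is a minimal, hypothetical Java model of such a rebuild. `Batch`, `MAGIC_V2`, `NO_EPOCH` (assumed to be -1) and the `TreeMap` standing in for the leader epoch cache are illustrative assumptions, not the actual code in `Log.scala`. Batches without an epoch are simply skipped, so whatever was accumulated before them survives: on a segment like the one in the example further down, epoch 1 stays mapped to offset 0.

{code:java}
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, simplified model of the recovery-time rebuild. Batch and the
// TreeMap-based cache are illustrative stand-ins, not Kafka's actual classes.
class NaiveEpochRebuild {
    // Assumption: magic 2 (message format v2, 0.11+) is the first format carrying epochs.
    static final int MAGIC_V2 = 2;
    // Assumption: -1 marks "no partition leader epoch".
    static final int NO_EPOCH = -1;

    record Batch(long baseOffset, int magic, int partitionLeaderEpoch) {}

    // Maps each leader epoch to the first offset seen for it. Batches without an
    // epoch are skipped, but nothing accumulated before them is discarded.
    static NavigableMap<Integer, Long> rebuild(List<Batch> batches) {
        NavigableMap<Integer, Long> cache = new TreeMap<>();
        for (Batch b : batches) {
            if (b.magic() >= MAGIC_V2 && b.partitionLeaderEpoch() != NO_EPOCH) {
                // Only record an epoch that is newer than the latest one cached.
                if (cache.isEmpty() || b.partitionLeaderEpoch() > cache.lastKey()) {
                    cache.put(b.partitionLeaderEpoch(), b.baseOffset());
                }
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        // Two v2 batches with epoch 1, followed by two v1 batches without an epoch.
        List<Batch> segment = List.of(
                new Batch(0, 2, 1),
                new Batch(1, 2, 1),
                new Batch(2, 1, NO_EPOCH),
                new Batch(3, 1, NO_EPOCH));
        System.out.println(rebuild(segment)); // prints {1=0}
    }
}
{code}

The stale `{1=0}` entry is exactly the kind of sparsely populated cache this issue is about.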
KAFKA-7959's implementation guards against this by checking that the configured message format (`log.message.format.version`) supports leader epochs, *but* that fix is only merged into versions *below 2.1*.
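A hedged sketch of what a format-level guard of this kind could look like. `supportsLeaderEpochs` is a hypothetical helper, not KAFKA-7959's actual code. It assumes only that leader epochs arrived with message format 0.11.0 (record batch v2) and that the input looks like a `log.message.format.version` value, optionally with an `-IVn` suffix.

{code:java}
// Hypothetical sketch of a guard that decides, from the configured message
// format version string alone, whether leader epochs should be rebuilt.
class EpochSupportGuard {
    // Assumption: leader epochs live in v2 record batches, introduced with format 0.11.0.
    static boolean supportsLeaderEpochs(String messageFormatVersion) {
        // Drop any "-IVn" suffix, then compare major.minor against 0.11.
        String base = messageFormatVersion.split("-")[0];
        String[] parts = base.split("\\.");
        int major = Integer.parseInt(parts[0]);
        if (major > 0) {
            return true;
        }
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return minor >= 11;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"0.10.0", "0.10.2-IV0", "0.11.0", "2.1-IV2"}) {
            System.out.println(v + " -> " + supportsLeaderEpochs(v));
        }
    }
}
{code}

A check like this passes for every format from 0.11 onwards, so on its own it cannot catch a segment that mixes v2 and v1 batches, which is the case described next.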
Moreover, the case where `log.message.format.version >= 0.11` *is not handled*. If a broker has the following log segment file:
{code:java}
offset 0, format v2, epoch 1
offset 1, format v2, epoch 1
offset 2, format v1, no epoch
offset 3, format v1, no epoch
{code}
and is then upgraded to a message format that supports leader epochs, rebuilding any log that had an unclean shutdown will populate the leader epoch cache again, potentially resulting in the issue described in KAFKA-7897.
One simple potential fix is to clear the accumulated leader epoch cache whenever a batch with no epoch is encountered while rebuilding a segment.
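A minimal Java sketch of this proposal, not necessarily how the fix will be implemented. `Batch`, `MAGIC_V2`, `NO_EPOCH` (assumed to be -1) and the `TreeMap` standing in for the leader epoch cache are hypothetical stand-ins rather than Kafka's API. Run against the example segment above, the cache ends up empty instead of retaining epoch 1 from offset 0.

{code:java}
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the proposed fix: during the rebuild, a batch without a
// leader epoch discards everything accumulated so far. Batch and the TreeMap
// cache are illustrative stand-ins, not Kafka's classes.
class ClearingEpochRebuild {
    static final int MAGIC_V2 = 2;   // assumption: first format with leader epochs
    static final int NO_EPOCH = -1;  // assumption: sentinel for "no epoch"

    record Batch(long baseOffset, int magic, int partitionLeaderEpoch) {}

    static NavigableMap<Integer, Long> rebuild(List<Batch> batches) {
        NavigableMap<Integer, Long> cache = new TreeMap<>();
        for (Batch b : batches) {
            if (b.magic() < MAGIC_V2 || b.partitionLeaderEpoch() == NO_EPOCH) {
                // Epoch-less batch: entries before it can no longer be trusted.
                cache.clear();
            } else if (cache.isEmpty() || b.partitionLeaderEpoch() > cache.lastKey()) {
                // Record the first offset of each newer epoch.
                cache.put(b.partitionLeaderEpoch(), b.baseOffset());
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        // The segment from the example: two v2 batches, then two v1 batches.
        List<Batch> segment = List.of(
                new Batch(0, 2, 1),
                new Batch(1, 2, 1),
                new Batch(2, 1, NO_EPOCH),
                new Batch(3, 1, NO_EPOCH));
        System.out.println(rebuild(segment)); // prints {}
    }
}
{code}

Clearing rather than merely skipping the epoch-less batch means only epochs seen after the last such batch survive, which is the conservative outcome: the cache never claims an epoch for offsets whose batches carry none.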