Posted to dev@kafka.apache.org by "Stanislav Kozlovski (JIRA)" <ji...@apache.org> on 2019/02/22 18:28:00 UTC

[jira] [Created] (KAFKA-7984) Do not rebuild leader epochs on segments that do not support it

Stanislav Kozlovski created KAFKA-7984:
------------------------------------------

             Summary: Do not rebuild leader epochs on segments that do not support it
                 Key: KAFKA-7984
                 URL: https://issues.apache.org/jira/browse/KAFKA-7984
             Project: Kafka
          Issue Type: Bug
            Reporter: Stanislav Kozlovski
            Assignee: Stanislav Kozlovski


h3. Preface

https://issues.apache.org/jira/browse/KAFKA-7897 (logs would store some leader epochs even if they did not support them - this is essentially a regression from https://issues.apache.org/jira/browse/KAFKA-7415)
https://issues.apache.org/jira/browse/KAFKA-7959

If users are running Kafka with https://issues.apache.org/jira/browse/KAFKA-7415 merged in, chances are they have sparsely-populated leader epoch cache files.
KAFKA-7897's implementation unintentionally handles this case by deleting those leader epoch cache files, but only for versions 2.1+. For earlier versions, KAFKA-7959 fixes that.

In any case, as it currently stands, a broker started up with a message format of `0.10.0` will have those leader epoch cache files deleted.


h3. Problem

We have logic [that rebuilds these leader epoch cache files|https://github.com/apache/kafka/blob/217f45ed554b34d5221e1dd3db76e4be892661cf/core/src/main/scala/kafka/log/Log.scala#L614] when recovering segments that do not have a clean shutdown file. It iterates over the record batches and rebuilds the leader epoch cache from them.
KAFKA-7959's implementation guards against this by checking that the configured log.message.format supports leader epochs, *but* that fix has only been merged for versions *below 2.1*.

Moreover, the case where `log.message.format >= 0.11` *is not handled*. If a broker has the following log segment file:
{code:java}
offset 0, format v2, epoch 1
offset 1, format v2, epoch 1
offset 2, format v1, no epoch
offset 3, format v1, no epoch
{code}
and is upgraded to a log message format that supports leader epochs, rebuilding any log that had an unclean shutdown will populate the leader epoch cache again, potentially resulting in the issue described in KAFKA-7897.

One potentially simple fix is to clear the accumulated leader epoch cache whenever a batch with no epoch is encountered while rebuilding a segment.
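
To make the proposal concrete, here is a minimal sketch of the two rebuild behaviours on a simplified model. Everything in it is hypothetical and for illustration only: `EpochCacheRebuildSketch`, `Batch`, `EpochEntry` and the `clearOnMissingEpoch` flag are not Kafka classes or parameters, and the real recovery code in Log.scala is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical, simplified model of segment recovery; not Kafka's actual classes.
final class EpochCacheRebuildSketch {

    // A record batch reduced to the two fields that matter here.
    static final class Batch {
        final long baseOffset;
        final OptionalInt leaderEpoch; // empty for batches in a pre-v2 message format

        Batch(long baseOffset, OptionalInt leaderEpoch) {
            this.baseOffset = baseOffset;
            this.leaderEpoch = leaderEpoch;
        }
    }

    // One cache entry: an epoch and the first offset seen for it.
    static final class EpochEntry {
        final int epoch;
        final long startOffset;

        EpochEntry(int epoch, long startOffset) {
            this.epoch = epoch;
            this.startOffset = startOffset;
        }
    }

    // Rebuilds the cache by scanning batches in offset order.
    // With clearOnMissingEpoch = false, batches without an epoch are skipped,
    // so entries from older batches survive (the behaviour described as the problem).
    // With clearOnMissingEpoch = true, a batch without an epoch discards
    // everything accumulated so far (the proposed fix).
    static List<EpochEntry> rebuild(List<Batch> batches, boolean clearOnMissingEpoch) {
        List<EpochEntry> cache = new ArrayList<>();
        for (Batch batch : batches) {
            if (batch.leaderEpoch.isPresent()) {
                int epoch = batch.leaderEpoch.getAsInt();
                // Record only the first offset of each new epoch.
                if (cache.isEmpty() || cache.get(cache.size() - 1).epoch != epoch) {
                    cache.add(new EpochEntry(epoch, batch.baseOffset));
                }
            } else if (clearOnMissingEpoch) {
                cache.clear();
            }
        }
        return cache;
    }
}
```

For the four-batch segment shown above, calling `rebuild` with `clearOnMissingEpoch = false` leaves a single entry (epoch 1, start offset 0), while calling it with `true` leaves the cache empty.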


