Posted to users@kafka.apache.org by Kamal Chandraprakash <ka...@gmail.com> on 2024/04/01 03:47:35 UTC

Re: Kafka followers with higher leader epoch than leader

Hi,

The follower is not able to sync up with the leader because the leader epochs
have diverged between the leader and the follower.
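For reference, a leader-epoch-checkpoint file is just a version line, an
entry-count line, and then one `epoch startOffset` pair per line. Annotating
the leader's file you pasted below:

    0        <- file format version
    4        <- number of entries that follow
    0 0      <- epoch 0 began at offset 0
    1 4      <- epoch 1 began at offset 4
    4 6      <- epoch 4 began at offset 6
    27 10    <- epoch 27 began at offset 10

Reading your two files that way, the leader and follower agree up to epoch 4,
but the follower then lists epochs 5 and 6 while the leader's file jumps
straight to epoch 27, which is the divergence I mean.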
To confirm this, you can enable the request logger and check the
diverging-epoch field in the fetch response:

https://sourcegraph.com/github.com/apache/kafka@a640a81040f6ef6f85819b60194f0394f5f2194e/-/blob/clients/src/main/resources/common/message/FetchResponse.json?L76
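If you haven't used the request logger before, one way to turn it up without a
restart is the dynamic broker-logger config (a sketch only; substitute your own
bootstrap server and broker id, and note that TRACE is very verbose while on
some versions DEBUG may already show the full fetch response):

    bin/kafka-configs.sh --bootstrap-server localhost:9092 \
      --entity-type broker-loggers --entity-name 1 \
      --alter --add-config kafka.request.logger=TRACE

    # revert to the default level once you have captured the fetch responses
    bin/kafka-configs.sh --bootstrap-server localhost:9092 \
      --entity-type broker-loggers --entity-name 1 \
      --alter --delete-config kafka.request.logger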

This issue can happen when the `leader-epoch-checkpoint` file is corrupted on
the leader node. To mitigate the issue, you have to:

1. Stop the leader broker.
2. Remove the `leader-epoch-checkpoint` file for the affected partition.
3. Recover the partition by deleting the partition's entry from the
checkpoint files: `log-start-offset-checkpoint`,
`replication-offset-checkpoint`, `recovery-point-offset-checkpoint`, and
`cleaner-offset-checkpoint`. Note that when removing an entry, you also
have to decrement the entry count on line 2 of each file (see the sketch
after this list).
4. Remove the `.kafka_cleanshutdown` marker file.
5. Start the node and trigger a preferred leader election to elect the
same node as leader again (command sketch below).
6. The follower will then be able to sync up with the leader.
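To illustrate step 3 (made-up offsets; the real values come from your own
files), each of those offset checkpoint files is a version line, an entry-count
line, and then one `topic partition offset` entry per line. Removing the
__consumer_offsets-18 entry also means decrementing the count on line 2:

    replication-offset-checkpoint, before:
    0
    3
    __consumer_offsets 18 1234
    __consumer_offsets 19 5678
    my-topic 0 42

    replication-offset-checkpoint, after removing __consumer_offsets-18:
    0
    2
    __consumer_offsets 19 5678
    my-topic 0 42

For step 5, the preferred election can be triggered for just that partition
(again a sketch, substitute your own bootstrap server):

    bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
      --election-type preferred \
      --topic __consumer_offsets --partition 18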

--
Kamal

On Tue, Mar 19, 2024 at 6:06 PM Karl Sorensen <ka...@digitalis.io>
wrote:

> Hi,
>
> I have an unusual situation where I have a cluster running Kafka 3.5.1 in
> Strimzi where 4 of the __consumer_offsets partitions have dropped under min
> ISR.
>
> Everything else appears to be working fine.
> Upon investigating, I've found that the partition followers appear to be
> out of sync with the leader in terms of leader epoch.
>
> For example the leader-epoch-checkpoint file on the leader partition is
> 0
> 4
> 0 0
> 1 4
> 4 6
> 27 10
>
> while the followers are
> 0
> 5
> 0 0
> 1 4
> 4 6
> 5 7
> 6 9
>
> which appears to me like the followers are 2 elections ahead of the leader,
> and I'm not sure how they got into this situation.
> I've attempted to force a new leader election via kafka-leader-election.sh,
> but it refused for both PREFERRED and UNCLEAN.
> I've also tried a manual partition reassignment to move the leader to another
> broker, but it won't do it.
>
> What is even more strange is that if I watch the leader-epoch-checkpoint
> file on one of the followers, I can see it constantly changing as it tries
> to sort itself out.
> [kafka@internal-001-kafka-0 __consumer_offsets-18]$ cat
> leader-epoch-checkpoint
> 0
> 3
> 0 0
> 1 4
> 4 6
> [kafka@internal-001-kafka-0 __consumer_offsets-18]$ cat
> leader-epoch-checkpoint
> 0
> 5
> 0 0
> 1 4
> 4 6
> 5 7
> 6 9
>
> I have tried to manually remove the follower's partition files on disk in an
> attempt to get it to sync from the leader, but it keeps returning to the
> inconsistent state.
>
> Restarting the broker with the partition leader on it doesn't seem to move
> leadership either.
>
> The follower keeps logging the following constantly:
> 2024-03-19 09:23:11,169 INFO [ReplicaFetcher replicaId=2, leaderId=1,
> fetcherId=0] Truncating partition __consumer_offsets-18 with
> TruncationState(offset=7, completed=true) due to leader epoch and offset
> EpochEndOffset(errorCode=0, partition=18, leaderEpoch=4, endOffset=10)
> (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-1]
> 2024-03-19 09:23:11,169 INFO [UnifiedLog partition=__consumer_offsets-18,
> dir=/var/lib/kafka/data-0/kafka-log2] Truncating to offset 7
> (kafka.log.UnifiedLog) [ReplicaFetcherThread-0-1]
> 2024-03-19 09:23:11,174 INFO [UnifiedLog partition=__consumer_offsets-18,
> dir=/var/lib/kafka/data-0/kafka-log2] Loading producer state till offset 7
> with message format version 2 (kafka.log.UnifiedLog$)
> [ReplicaFetcherThread-0-1]
> 2024-03-19 09:23:11,174 INFO [UnifiedLog partition=__consumer_offsets-18,
> dir=/var/lib/kafka/data-0/kafka-log2] Reloading from producer snapshot and
> rebuilding producer state from offset 7 (kafka.log.UnifiedLog$)
> [ReplicaFetcherThread-0-1]
> 2024-03-19 09:23:11,174 INFO [ProducerStateManager
> partition=__consumer_offsets-18] Loading producer state from snapshot file
> 'SnapshotFile(offset=7,
>
> file=/var/lib/kafka/data-0/kafka-log2/__consumer_offsets-18/00000000000000000007.snapshot)'
> (org.apache.kafka.storage.internals.log.ProducerStateManager)
> [ReplicaFetcherThread-0-1]
> 2024-03-19 09:23:11,175 INFO [UnifiedLog partition=__consumer_offsets-18,
> dir=/var/lib/kafka/data-0/kafka-log2] Producer state recovery took 1ms for
> snapshot load and 0ms for segment recovery from offset 7
> (kafka.log.UnifiedLog$) [ReplicaFetcherThread-0-1]
> 2024-03-19 09:23:11,175 WARN [UnifiedLog partition=__consumer_offsets-18,
> dir=/var/lib/kafka/data-0/kafka-log2] Non-monotonic update of high
> watermark from (offset=10segment=[0:4083]) to (offset=7segment=[0:3607])
> (kafka.log.UnifiedLog) [ReplicaFetcherThread-0-1]
>
> Any ideas on how to look into this further?
> Thanks
> Karl
>