Posted to jira@kafka.apache.org by "David Jacot (Jira)" <ji...@apache.org> on 2023/02/14 09:04:00 UTC

[jira] [Resolved] (KAFKA-14704) Follower should truncate before incrementing high watermark

     [ https://issues.apache.org/jira/browse/KAFKA-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Jacot resolved KAFKA-14704.
---------------------------------
    Fix Version/s: 3.5.0
                   3.4.1
                   3.3.3
         Reviewer: Jason Gustafson
       Resolution: Fixed

> Follower should truncate before incrementing high watermark
> -----------------------------------------------------------
>
>                 Key: KAFKA-14704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14704
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: David Jacot
>            Assignee: David Jacot
>            Priority: Major
>             Fix For: 3.5.0, 3.4.1, 3.3.3
>
>
> When a leader becomes a follower, it is likely to have uncommitted records in its log. When it fetches from the new leader, the leader detects that the two logs have diverged and returns the diverging epoch and offset. The follower truncates its log based on this.
> There is a small caveat in this process. When the leader returns the diverging epoch and offset, it also includes its high watermark, low watermark, start offset and end offset. The current code in `AbstractFetcherThread` works as follows. First it processes the partition data, and then it checks whether there is a diverging epoch/offset. The first step may accidentally expose uncommitted records because it updates the local high watermark to whatever is received from the leader. As the follower, i.e. the former leader, may have uncommitted records, it is able to update the high watermark to a larger offset if the leader has a higher watermark than the current local one. This results in exposing uncommitted records until the log is finally truncated. The time window is short, but a fetch request arriving at the follower at the right time could read those records. This is especially true for clients that use recent versions of the fetch request without implementing KIP-320.
> When this happens, the follower logs the following message: `Non-monotonic update of high watermark from (offset=21437 segment=[20998:98390]) to (offset=21434 segment=[20998:97843])`.
> This patch proposes to mitigate the issue by first checking whether a diverging epoch/offset is provided by the leader and skipping the processing of the partition data if it is. This means that the first fetch request results in truncating the log, and a subsequent fetch request updates the log and the high watermark.
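
The difference between the two orderings described above can be sketched with a small model. This is only an illustration under stated assumptions, not the actual `AbstractFetcherThread` code: `FetchOrderingSketch`, `FollowerLog`, `FetchResponse` and all of their methods are hypothetical names, and the model assumes the local high watermark is capped at the local log end offset. The offsets 21437 and 21434 are taken from the log message quoted above; the starting high watermark (21430) and log end offset (21440) are made up for the example.

```java
import java.util.OptionalLong;

/** Hypothetical, simplified model; none of these types are Kafka's real classes. */
final class FetchOrderingSketch {

    /** Minimal follower log state: only the two offsets that matter here. */
    static final class FollowerLog {
        long logEndOffset;
        long highWatermark;

        FollowerLog(long logEndOffset, long highWatermark) {
            this.logEndOffset = logEndOffset;
            this.highWatermark = highWatermark;
        }

        /** Assumption: the local high watermark is capped at the local log end offset. */
        void updateHighWatermark(long leaderHighWatermark) {
            highWatermark = Math.min(leaderHighWatermark, logEndOffset);
        }

        /** Drops everything at or above the offset and pulls the high watermark back if needed. */
        void truncateTo(long offset) {
            logEndOffset = Math.min(logEndOffset, offset);
            highWatermark = Math.min(highWatermark, logEndOffset);
        }
    }

    /** The two fields of a fetch response that this sketch cares about. */
    static final class FetchResponse {
        final long leaderHighWatermark;
        final OptionalLong divergingEndOffset;

        FetchResponse(long leaderHighWatermark, OptionalLong divergingEndOffset) {
            this.leaderHighWatermark = leaderHighWatermark;
            this.divergingEndOffset = divergingEndOffset;
        }
    }

    /** Old ordering: process partition data first, then truncate. Returns the high watermark visible before truncation. */
    static long processThenTruncate(FollowerLog log, FetchResponse resp) {
        log.updateHighWatermark(resp.leaderHighWatermark);
        long visibleBeforeTruncation = log.highWatermark;
        if (resp.divergingEndOffset.isPresent()) {
            log.truncateTo(resp.divergingEndOffset.getAsLong());
        }
        return visibleBeforeTruncation;
    }

    /** Proposed ordering: truncate and skip the partition data when the logs diverge. Returns the resulting high watermark. */
    static long truncateFirst(FollowerLog log, FetchResponse resp) {
        if (resp.divergingEndOffset.isPresent()) {
            log.truncateTo(resp.divergingEndOffset.getAsLong());
            return log.highWatermark; // partition data is left for the next fetch
        }
        log.updateHighWatermark(resp.leaderHighWatermark);
        return log.highWatermark;
    }

    public static void main(String[] args) {
        FetchResponse diverging = new FetchResponse(21437L, OptionalLong.of(21434L));

        FollowerLog oldLog = new FollowerLog(21440L, 21430L);
        System.out.println("old ordering, visible before truncation: " + processThenTruncate(oldLog, diverging));
        System.out.println("old ordering, after truncation: " + oldLog.highWatermark);

        FollowerLog newLog = new FollowerLog(21440L, 21430L);
        System.out.println("new ordering, after first fetch: " + truncateFirst(newLog, diverging));
    }
}
```

In this model the old ordering briefly raises the high watermark to 21437 and then drops it to 21434 on truncation, which corresponds to the non-monotonic update in the log message. With the proposed ordering the high watermark stays at 21430 after the first fetch and only advances on a later fetch that carries no diverging epoch/offset.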



--
This message was sent by Atlassian Jira
(v8.20.10#820010)