Posted to dev@kafka.apache.org by "Neha Narkhede (JIRA)" <ji...@apache.org> on 2013/04/05 01:52:15 UTC

[jira] [Updated] (KAFKA-851) Broken handling of leader and isr request leads to incorrect high watermark checkpoint file

     [ https://issues.apache.org/jira/browse/KAFKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-851:
--------------------------------

    Attachment: kafka-851-v1.patch

Fixed Partition.makeFollower() to do the following regardless of whether the leader is alive:

getOrCreateReplica: This creates a local replica if one does not exist. As a result, on the very first leader and isr request, the broker creates a replica object for every partition in that request. As part of this, it reads the previous high watermark value for each partition and creates the replica object with that value. This ensures that the checkpoint thread checkpoints the correct high watermark value, and that all partitions get checkpointed to the file.
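
The behavior described above can be sketched roughly as follows. This is an illustrative Python model, not the Scala code in the patch: the class layout, the in-memory `checkpointed_hw` dict standing in for the checkpoint file, the `checkpoint` helper, and the boolean returned by `make_follower` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


@dataclass
class Replica:
    broker_id: int
    high_watermark: int = 0


@dataclass
class Partition:
    topic: str
    partition_id: int
    local_broker_id: int
    # Hypothetical stand-in for the contents of the previous checkpoint file.
    checkpointed_hw: Dict[TopicPartition, int]
    replicas: Dict[int, Replica] = field(default_factory=dict)

    def get_or_create_replica(self) -> Replica:
        """Return the local replica, creating it from the previous
        high watermark if it does not exist yet."""
        replica = self.replicas.get(self.local_broker_id)
        if replica is None:
            hw = self.checkpointed_hw.get((self.topic, self.partition_id), 0)
            replica = Replica(self.local_broker_id, hw)
            self.replicas[self.local_broker_id] = replica
        return replica

    def make_follower(self, leader_alive: bool) -> bool:
        # The replica is created whether or not the leader is alive.
        self.get_or_create_replica()
        # Assumption for this sketch: fetching only starts if the leader is alive.
        return leader_alive


def checkpoint(partitions: List[Partition]) -> Dict[TopicPartition, int]:
    """Model of the checkpoint thread: write the high watermark of every
    partition that has a local replica."""
    result: Dict[TopicPartition, int] = {}
    for p in partitions:
        replica = p.replicas.get(p.local_broker_id)
        if replica is not None:
            result[(p.topic, p.partition_id)] = replica.high_watermark
    return result
```

In this model, a partition whose leader is down still gets a local replica carrying its previous high watermark, so the next checkpoint preserves that value instead of dropping the partition.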

                
> Broken handling of leader and isr request leads to incorrect high watermark checkpoint file
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-851
>                 URL: https://issues.apache.org/jira/browse/KAFKA-851
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>              Labels: kafka-0.8, p1
>         Attachments: kafka-851-v1.patch
>
>
> The broker depends on receiving a list of *all* partitions from the controller on startup. It uses this information to create the list of partitions that will get checkpointed to the high watermark checkpoint file. However, during a make follower operation, it adds a partition to the high watermark checkpoint list only if its leader is alive. Because of this, even if the controller sends a full list of partitions to the broker, the replica manager filters it to keep only those partitions whose leader is alive. This causes the high watermark value for the remaining partitions to reset to 0, which in turn causes the follower to fetch from the beginning of the leader's log instead of from min(log end offset, high watermark). The effect is very long lag on the replica fetchers, leading to a high under-replicated partition count.
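
To illustrate why a reset high watermark causes long fetcher lag, here is a small Python sketch of the min(log end offset, high watermark) rule from the description. The function names and the offsets are made up for the example, and the log is assumed to start at offset 0.

```python
def follower_fetch_offset(log_end_offset: int, high_watermark: int) -> int:
    """Offset a follower starts fetching from, per the rule in the
    issue description: min(log end offset, high watermark)."""
    return min(log_end_offset, high_watermark)


def messages_to_fetch(leader_log_end_offset: int, fetch_offset: int) -> int:
    """How far behind the leader the follower starts (hypothetical helper)."""
    return leader_log_end_offset - fetch_offset


# Made-up offsets: the follower has 1000 messages locally, the leader has 1050.
correct = follower_fetch_offset(log_end_offset=1000, high_watermark=900)
reset = follower_fetch_offset(log_end_offset=1000, high_watermark=0)
```

With the correct high watermark the follower resumes at offset 900 and has 150 messages to catch up on. With the high watermark reset to 0 it starts from the beginning of the log and must refetch all 1050.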

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira