Posted to dev@kafka.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2015/07/17 03:10:04 UTC

[jira] [Updated] (KAFKA-2178) Loss of highwatermarks on incorrect cluster shutdown/restart

     [ https://issues.apache.org/jira/browse/KAFKA-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-2178:
---------------------------
    Status: In Progress  (was: Patch Available)

[~aozeritsky], thanks for the patch. The current logic in ReplicaManager expects the very first LeaderAndIsrRequest after a broker starts to contain all the partitions hosted on this broker. So, in becomeLeaderOrFollower(), we only start the highWaterMark checkpoint thread after processing the first LeaderAndIsrRequest. At that point, the partition list should be complete.

Perhaps we should figure out how you got into that state.
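
To make the assumption concrete, here is a rough sketch of the behaviour being discussed. It is a simplified model and not the actual ReplicaManager code: the class HwCheckpointSketch, its fields, and the in-memory map standing in for the checkpoint file are all hypothetical. It only illustrates that if checkpointing starts after the first request and writes just the partitions known at that time, an incomplete first request drops the other partitions' high watermarks.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of the logic described above.
// This is NOT the real kafka.server.ReplicaManager implementation.
public class HwCheckpointSketch {
    // Stand-in for the on-disk high watermark checkpoint file.
    private final Map<String, Long> checkpointFile = new TreeMap<>();
    // Partitions the broker has learned about from the controller, with their HWs.
    private final Map<String, Long> knownPartitions = new HashMap<>();
    // Checkpointing is enabled only after the first LeaderAndIsrRequest.
    private boolean hwThreadStarted = false;

    public HwCheckpointSketch(Map<String, Long> onDisk) {
        checkpointFile.putAll(onDisk);
    }

    // Models becomeLeaderOrFollower(): register the partitions named in the
    // request (recovering each HW from the checkpoint), then enable checkpointing.
    public void becomeLeaderOrFollower(Iterable<String> partitionsInRequest) {
        for (String tp : partitionsInRequest) {
            knownPartitions.putIfAbsent(tp, checkpointFile.getOrDefault(tp, 0L));
        }
        hwThreadStarted = true;
    }

    // Models one run of the checkpoint thread: the file is rewritten with
    // only the partitions the broker currently knows about.
    public void checkpointHighWatermarks() {
        if (!hwThreadStarted) {
            return; // nothing is written before the first request
        }
        checkpointFile.clear();
        checkpointFile.putAll(knownPartitions);
    }

    public Map<String, Long> checkpointFile() {
        return checkpointFile;
    }

    public static void main(String[] args) {
        Map<String, Long> onDisk = new HashMap<>();
        onDisk.put("topicA-0", 100L);
        onDisk.put("topicA-1", 250L);
        HwCheckpointSketch broker = new HwCheckpointSketch(onDisk);

        // The first request names only one of the two hosted partitions.
        broker.becomeLeaderOrFollower(Arrays.asList("topicA-0"));
        broker.checkpointHighWatermarks();

        // The entry for topicA-1 is no longer in the checkpoint.
        System.out.println(broker.checkpointFile());
    }
}
```

In this model the loss happens only when the first request is incomplete, which is why the open question is how the controller came to send such a request.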

> Loss of highwatermarks on incorrect cluster shutdown/restart
> ------------------------------------------------------------
>
>                 Key: KAFKA-2178
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2178
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.8.2.1
>            Reporter: Alexey Ozeritskiy
>         Attachments: KAFKA-2178.patch
>
>
> ReplicaManager flushes highwatermarks only for partitions which it received from the Controller.
> If the Controller sends an incomplete list of partitions, then ReplicaManager will write an incomplete list of highwatermarks.
> As a result, one can lose a lot of data during an incorrect broker restart.
> We hit this situation in real life on our cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)