Posted to dev@kafka.apache.org by "Jay Kreps (JIRA)" <ji...@apache.org> on 2013/10/29 17:14:32 UTC

[jira] [Commented] (KAFKA-1106) HighwaterMarkCheckpoint failure putting broker into a bad state

    [ https://issues.apache.org/jira/browse/KAFKA-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808133#comment-13808133 ] 

Jay Kreps commented on KAFKA-1106:
----------------------------------

Do you have the highwatermark checkpoint file that caused this? Your patch makes things more tolerant of errors but I guess the question is how we got into that state...

> HighwaterMarkCheckpoint failure putting broker into a bad state
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1106
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1106
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: David Lao
>         Attachments: KAFKA-1106-patch, kafka.log
>
>
> I'm encountering a case where a broker gets stuck because HighwaterMarkCheckpoint fails to recover after reading what appear to be corrupted ISR entries. Once the broker is in this state, leader election can never succeed, which stalls the entire cluster.
> Please see the detailed stack trace in the attached log. Perhaps failing fast when HighwaterMarkCheckpoint fails to read would force the broker to restart and recover.
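
For illustration only, the fail-fast idea in the description could look roughly like the sketch below. It is not Kafka's code and not the attached patch. The file layout (a version line, an entry-count line, then one "topic partition offset" entry per line) and all names (CheckpointCorruptError, parse_checkpoint, load_or_exit) are assumptions made for this example.

```python
import sys


class CheckpointCorruptError(Exception):
    """Raised when checkpoint contents cannot be parsed (hypothetical name)."""


def parse_checkpoint(text):
    """Strictly parse checkpoint contents; raise on anything malformed.

    Assumed layout: line 1 = version, line 2 = number of entries,
    then one "topic partition offset" entry per line.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise CheckpointCorruptError("missing version or entry count")
    try:
        version = int(lines[0])
        expected = int(lines[1])
    except ValueError as e:
        raise CheckpointCorruptError(f"bad header: {e}") from e
    if version != 0:
        raise CheckpointCorruptError(f"unknown version {version}")

    entries = {}
    for line in lines[2:]:
        parts = line.split()
        if len(parts) != 3:
            raise CheckpointCorruptError(f"malformed entry: {line!r}")
        topic, partition, offset = parts
        try:
            entries[(topic, int(partition))] = int(offset)
        except ValueError as e:
            raise CheckpointCorruptError(f"malformed entry: {line!r}") from e

    if len(entries) != expected:
        raise CheckpointCorruptError(
            f"expected {expected} entries, found {len(entries)}"
        )
    return entries


def load_or_exit(path):
    """Fail fast: exit the process instead of running with a bad checkpoint."""
    try:
        with open(path, encoding="utf-8") as f:
            return parse_checkpoint(f.read())
    except CheckpointCorruptError as e:
        print(f"fatal: corrupt checkpoint {path}: {e}", file=sys.stderr)
        sys.exit(1)
```

With this shape, a corrupted entry stops the process at startup, where a supervisor can restart it, rather than leaving it running in a state where leader election keeps failing. It does not answer how the file became corrupted in the first place.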


