You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2015/06/19 03:52:00 UTC

[jira] [Updated] (KAFKA-1758) corrupt recovery file prevents startup

     [ https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-1758:
---------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 0.9.0)
                   0.8.3
           Status: Resolved  (was: Patch Available)

Thanks for the patch. +1 and committed to trunk.

> corrupt recovery file prevents startup
> --------------------------------------
>
>                 Key: KAFKA-1758
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1758
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>            Reporter: Jason Rosenberg
>            Assignee: Manikumar Reddy
>              Labels: newbie
>             Fix For: 0.8.3
>
>         Attachments: KAFKA-1758.patch, KAFKA-1758_2015-05-09_12:29:20.patch
>
>
> Hi,
> We recently had a kafka node go down suddenly. When it came back up, it apparently had a corrupt recovery file, and refused to startup:
> {code}
> 2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error starting up KafkaServer
> java.lang.NumberFormatException: For input string: "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Integer.parseInt(Integer.java:481)
>         at java.lang.Integer.parseInt(Integer.java:527)
>         at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
>         at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
>         at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
>         at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
>         at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>         at kafka.log.LogManager.loadLogs(LogManager.scala:105)
>         at kafka.log.LogManager.<init>(LogManager.scala:57)
>         at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
> {code}
> And the app is under a monitor (so it was repeatedly restarting and failing with this error for several minutes before we got to it)…
> We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and it then restarted cleanly (but of course re-synced all it’s data from replicas, so we had no data loss).
> Anyway, I’m wondering if that’s the expected behavior? Or should it not declare it corrupt and then proceed automatically to an unclean restart?
> Should this NumberFormatException be handled a bit more gracefully?
> We saved the corrupt file if it’s worth inspecting (although I doubt it will be useful!)….
> The corrupt files appeared to be all zeroes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)