Posted to dev@kafka.apache.org by "shen (Jira)" <ji...@apache.org> on 2022/11/11 11:59:00 UTC

[jira] [Created] (KAFKA-14383) CorruptRecordException when reading data from log segment will not cause log offline

shen created KAFKA-14383:
----------------------------

             Summary: CorruptRecordException when reading data from log segment will not cause log offline
                 Key: KAFKA-14383
                 URL: https://issues.apache.org/jira/browse/KAFKA-14383
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 2.8.1
            Reporter: shen


In our production environment, a disk failure caused data corruption. When consumers and followers read from the partition leader, a CorruptRecordException is thrown:
{code:java}
Caused by: org.apache.kafka.common.errors.CorruptRecordException: Record size 0 is less than the minimum record overhead
{code}
The call stack looks much like this:
{code:java}
Breakpoint reached
    at org.apache.kafka.common.record.FileLogInputStream.nextBatch(FileLogInputStream.java:62)
    at org.apache.kafka.common.record.FileLogInputStream.nextBatch(FileLogInputStream.java:40)
    at org.apache.kafka.common.record.RecordBatchIterator.makeNext(RecordBatchIterator.java:35)
    at org.apache.kafka.common.record.RecordBatchIterator.makeNext(RecordBatchIterator.java:24)
    at org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
    at org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
    at org.apache.kafka.common.record.FileRecords.searchForOffsetWithSize(FileRecords.java:286)
    at kafka.log.LogSegment.translateOffset(LogSegment.scala:254)
    at kafka.log.LogSegment.read(LogSegment.scala:277)
    at kafka.log.Log$$anonfun$read$2.apply(Log.scala:1161)
    at kafka.log.Log$$anonfun$read$2.apply(Log.scala:1116)
    at kafka.log.Log.maybeHandleIOException(Log.scala:1839) <--------------- only handles IOException
    at kafka.log.Log.read(Log.scala:1116)
    at kafka.server.ReplicaManager.kafka$server$ReplicaManager$$read$1(ReplicaManager.scala:926)
    at kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:989)
    at kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:988)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:988)
    at kafka.server.ReplicaManager.readFromLog$1(ReplicaManager.scala:815)
    at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:828)
    at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:680)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:107)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:74)
    at java.lang.Thread.run(Thread.java:748)
{code}
 

CorruptRecordException extends RetriableException, but when the broker reads from a local log segment, data corruption usually cannot be fixed by retrying.
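
The hierarchy is easy to confirm against the public kafka-clients classes. A minimal Scala sketch (the exception class and message are from the report above; the demo object is mine):
{code:scala}
import org.apache.kafka.common.errors.{CorruptRecordException, RetriableException}

object RetriableDemo extends App {
  // CorruptRecordException is-a RetriableException, so generic client retry
  // logic treats the failed fetch as transient and retries it, even though
  // rereading the same corrupted on-disk segment can never succeed.
  val e: RetriableException =
    new CorruptRecordException("Record size 0 is less than the minimum record overhead")
  println(e.getMessage)
}
{code}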

I think local file corruption should take the log offline, but currently only an IOException has a chance to take the log offline, in Log#maybeHandleIOException.
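
For reference, the guard in Log.scala follows roughly this shape (a simplified sketch, not the actual source; LogSketch and markDirOffline are stand-ins for the real Log class and LogDirFailureChannel.maybeAddOfflineLogDir):
{code:scala}
import java.io.IOException
import org.apache.kafka.common.errors.KafkaStorageException

// Sketch of the error handling wrapped around Log#read and friends in 2.8.x.
class LogSketch(parentDir: String,
                markDirOffline: (String, String, IOException) => Unit) {
  def maybeHandleIOException[T](msg: => String)(fun: => T): T = {
    try fun
    catch {
      case e: IOException =>
        // Only IOException marks the log directory offline and is rethrown
        // as KafkaStorageException; a CorruptRecordException raised by the
        // same read path falls through this catch and propagates unchanged.
        markDirOffline(parentDir, msg, e)
        throw new KafkaStorageException(msg, e)
    }
  }
}
{code}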

So even if I have 3 replicas, the consumer will never be able to continue consuming once data corruption happens on the leader, because leadership never moves off the corrupted log.
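
Purely as an illustration of the direction this suggests (handleLocalReadErrors is a hypothetical helper, not a proposed patch), the local read path could treat CorruptRecordException like IOException, so that the log dir goes offline and a healthy replica can take over leadership:
{code:scala}
import java.io.IOException
import org.apache.kafka.common.errors.{CorruptRecordException, KafkaStorageException}

object CorruptionHandlingSketch {
  // Widens the catch so that local on-disk corruption also marks the log
  // directory offline, instead of surfacing as a retriable error to fetchers.
  def handleLocalReadErrors[T](parentDir: String, msg: String,
                               markDirOffline: (String, String, Throwable) => Unit)
                              (fun: => T): T = {
    try fun
    catch {
      case e @ (_: IOException | _: CorruptRecordException) =>
        markDirOffline(parentDir, msg, e)
        throw new KafkaStorageException(msg, e)
    }
  }
}
{code}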


