Posted to jira@kafka.apache.org by "Karsten Schnitter (JIRA)" <ji...@apache.org> on 2018/07/03 10:58:00 UTC

[jira] [Created] (KAFKA-7130) EOFException after rolling log segment

Karsten Schnitter created KAFKA-7130:
----------------------------------------

             Summary: EOFException after rolling log segment
                 Key: KAFKA-7130
                 URL: https://issues.apache.org/jira/browse/KAFKA-7130
             Project: Kafka
          Issue Type: Bug
          Components: replication
    Affects Versions: 1.1.0
            Reporter: Karsten Schnitter


When rolling a log segment, one of our Kafka clusters got an immediate read error on the same partition. This led to a flood of log messages containing the corresponding stack traces. Data was still appended to the partition, but consumers were unable to read from it. The reason for the exception is unclear.

{noformat}
[2018-07-02 23:53:32,732] INFO [Log partition=ingestion-3, dir=/var/vcap/store/kafka] Rolled new log segment at offset 971865991 in 1 ms. (kafka.log.Log)
[2018-07-02 23:53:32,739] INFO [ProducerStateManager partition=ingestion-3] Writing producer snapshot at offset 971865991 (kafka.log.ProducerStateManager)
[2018-07-02 23:53:32,739] INFO [Log partition=ingestion-3, dir=/var/vcap/store/kafka] Rolled new log segment at offset 971865991 in 1 ms. (kafka.log.Log)
[2018-07-02 23:53:32,750] ERROR [ReplicaManager broker=1] Error processing fetch operation on partition ingestion-3, offset 971865977 (kafka.server.ReplicaManager)

Caused by: java.io.EOFException: Failed to read `log header` from file channel `sun.nio.ch.FileChannelImpl@2e0e8810`. Expected to read 17 bytes, but reached end of file after reading 0 bytes. Started read from position 2147483643.
{noformat}
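
One detail in the exception message may be worth checking: the read position 2147483643 lies just below the largest signed 32-bit integer (2^31 - 1 = 2147483647, which is Java's Integer.MAX_VALUE). The sketch below only does that arithmetic on the two numbers quoted in the message. Whether this proximity is related to the root cause is an assumption, not something the logs confirm. The helper names are hypothetical and are not part of Kafka.

```python
# Values copied from the EOFException message above.
READ_START = 2147483643   # "Started read from position 2147483643"
HEADER_BYTES = 17         # "Expected to read 17 bytes"

# Largest value of a signed 32-bit integer (Java's Integer.MAX_VALUE).
INT32_MAX = 2**31 - 1


def bytes_below_int32_max(position: int) -> int:
    """Return how many bytes separate `position` from INT32_MAX."""
    return INT32_MAX - position


def read_crosses_int32_max(position: int, length: int) -> bool:
    """Return True if reading `length` bytes starting at `position`
    would touch a byte offset greater than INT32_MAX."""
    last_byte = position + length - 1
    return last_byte > INT32_MAX


if __name__ == "__main__":
    # Distance between the failing read position and the 32-bit limit.
    print(bytes_below_int32_max(READ_START))
    # Whether a 17-byte header read from that position would pass the limit.
    print(read_crosses_int32_max(READ_START, HEADER_BYTES))
```

If the arithmetic holds, the failing read started 4 bytes below the 32-bit limit, and a 17-byte header read from that position would extend past it. This is only an observation about the numbers in the message and does not by itself explain the failure.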

We mitigated the issue by stopping the affected node and deleting the corresponding directory. Once the partition was recreated for the replica (we use replication-factor 2), the other replica experienced the same problem. We mitigated it in the same way.

It is unclear to us what caused this issue. Can you help us find the root cause of this problem?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)