Posted to dev@kafka.apache.org by "Nikos Liv (JIRA)" <ji...@apache.org> on 2019/06/06 11:57:00 UTC

[jira] [Created] (KAFKA-8498) log-cleaner CorruptRecordException with __consumer_offsets

Nikos Liv created KAFKA-8498:
--------------------------------

             Summary: log-cleaner CorruptRecordException with __consumer_offsets
                 Key: KAFKA-8498
                 URL: https://issues.apache.org/jira/browse/KAFKA-8498
             Project: Kafka
          Issue Type: Bug
          Components: consumer, log
    Affects Versions: 1.0.1
            Reporter: Nikos Liv


Hello,

We have observed the following issue:

We had a Java consumer running the same client version as the reported Kafka (1.0.1). This consumer was calling commitSync() every couple of milliseconds, even when poll returned no messages; in fact, the commit was issued after every poll timeout. The consumer sees occasional small peaks of messages, but most of the time it doesn't receive any.

After changing the consumer behavior, the problem no longer appears.
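
A minimal sketch of one such change, assuming the fix is simply to skip the commit when poll returns nothing (the exact change is not spelled out in this report). To keep the example self-contained, `poll` and `commit` are hypothetical stand-ins for KafkaConsumer.poll(...) and KafkaConsumer.commitSync(); the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

final class CommitOnlyAfterRecords {

    /**
     * Runs `iterations` poll cycles and commits only after a non-empty poll.
     * `poll` and `commit` are hypothetical stand-ins for the real consumer
     * calls. Returns the number of commits that were issued.
     */
    static <R> int run(Supplier<List<R>> poll, Runnable commit, int iterations) {
        int commits = 0;
        for (int i = 0; i < iterations; i++) {
            List<R> records = poll.get();
            if (records.isEmpty()) {
                continue; // nothing was consumed, so skip the commit
            }
            // ... process the records here ...
            commit.run();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        // Simulated poll results: empty, one record, empty, two records.
        Iterator<List<String>> polls = Arrays.asList(
                Collections.<String>emptyList(),
                Arrays.asList("a"),
                Collections.<String>emptyList(),
                Arrays.asList("b", "c")).iterator();
        int commits = run(() -> polls.next(), () -> { }, 4);
        System.out.println("commits issued: " + commits);
    }
}
```

With the original behavior, every one of the four cycles would have issued a commit; here only the cycles that returned records do.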

The Kafka setup has 3 brokers with a replication factor of 2. The disk in use is a Ceph block storage device exposed as an OpenStack Cinder volume.

We noticed that at some point, when the log-cleaner thread was trying to compact the __consumer_offsets topic, it failed with:

CorruptRecordException: Record size is less than the minimum record overhead (14)
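
For troubleshooting, a rough sketch of the kind of check that produces this message. It assumes the usual on-disk framing, in which every log entry starts with an 8-byte offset and a 4-byte size field, and it assumes that 14 is the minimum size of a legacy (magic v0) record: CRC (4) + magic (1) + attributes (1) + key length (4) + value length (4). The class and method names are hypothetical and this is not Kafka's own code; it only walks the framing to locate the first entry whose size field is too small.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

final class SegmentSizeScan {
    // Assumed framing: 8-byte offset followed by a 4-byte size field.
    static final int LOG_OVERHEAD = 12;
    // Assumed minimum record size, matching the "(14)" in the exception.
    static final int MIN_RECORD_OVERHEAD = 14;

    /**
     * Returns the byte position of the first entry whose size field is
     * below MIN_RECORD_OVERHEAD, or -1 if none is found before the data
     * ends (or before a truncated trailing entry).
     */
    static long firstUndersizedEntry(ByteBuffer segment) {
        ByteBuffer buf = segment.duplicate().order(ByteOrder.BIG_ENDIAN);
        while (buf.remaining() >= LOG_OVERHEAD) {
            int entryStart = buf.position();
            buf.getLong();            // offset, not needed for this check
            int size = buf.getInt();  // size of the entry that follows
            if (size < MIN_RECORD_OVERHEAD) {
                return entryStart;
            }
            if (size > buf.remaining()) {
                return -1;            // truncated tail, stop scanning
            }
            buf.position(buf.position() + size); // skip to the next entry
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SegmentSizeScan <copy-of-segment.log>");
            return;
        }
        // Reads the whole file into memory; run it on a copy of the segment.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(firstUndersizedEntry(ByteBuffer.wrap(bytes)));
    }
}
```

The position it reports can be compared across replicas of the same partition to see whether the bad bytes exist on one broker's disk only. The DumpLogSegments tool shipped with Kafka (kafka.tools.DumpLogSegments) is the more complete way to inspect a segment.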

This caused the log-cleaner to stop. Without compaction, the available free disk space filled up and Kafka stopped working, which brought down the whole system.

Are there any known issues similar to this case?

Is it possible that this type of consumer behavior can cause such an issue?

It appears that the consumer sends data whenever we call commitSync(), even if it didn't receive any messages. What is the expected behavior in this case?

Is it possible for a consumer to send a message to kafka that is corrupted or for kafka to corrupt a message on disk or during replication?

Please provide some guidelines about any actions that are needed to troubleshoot.

Thanks in advance for your effort.

Br,

Nikos



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)