Posted to jira@kafka.apache.org by "Sergey Ivanov (Jira)" <ji...@apache.org> on 2023/05/04 05:31:00 UTC
[jira] [Commented] (KAFKA-14322) Kafka node eating Disk continuously
[ https://issues.apache.org/jira/browse/KAFKA-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719157#comment-17719157 ]
Sergey Ivanov commented on KAFKA-14322:
---------------------------------------
Hi,
We faced a similar problem.
I described it in ticket KAFKA-14817; these may be related issues.
> Kafka node eating Disk continuously
> ------------------------------------
>
> Key: KAFKA-14322
> URL: https://issues.apache.org/jira/browse/KAFKA-14322
> Project: Kafka
> Issue Type: Bug
> Components: log, log cleaner
> Affects Versions: 2.8.1
> Reporter: Abhijit Patil
> Priority: Major
> Attachments: image-2022-10-19-15-51-52-735.png, image-2022-10-19-15-53-39-928.png
>
>
> We have a 2.8.1 Kafka cluster in our production environment. Its disk consumption grows continuously, eating all the allocated disk space until the node crashes with no disk space left.
> !image-2022-10-19-15-51-52-735.png|width=344,height=194!
>
> !image-2022-10-19-15-53-39-928.png|width=470,height=146!
> [Log partition=__consumer_offsets-41, dir=/var/lib/kafka/data/kafka-log0] Rolled new log segment at offset 10537467423 in 4 ms. (kafka.log.Log) [data-plane-kafka-request-handler-4]
> I can see that on node 0, partition __consumer_offsets-41 keeps rolling new segments, but they never get cleaned up.
> This is the root cause of the disk usage increase.
> Due to some condition/bug/trigger, something has gone wrong internally with the consumer offset coordinator thread and it has gone berserk!
>
> Take a look at the consumer-offset logs it is generating below. On closer inspection, it is writing the same data in a loop forever. The product topic in question has no traffic. This generates an insane amount of consumer-offset logs, currently amounting to *571GB*, and it is endless: no matter how many terabytes we add, it will eventually eat them all.
> One more thing: the consumer-offset logs it is generating also mark everything as invalid, as you can see in the second log dump below.
>
> [kafka-0 data]$ du -sh kafka-log0/__consumer_offsets*
> 12K kafka-log0/__consumer_offsets-11
> 12K kafka-log0/__consumer_offsets-14
> 12K kafka-log0/__consumer_offsets-17
> 12K kafka-log0/__consumer_offsets-2
> 12K kafka-log0/__consumer_offsets-20
> 12K kafka-log0/__consumer_offsets-23
> 12K kafka-log0/__consumer_offsets-26
> 12K kafka-log0/__consumer_offsets-29
> 12K kafka-log0/__consumer_offsets-32
> 12K kafka-log0/__consumer_offsets-35
> 12K kafka-log0/__consumer_offsets-38
> *588G* kafka-log0/__consumer_offsets-41
> 48K kafka-log0/__consumer_offsets-44
> 12K kafka-log0/__consumer_offsets-47
> 12K kafka-log0/__consumer_offsets-5
> 12K kafka-log0/__consumer_offsets-8
>
> [response-consumer,feature.response.topic,2]::OffsetAndMetadata(offset=107, leaderEpoch=Optional[23], metadata=, commitTimestamp=1664883985122, expireTimestamp=None) *[response-consumer,feature.response.topic,15]::OffsetAndMetadata(offset=112, leaderEpoch=Optional[25], metadata=, commitTimestamp=1664883985129, expireTimestamp=None)*
>
>
> [response-consumer,feature.response.topic,15]::OffsetAndMetadata(offset=112, leaderEpoch=Optional[25], metadata=, commitTimestamp=1664883985139, expireTimestamp=None) [response-consumer,feature.response.topic,13]::OffsetAndMetadata(offset=112, leaderEpoch=Optional[24], metadata=, commitTimestamp=1664883985139, expireTimestamp=None)
>
>
> baseOffset: 5616487061 lastOffset: 5616487061 count: 1 baseSequence: 0 lastSequence: 0 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 6 isTransactional: false
> isControl: false position: 3423 CreateTime: 1660892213452 size: 175 magic: 2 compresscodec: NONE crc: 1402370404 *isvalid: true*
> baseOffset: 5616487062 lastOffset: 5616487062 count: 1 baseSequence: 0 lastSequence: 0 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 6 isTransactional: false
> isControl: false position: 3598 CreateTime: 1660892213462 size: 175 magic: 2 compresscodec: NONE crc: 1105941790 *isvalid: true*
> |offset: 5616487062 CreateTime: 1660892213462 keysize: 81 valuesize: 24 sequence: 0 headerKeys: [] key:|
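The "same data in a loop" symptom above can be confirmed by grouping the decoded offset-commit records by their (group, topic, partition) key. A minimal sketch, assuming the decoder output has been captured as text (the sample lines are abbreviated from the dumps above and are illustrative only):

```python
import re
from collections import Counter

# Abbreviated offset-commit records as printed in the dumps above (illustrative).
dump = """\
[response-consumer,feature.response.topic,15]::OffsetAndMetadata(offset=112, commitTimestamp=1664883985129)
[response-consumer,feature.response.topic,15]::OffsetAndMetadata(offset=112, commitTimestamp=1664883985139)
[response-consumer,feature.response.topic,13]::OffsetAndMetadata(offset=112, commitTimestamp=1664883985139)
"""

# Count how often each (group, topic, partition) key appears in the dump.
keys = re.findall(r"\[([^\]]+)\]::OffsetAndMetadata", dump)
counts = Counter(keys)
for key, n in counts.items():
    print(key, n)

# A compacted log should converge to one live record per key; the same key
# repeating with an unchanged offset and only the timestamp advancing matches
# the looping behavior described above.
```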
> For our topics we have the retention configuration below:
> retention.ms: 86400000
> segment.bytes: 1073741824
>
> For the consumer offsets internal topic, it uses the default cleanup policy and retention.
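For reference, the configured values decode as follows (simple arithmetic; the note on `cleanup.policy` reflects standard Kafka defaults):

```python
retention_ms = 86_400_000       # retention.ms from the topic config above
segment_bytes = 1_073_741_824   # segment.bytes from the topic config above

print(retention_ms / (1000 * 60 * 60))  # 24.0 hours, i.e. one day
print(segment_bytes == 1 << 30)         # True: exactly 1 GiB per segment

# Note: retention.ms only applies under cleanup.policy=delete. __consumer_offsets
# defaults to cleanup.policy=compact, where old records are removed by the log
# cleaner rather than by time-based retention.
```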
>
> We suspect this is similar to https://issues.apache.org/jira/browse/KAFKA-9543
> This appears in one environment only; the same cluster with the same configuration works correctly in other environments.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)