Posted to jira@kafka.apache.org by "Sergey Ivanov (Jira)" <ji...@apache.org> on 2023/05/04 05:31:00 UTC

[jira] [Commented] (KAFKA-14322) Kafka node eating Disk continuously

    [ https://issues.apache.org/jira/browse/KAFKA-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719157#comment-17719157 ] 

Sergey Ivanov commented on KAFKA-14322:
---------------------------------------

Hi,

We faced a similar problem.

I described it in ticket KAFKA-14817; these may be related issues.

> Kafka node eating Disk continuously 
> ------------------------------------
>
>                 Key: KAFKA-14322
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14322
>             Project: Kafka
>          Issue Type: Bug
>          Components: log, log cleaner
>    Affects Versions: 2.8.1
>            Reporter: Abhijit Patil
>            Priority: Major
>         Attachments: image-2022-10-19-15-51-52-735.png, image-2022-10-19-15-53-39-928.png
>
>
> We have a 2.8.1 Kafka cluster in our production environment. Its disk consumption grows continuously, eventually eating all allocated disk space and crashing the node with no disk space left.
> !image-2022-10-19-15-51-52-735.png|width=344,height=194!
>  
> !image-2022-10-19-15-53-39-928.png|width=470,height=146!
> [Log partition=__consumer_offsets-41, dir=/var/lib/kafka/data/kafka-log0] Rolled new log segment at offset 10537467423 in 4 ms. (kafka.log.Log) [data-plane-kafka-request-handler-4]
> I can see that on node 0, partition __consumer_offsets-41 keeps rolling new segments, but they are never cleaned up.
> This is the root cause of the disk usage increase.
> Due to some condition/bug/trigger, something has gone wrong internally with the consumer offset coordinator thread and it has gone berserk!
>  
> Take a look at the consumer-offset logs it is generating below. On closer inspection, it is the same data being written in a loop forever. The product topic in question has no traffic. This generates an insane amount of consumer-offset logs, currently amounting to *571GB*, and it is endless: no matter how many terabytes we add, it will eventually eat them all.
>  One more thing: the consumer-offset logs it generates also mark everything as invalid, as you can see in the second log dump below.
>  
> kafka-0 data]$ du -sh kafka-log0/__consumer_offsets*
> 12K kafka-log0/__consumer_offsets-11
> 12K kafka-log0/__consumer_offsets-14
> 12K kafka-log0/__consumer_offsets-17
> 12K kafka-log0/__consumer_offsets-2
> 12K kafka-log0/__consumer_offsets-20
> 12K kafka-log0/__consumer_offsets-23
> 12K kafka-log0/__consumer_offsets-26
> 12K kafka-log0/__consumer_offsets-29
> 12K kafka-log0/__consumer_offsets-32
> 12K kafka-log0/__consumer_offsets-35
> 12K kafka-log0/__consumer_offsets-38
> *588G* kafka-log0/__consumer_offsets-41
> 48K kafka-log0/__consumer_offsets-44
> 12K kafka-log0/__consumer_offsets-47
> 12K kafka-log0/__consumer_offsets-5
> 12K kafka-log0/__consumer_offsets-8
>  
> [response-consumer,feature.response.topic,2]::OffsetAndMetadata(offset=107, leaderEpoch=Optional[23], metadata=, commitTimestamp=1664883985122, expireTimestamp=None) *[response-consumer,feature.response.topic,15]::OffsetAndMetadata(offset=112, leaderEpoch=Optional[25], metadata=, commitTimestamp=1664883985129, expireTimestamp=None)*
>  
>  
> [response-consumer,feature.response.topic,15]::OffsetAndMetadata(offset=112, leaderEpoch=Optional[25], metadata=, commitTimestamp=1664883985139, expireTimestamp=None) [response-consumer,feature.response.topic,13]::OffsetAndMetadata(offset=112, leaderEpoch=Optional[24], metadata=, commitTimestamp=1664883985139, expireTimestamp=None)
>  
>  
> baseOffset: 5616487061 lastOffset: 5616487061 count: 1 baseSequence: 0 lastSequence: 0 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 6 isTransactional: false
> isControl: false position: 3423 CreateTime: 1660892213452 size: 175 magic: 2 compresscodec: NONE crc: 1402370404 *isvalid: true*
> baseOffset: 5616487062 lastOffset: 5616487062 count: 1 baseSequence: 0 lastSequence: 0 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 6 isTransactional: false
> isControl: false position: 3598 CreateTime: 1660892213462 size: 175 magic: 2 compresscodec: NONE crc: 1105941790 *isvalid: true*
> offset: 5616487062 CreateTime: 1660892213462 keysize: 81 valuesize: 24 sequence: 0 headerKeys: [] key:
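The claim that the same data is rewritten in a loop can be checked mechanically against a dump like the one above. This sketch (the regex and function names are my own) counts how often each group/topic/partition entry is committed at the same offset; any count above 1 means the identical commit is being written again:

```python
import re
from collections import Counter

# Matches offset-commit lines of the form emitted by the offsets decoder:
# [group,topic,partition]::OffsetAndMetadata(offset=NNN, ...)
ENTRY_RE = re.compile(r"\[([^\]]+)\]::OffsetAndMetadata\(offset=(\d+),")

def repeated_commits(dump_text):
    """Return {(group-topic-partition, offset): count} for entries that
    appear more than once, i.e. the same commit rewritten in a loop."""
    counts = Counter(ENTRY_RE.findall(dump_text))
    return {k: n for k, n in counts.items() if n > 1}
```

Feeding the dump excerpts above through this would show the partition-15 entry for offset 112 repeating with only the commitTimestamp changing.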
> For our topics we have the retention configuration below:
> retention.ms: 86400000
> segment.bytes: 1073741824
>  
> The consumer offsets internal topic uses the default cleanup policy and retention.
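To rule out a misconfigured policy on the affected broker, the effective settings and a raw segment can be inspected with the standard Kafka tooling. A sketch (the bootstrap address and segment filename are placeholders; __consumer_offsets should default to cleanup.policy=compact so the log cleaner can purge superseded commits):

```shell
# Show the topic-level overrides in effect for the offsets topic.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name __consumer_offsets --describe

# Decode an offsets segment; this is the tool that produces
# the OffsetAndMetadata dumps shown above.
bin/kafka-dump-log.sh --offsets-decoder \
  --files /var/lib/kafka/data/kafka-log0/__consumer_offsets-41/00000000005616487061.log
```

If cleanup.policy is compact but the partition still grows without bound, that points at the log cleaner thread itself rather than the configuration.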
>  
> We suspect this is similar to https://issues.apache.org/jira/browse/KAFKA-9543
> This appears in one environment only; the same cluster with the same configuration works correctly in other environments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)