Posted to jira@kafka.apache.org by "Alok Kumar Singh (Jira)" <ji...@apache.org> on 2022/01/05 08:09:00 UTC

[jira] [Commented] (KAFKA-13575) Unwanted cleanup since log.retention.hours=-1 does not work

    [ https://issues.apache.org/jira/browse/KAFKA-13575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469101#comment-17469101 ] 

Alok Kumar Singh commented on KAFKA-13575:
------------------------------------------

If someone picks this up, we have a running cluster available to reproduce the issue and provide more information, which should help fix it faster.

We suspect this configuration is not well tested, since most people either use compacted topics or set log.retention.ms=-1 rather than log.retention.hours=-1.

If you can give some pointers, I will also try to look through the code to debug this.
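
In case it helps whoever picks this up, here is a minimal sketch of how the same retention values and their sources can be pulled with the Java AdminClient (the bootstrap server, broker id, and security settings are placeholders for our setup); we used kafka-configs.sh for the outputs shown in the issue description below:

{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class ShowBrokerRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap server; add the SSL/auth settings from client-ssl-auth.properties as well.
        props.put("bootstrap.servers", "XXX:9094");
        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "2");
            Config config = admin.describeConfigs(Collections.singleton(broker)).all().get().get(broker);
            config.entries().stream()
                  .filter(e -> e.name().contains("retention"))
                  // source() shows whether a value comes from server.properties, a dynamic override, or the default
                  .forEach(e -> System.out.println(e.name() + "=" + e.value() + " (" + e.source() + ")"));
        }
    }
}
{code}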

> Unwanted cleanup since log.retention.hours=-1 does not work
> -----------------------------------------------------------
>
>                 Key: KAFKA-13575
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13575
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>    Affects Versions: 2.6.0
>         Environment: AWS, Kubernetes cluster, Kafka run as strimzi pods.
>            Reporter: Alok Kumar Singh
>            Priority: Critical
>         Attachments: Screenshot 2022-01-02 at 4.22.50 PM.png, Screenshot 2022-01-02 at 4.30.38 PM.png
>
>
> Use case: We never want topic data to be deleted, so we set `log.retention.hours=-1` to make the retention infinite.
> Kafka cleanup was nevertheless triggered and deleted our topic's data, moving the oldest offset from 0 to a higher value. Below are broker logs showing segments being deleted with the default retention of 7 days (604800000 ms) and the offsets moving because of the deleted segments. The logs are filled with messages "due to {*}retention time 604800000ms breach{*}" when retention should have been infinite based on the configured value (see the note after the log excerpt).
> *Logs* of the Kafka Pod that has the cleanup traces:
> {code:java}
> 2021-12-21 15:28:20,891 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [0,2214146,4890534,7645992,10394781,13149705,15798698,18084408,20638290,23202704,25760900,28207227,30796134,33437632,35763142,38247977,40861013,43473617,46086956,48469054,50776433,53050704,55299660,57529943,59743683,61984715,64224317,66458910,68670877,70886848,73080554,74872031,76656553,78573653,80412933,82457759,84854900,89950485,94263626,98668311,103762764,107797683,111809208,113716544,115406491,117052453,119304884,121765245,123536361,125364537,127191439,129028759,130902041,132719344,134787234,139027470,143932296,147888638,150492879,152410799,154338596,156416190,160672948,165054157,169273069,171824978,173847615,175602822,177765267,179701522,181622192,183511328,185382862,187200966,188998652,193646877,198500489,203354616,208211378,210992397,212806790,214599592,216456221,218231836,220014080,221812337,223574882,225341294,227151134,228894227,231000663,232855550,234554788,236304441,238012477,239523793,241164599,242938200,244945345,246534386,248345745,249972929,251761034,253511669,255180052,256960087,258715796,260382401,262123766,263900146,265585895,267349067,269106101,270936370,272601164,274416555,276154378,277854987,279656300,281487544,283272481,285126550,286886989,288671958,290389171,292132610,293756088,295527031,297230008,298992929,300641108,302415673,304138766,305900851,307542789,309368805,311075952,312729042,314516564,316294429,318006961,319760277,321426916,323177673,324845934,326606764,328385019,330117277,331930761,333729477,335525038,337253962,339046341,340756170,342402991,344019861,345789681,347575987,349276417,350912980,352675272,354444307,356157225,357960082,359616551,361418905,363132123,364932894,366638881,368368450] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
> 2021-12-23 02:28:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [370099688] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-4]
> 2021-12-24 15:03:18,125 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [371913024] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-1]
> 2021-12-26 13:33:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [373561655] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
> 2021-12-28 06:18:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [375346988] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-9]
> 2021-12-29 20:18:18,125 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [377066152] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
> 2021-12-31 12:18:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [378821599] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2] {code}
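> A quick sanity check on the number in those messages: 168 hours × 60 × 60 × 1000 = 604800000 ms, so the "retention time 604800000ms" in the log is exactly the built-in default log.retention.hours=168, not the -1 we configured.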
> *Configurations listed below to show that we are setting the expected values*
> *Broker Configurations:*
> {code:java}
> $ ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXX:9094 --entity-type brokers --entity-name 2 --describe --all | grep retention
> (org.apache.kafka.clients.admin.AdminClientConfig)
>   log.cleaner.delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
>   log.retention.hours=-1 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.retention.hours=-1, DEFAULT_CONFIG:log.retention.hours=168}
>   log.retention.minutes=null sensitive=false synonyms={}
>   offsets.retention.check.interval.ms=600000 sensitive=false synonyms={DEFAULT_CONFIG:offsets.retention.check.interval.ms=600000}
>   log.retention.check.interval.ms=300000 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.check.interval.ms=300000}
>   log.retention.ms=null sensitive=false synonyms={}
>   log.retention.bytes=-1 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
>   offsets.retention.minutes=525600 sensitive=false synonyms={STATIC_BROKER_CONFIG:offsets.retention.minutes=525600, DEFAULT_CONFIG:offsets.retention.minutes=10080} {code}
> *I also checked that the other brokers have the same configuration:*
> {code:java}
> ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXX --entity-type brokers --entity-name 0 --describe --all | grep log.retention 
>  log.retention.hours=-1 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.retention.hours=-1, DEFAULT_CONFIG:log.retention.hours=168} {code}
> {code:java}
> ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXX --entity-type brokers --entity-name 0 --describe --all | grep log.retention 
> log.retention.hours=-1 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.retention.hours=-1, DEFAULT_CONFIG:log.retention.hours=168}  {code}
> *Topic Configuration:* 
> {code:java}
> $ ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXXX:9094 --entity-type topics --entity-name ts.praccomm.patientsms --describe --all
> (org.apache.kafka.clients.admin.AdminClientConfig)
> All configs for topic ts.praccomm.patientsms are:
>   compression.type=producer sensitive=false synonyms={DEFAULT_CONFIG:compression.type=producer}
>   leader.replication.throttled.replicas= sensitive=false synonyms={}
>   min.insync.replicas=1 sensitive=false synonyms={DEFAULT_CONFIG:min.insync.replicas=1}
>   message.downconversion.enable=true sensitive=false synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true}
>   segment.jitter.ms=0 sensitive=false synonyms={}
>   cleanup.policy=delete sensitive=false synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete}
>   flush.ms=9223372036854775807 sensitive=false synonyms={}
>   follower.replication.throttled.replicas= sensitive=false synonyms={}
>   segment.bytes=1073741824 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.bytes=1073741824}
>   retention.ms=-1 sensitive=false synonyms={}
>   flush.messages=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807}
>   message.format.version=2.6-IV0 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.message.format.version=2.6, DEFAULT_CONFIG:log.message.format.version=2.6-IV0}
>   max.compaction.lag.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.max.compaction.lag.ms=9223372036854775807}
>   file.delete.delay.ms=60000 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000}
>   max.message.bytes=1572864 sensitive=false synonyms={DYNAMIC_BROKER_CONFIG:message.max.bytes=1572864, STATIC_BROKER_CONFIG:message.max.bytes=1572864, DEFAULT_CONFIG:message.max.bytes=1048588}
>   min.compaction.lag.ms=0 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0}
>   message.timestamp.type=CreateTime sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime}
>   preallocate=false sensitive=false synonyms={DEFAULT_CONFIG:log.preallocate=false}
>   index.interval.bytes=4096 sensitive=false synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096}
>   min.cleanable.dirty.ratio=0.5 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5}
>   unclean.leader.election.enable=false sensitive=false synonyms={DEFAULT_CONFIG:unclean.leader.election.enable=false}
>   retention.bytes=-1 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
>   delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
>   segment.ms=604800000 sensitive=false synonyms={}
>   message.timestamp.difference.max.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807}
>   segment.index.bytes=10485760 sensitive=false synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760}{code}
> With log.retention.hours set to -1, we expected log cleanup not to trigger, but Kafka still applied the default 7-day retention and deleted the segments. The cleanups are still happening, since we have left the configuration unchanged so the cause can be debugged. Let me know if you need more details.
> Please look into this.
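> For whoever picks this up, below is a small standalone Java sketch of the retention precedence we expected the broker to apply, based on the documented behaviour (log.retention.ms takes priority over log.retention.minutes, which takes priority over log.retention.hours, and a negative value means keep forever). It is only an illustration of the rule, not the broker's actual code:
> {code:java}
> import java.util.concurrent.TimeUnit;
> 
> public class RetentionPrecedence {
>     // Expected resolution order: ms wins over minutes, minutes win over hours;
>     // a negative result should mean "retain forever".
>     static long effectiveRetentionMs(Long retentionMs, Integer retentionMinutes, int retentionHours) {
>         long millis;
>         if (retentionMs != null) {
>             millis = retentionMs;
>         } else if (retentionMinutes != null) {
>             millis = TimeUnit.MINUTES.toMillis(retentionMinutes);
>         } else {
>             millis = TimeUnit.HOURS.toMillis(retentionHours);
>         }
>         return millis < 0 ? -1L : millis;
>     }
> 
>     public static void main(String[] args) {
>         // Our settings: log.retention.ms=null, log.retention.minutes=null, log.retention.hours=-1
>         System.out.println(effectiveRetentionMs(null, null, -1));  // expected -1, i.e. infinite
>         // What the cleanup logs suggest is actually in effect: the 168h default
>         System.out.println(effectiveRetentionMs(null, null, 168)); // 604800000
>     }
> }
> {code}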



--
This message was sent by Atlassian Jira
(v8.20.1#820001)