You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Igor Baltiyskiy (Jira)" <ji...@apache.org> on 2020/04/24 10:13:00 UTC

[jira] [Comment Edited] (KAFKA-9912) Kafka deletes full log on startup written 30mins ago - wrongly claims retention time 7d breach

    [ https://issues.apache.org/jira/browse/KAFKA-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091424#comment-17091424 ] 

Igor Baltiyskiy edited comment on KAFKA-9912 at 4/24/20, 10:12 AM:
-------------------------------------------------------------------

Hi,

I'm using Kafka Java client (Maven coordinates: {{org.apache.kafka:kafka-clients:2.4.0:jar}} ). I'm using {{ProducerRecord(String topic, V value)}} constructor for producing messages into these topics. As far as I see, it doesn't set timestamp.

I'm sure that the produced data must have correct timestamp. Though I can't show you the deleted logs, I have the following observations:
 # As naming suggests, there are 8 topics of this kind — {{increasing-topic-0}}, {{increasing-topic-1}}, up to {{increasing-topic-7}} (skipping {{increasing-topic-2}} because the test didn't go off as expected).  All of these topics have been produced during the same day, but only 1, 6, and 7 have been cleared, the rest are intact.
 #


was (Author: baltiyskiy):
Hi,

I'm using Kafka Java client (Maven coordinates: {{org.apache.kafka:kafka-clients:2.4.0:jar}} ). I'm using {{ProducerRecord(String topic, V value)}} constructor for producing messages into these topics. As far as I see, it doesn't set timestamp.

I'm sure that the produced data must have correct timestamp. Though I can't show you the deleted logs, I have the following observations:
 # As naming suggests, there are 8 topics of this kind — {{increasing-topic-0}}, {{increasing-topic-1}}, up to {{increasing-topic-7}} (ll of these topics have been produced during the same day, but only 1, 6, and 7 have been cleared, the rest are intact.
 #

> Kafka deletes full log on startup written 30mins ago - wrongly claims retention time 7d breach
> ----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9912
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9912
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>    Affects Versions: 2.1.1
>         Environment: Image solsson/kafka:2.1.1@sha256:8bc8242c649c395ab79d76cc83b1052e63b4efea7f83547bf11eb3ef5ea6f8e1 deployed in Kubernetes v 1.12.7 (minikube v1.4.0)
> 1 broker
>            Reporter: Igor Baltiyskiy
>            Priority: Major
>         Attachments: kafka-logs.txt
>
>
> This isn't easily reproduced, so I'll try to explain how I got there as accurately as possible. I've been able to observe this 2 times now.
> I was experimenting with the resilience of my Kafka-based app by shutting down Kafka via {{kubectl scale --replicas 0}} — as far as I understand, it sends {{SIGTERM}} to PID 1 of this container, which is {{java ... kafka.Kafka}}. I've got it into state where {{kafka-console-consumer.sh}} cannot read anything from some topics, but can from some others.
> So, this works:
> {code:java}
> bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic increasing-topic-0  --partition 0 --from-beginning --max-messages 5
> 1
> 2
> 3
> 4
> 5
> Processed a total of 5 messages {code}
> And this command just hangs:
> {code:java}
> bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic increasing-topic-7  --partition 0 --from-beginning --max-messages 5 {code}
> Even though I know there should be plenty of messages in the topic, produced not more than an hour ago:
> {code:java}
> root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-0
> increasing-topic-0:0:500000
> root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-7
> increasing-topic-7:0:321455 {code}
> What I see in the broker logs:
> {code:java}
> [2020-04-23 19:20:10,706] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Found deletable segments with base offsets [0] due to retention time 604800000ms breach (kafka.log.Log)
> [2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Rolled new log segment at offset 321455 in 5 ms. (kafka.log.Log)
> [2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Scheduling log segment [baseOffset 0, size 4513814] for deletion. (kafka.log.Log)
> [2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Incrementing log start offset to 321455 (kafka.log.Log)
> [2020-04-23 19:21:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)
> [2020-04-23 19:21:10,713] INFO Deleted log /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)
> [2020-04-23 19:21:10,713] INFO Deleted offset index /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)
> [2020-04-23 19:21:10,713] INFO Deleted time index /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment){code}
> As you see, Kafka deletes up to 321455 messages — exactly how many there are in this topic.
> I wouldn't expect Kafka to do that. 
> Note that in my test, I ran a script that would switch Kafka on and then after a random period of time off. Maybe that was what caused this bug to happen.
> For the record, that's how much timed passed between on/off switches:
> {code:java}
> sleeping 29575 ms before switching Kafka off
> statefulset.apps/kafka-a scaled
> Kafka switched off
> sleeping 257347 ms before switching Kafka on
> statefulset.apps/kafka-a scaled
> kafka switched on
> sleeping 6388 ms before switching Kafka off
> statefulset.apps/kafka-a scaled
> Kafka switched off
> sleeping 214066 ms before switching Kafka on
> statefulset.apps/kafka-a scaled
> kafka switched on
> sleeping 1095 ms before switching Kafka off
> statefulset.apps/kafka-a scaled
> Kafka switched off
> sleeping 155721 ms before switching Kafka on
> statefulset.apps/kafka-a scaled
> kafka switched on
> sleeping 8689 ms before switching Kafka off
> statefulset.apps/kafka-a scaled
> Kafka switched off
> sleeping 188854 ms before switching Kafka on
> statefulset.apps/kafka-a scaled
> kafka switched on
> sleeping 39113 ms before switching Kafka off{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)