You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Igor Baltiyskiy (Jira)" <ji...@apache.org> on 2020/04/23 20:40:00 UTC

[jira] [Created] (KAFKA-9912) Kafka deletes full log on startup written 30mins ago - wrongly claims retention time 7d breach

Igor Baltiyskiy created KAFKA-9912:
--------------------------------------

             Summary: Kafka deletes full log on startup written 30mins ago - wrongly claims retention time 7d breach
                 Key: KAFKA-9912
                 URL: https://issues.apache.org/jira/browse/KAFKA-9912
             Project: Kafka
          Issue Type: Bug
          Components: log cleaner
    Affects Versions: 2.1.1
         Environment: Image solsson/kafka:2.1.1@sha256:8bc8242c649c395ab79d76cc83b1052e63b4efea7f83547bf11eb3ef5ea6f8e1 deployed in Kubernetes v 1.12.7 (minikube v1.4.0)
1 broker
            Reporter: Igor Baltiyskiy
         Attachments: kafka-logs.txt

This isn't easily reproduced, so bear with me.

I was experimenting with the resilience of my Kafka-based app by shutting down Kafka via {{kubectl scale --replicas 0}} — as far as I understand, it sends {{SIGTERM}} to PID 1 of this container, which is {{java ... kafka.Kafka}}. I've got it into state where {{kafka-console-consumer.sh}} cannot read anything from some topics, but can from some others.

So, this works:
{code:java}
bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic increasing-topic-0  --partition 0 --from-beginning --max-messages 5
1
2
3
4
5
Processed a total of 5 messages {code}
And this command just hangs:
{code:java}
bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic increasing-topic-7  --partition 0 --from-beginning --max-messages 5 {code}
Even though I know there should be plenty of messages in the topic, produced not more than an hour ago:
{code:java}
root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-0
increasing-topic-0:0:500000
root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-7
increasing-topic-7:0:321455 {code}
What I see in the broker logs:
{code:java}
[2020-04-23 19:20:10,706] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Found deletable segments with base offsets [0] due to retention time 604800000ms breach (kafka.log.Log)
[2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Rolled new log segment at offset 321455 in 5 ms. (kafka.log.Log)
[2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Scheduling log segment [baseOffset 0, size 4513814] for deletion. (kafka.log.Log)
[2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Incrementing log start offset to 321455 (kafka.log.Log)
[2020-04-23 19:21:10,712] INFO [Log partition=increasing-topic-7-0, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)
[2020-04-23 19:21:10,713] INFO Deleted log /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)
[2020-04-23 19:21:10,713] INFO Deleted offset index /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)
[2020-04-23 19:21:10,713] INFO Deleted time index /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment){code}
As you see, Kafka deletes up to 321455 messages — exactly how many there are in this topic.

I wouldn't expect Kafka to do that. 

Note that in my test, I ran a script that would switch Kafka on and then after a random period of time off. Maybe that was what caused this bug to happen.

For the record, that's how much timed passed between on/off switches:
{code:java}
sleeping 29575 ms before switching Kafka off
statefulset.apps/kafka-a scaled
Kafka switched off
sleeping 257347 ms before switching Kafka on
statefulset.apps/kafka-a scaled
kafka switched on
sleeping 6388 ms before switching Kafka off
statefulset.apps/kafka-a scaled
Kafka switched off
sleeping 214066 ms before switching Kafka on
statefulset.apps/kafka-a scaled
kafka switched on
sleeping 1095 ms before switching Kafka off
statefulset.apps/kafka-a scaled
Kafka switched off
sleeping 155721 ms before switching Kafka on
statefulset.apps/kafka-a scaled
kafka switched on
sleeping 8689 ms before switching Kafka off
statefulset.apps/kafka-a scaled
Kafka switched off
sleeping 188854 ms before switching Kafka on
statefulset.apps/kafka-a scaled
kafka switched on
sleeping 39113 ms before switching Kafka off{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)