You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "VIkram (Jira)" <ji...@apache.org> on 2020/04/16 11:26:00 UTC

[jira] [Created] (KAFKA-9879) How kafka deletes tombstone messages?

VIkram created KAFKA-9879:
-----------------------------

             Summary: How kafka deletes tombstone messages?
                 Key: KAFKA-9879
                 URL: https://issues.apache.org/jira/browse/KAFKA-9879
             Project: Kafka
          Issue Type: Bug
            Reporter: VIkram


I was able to delete records in kafka using tombstone messages after few attempts. However the algorithm (or logic) that kafka uses to delete these tombstone messages is still unclear to me.

From my observations, I could figure out that there is some relation between last modified time of a segment and deletion of tombstone messages. I have looked into this [https://stackoverflow.com/a/48325699/6940279] but it's a bit complex to understand.

*Topic details*

 

{{Topic:reddyvel_13   PartitionCount:1    ReplicationFactor:3 Configs:cleanup.policy=compact,segment.bytes=2000000,delete.retention.ms=10000
    Topic: reddyvel_13  Partition: 0    Leader: 1   Replicas: 1,5,2 Isr: 1,5,2}}

I have set {{cleanup.policy=compact}}, {{segment.bytes=2000000}}, {{delete.retention.ms=10000}}

*Timeline of events*
 * First segment (baseOffset = 0) was closed at {{2020-04-02 07:12:09,908}}

 

{{[2020-04-02 07:12:09,908] INFO [ProducerStateManager partition=reddyvel_13-0] Writing producer snapshot at offset 16623 (kafka.log.ProducerStateManager)
[2020-04-02 07:12:09,908] INFO [Log partition=reddyvel_13-0, dir=/local/kafka/data] Rolled new log segment at offset 16623 in 1 ms. (kafka.log.Log)}}

Compaction has been triggered immediately on this closed segment

 

{{[2020-04-02 07:12:22,989] INFO Cleaner 0: Cleaning log reddyvel_13-0 (cleaning prior to Thu Apr 02 07:12:09 EDT 2020, discarding tombstones prior to Wed Dec 31 19:00:00 EST 1969)... (kafka.l
og.LogCleaner)}}
 * Sent few more messages along with few tombstones (to delete messages present in first segment) and Second segment was closed at {{2020-04-02 07:56:50,405}}

 

{{[2020-04-02 07:56:50,405] INFO [ProducerStateManager partition=reddyvel_13-0] Writing producer snapshot at offset 33868 (kafka.log.ProducerStateManager)
[2020-04-02 07:56:50,406] INFO [Log partition=reddyvel_13-0, dir=/local/kafka/data] Rolled new log segment at offset 33868 in 2 ms. (kafka.log.Log)}}

Compaction has been triggered

 

{{[2020-04-02 07:56:53,180] INFO Cleaner 0: Cleaning log reddyvel_13-0 (cleaning prior to Thu Apr 02 07:56:50 EDT 2020, discarding tombstones prior to Thu Apr 02 07:11:59 EDT 2020)... (kafka.l
og.LogCleaner)}}

Here, above log message says {{discarding tombstones prior to Thu Apr 02 07:11:59 EDT 2020}}. This timestamp is exactly equal to first segment closing timestamp ({{2020-04-02 07:12:09,908}})   -   {{delete.retention.ms}} (10 seconds) of my topic. I'm not able to figure out the link between these.

I want to understand at what time does kafka trigger deletion of tombstone messages. Can someone explain the tombstone deletion algorithm in simpler terms and the reasoning behind it?

 

It's not a bug but I need more information on this. I have posted this in other forums like stackoverflow but did not get any reply. The kafka official documentation doesn't have this information. If this is not the correct platform for this, kindly guide me to the relevant platform.

 

Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)