You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "VIkram (Jira)" <ji...@apache.org> on 2020/04/16 11:28:00 UTC

[jira] [Updated] (KAFKA-9879) How kafka deletes tombstone messages?

     [ https://issues.apache.org/jira/browse/KAFKA-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

VIkram updated KAFKA-9879:
--------------------------
    Description: 
I was able to delete records in kafka using tombstone messages after few attempts. However the algorithm (or logic) that kafka uses to delete these tombstone messages is still unclear to me.

From my observations, I could figure out that there is some relation between last modified time of a segment and deletion of tombstone messages. I have looked into this [https://stackoverflow.com/a/48325699/6940279] but it's a bit complex to understand.

*Topic details*

 

{{Topic:reddyvel_13 PartitionCount:1 ReplicationFactor:3 Configs:cleanup.policy=compact,segment.bytes=2000000,delete.retention.ms=10000
 Topic: reddyvel_13 Partition: 0 Leader: 1 Replicas: 1,5,2 Isr: 1,5,2}}

I have set {{cleanup.policy=compact}}, {{segment.bytes=2000000}}, {{delete.retention.ms=10000}}

*Timeline of events*
 * First segment (baseOffset = 0) was closed at {{2020-04-02 07:12:09,908}}

+*cleaner log*+

{{[2020-04-02 07:12:09,908] INFO [ProducerStateManager partition=reddyvel_13-0] Writing producer snapshot at offset 16623 (kafka.log.ProducerStateManager)
 [2020-04-02 07:12:09,908] INFO [Log partition=reddyvel_13-0, dir=/local/kafka/data] Rolled new log segment at offset 16623 in 1 ms. (kafka.log.Log)}}

Compaction has been triggered immediately on this closed segment

+*cleaner log*+ 

{{[2020-04-02 07:12:22,989] INFO Cleaner 0: Cleaning log reddyvel_13-0 (cleaning prior to Thu Apr 02 07:12:09 EDT 2020, discarding tombstones prior to Wed Dec 31 19:00:00 EST 1969)... (kafka.l
 og.LogCleaner)}}

 
 * Sent few more messages along with few tombstones (to delete messages present in first segment) and Second segment was closed at {{2020-04-02 07:56:50,405}}

 +*cleaner log*+

{{[2020-04-02 07:56:50,405] INFO [ProducerStateManager partition=reddyvel_13-0] Writing producer snapshot at offset 33868 (kafka.log.ProducerStateManager)
 [2020-04-02 07:56:50,406] INFO [Log partition=reddyvel_13-0, dir=/local/kafka/data] Rolled new log segment at offset 33868 in 2 ms. (kafka.log.Log)}}

Compaction has been triggered

+*cleaner log*+ 

{{[2020-04-02 07:56:53,180] INFO Cleaner 0: Cleaning log reddyvel_13-0 (cleaning prior to Thu Apr 02 07:56:50 EDT 2020, discarding tombstones prior to Thu Apr 02 07:11:59 EDT 2020)... (kafka.l
 og.LogCleaner)}}

 

Here, above log message says {{discarding tombstones prior to Thu Apr 02 07:11:59 EDT 2020}}. This timestamp is exactly equal to first segment closing timestamp ({{2020-04-02 07:12:09,908}})   -   {{delete.retention.ms}} (10 seconds) of my topic. I'm not able to figure out the link between these.

I want to understand at what time does kafka trigger deletion of tombstone messages. Can someone explain the tombstone deletion algorithm in simpler terms and the reasoning behind it?

 

It's not a bug but I need more information on this. I have posted this in other forums like stackoverflow but did not get any reply. The kafka official documentation doesn't have this information. If this is not the correct platform for this, kindly guide me to the relevant platform.

 

Thanks in advance.

  was:
I was able to delete records in kafka using tombstone messages after few attempts. However the algorithm (or logic) that kafka uses to delete these tombstone messages is still unclear to me.

From my observations, I could figure out that there is some relation between last modified time of a segment and deletion of tombstone messages. I have looked into this [https://stackoverflow.com/a/48325699/6940279] but it's a bit complex to understand.

*Topic details*

 

{{Topic:reddyvel_13   PartitionCount:1    ReplicationFactor:3 Configs:cleanup.policy=compact,segment.bytes=2000000,delete.retention.ms=10000
    Topic: reddyvel_13  Partition: 0    Leader: 1   Replicas: 1,5,2 Isr: 1,5,2}}

I have set {{cleanup.policy=compact}}, {{segment.bytes=2000000}}, {{delete.retention.ms=10000}}

*Timeline of events*
 * First segment (baseOffset = 0) was closed at {{2020-04-02 07:12:09,908}}

 

{{[2020-04-02 07:12:09,908] INFO [ProducerStateManager partition=reddyvel_13-0] Writing producer snapshot at offset 16623 (kafka.log.ProducerStateManager)
[2020-04-02 07:12:09,908] INFO [Log partition=reddyvel_13-0, dir=/local/kafka/data] Rolled new log segment at offset 16623 in 1 ms. (kafka.log.Log)}}

Compaction has been triggered immediately on this closed segment

 

{{[2020-04-02 07:12:22,989] INFO Cleaner 0: Cleaning log reddyvel_13-0 (cleaning prior to Thu Apr 02 07:12:09 EDT 2020, discarding tombstones prior to Wed Dec 31 19:00:00 EST 1969)... (kafka.l
og.LogCleaner)}}
 * Sent few more messages along with few tombstones (to delete messages present in first segment) and Second segment was closed at {{2020-04-02 07:56:50,405}}

 

{{[2020-04-02 07:56:50,405] INFO [ProducerStateManager partition=reddyvel_13-0] Writing producer snapshot at offset 33868 (kafka.log.ProducerStateManager)
[2020-04-02 07:56:50,406] INFO [Log partition=reddyvel_13-0, dir=/local/kafka/data] Rolled new log segment at offset 33868 in 2 ms. (kafka.log.Log)}}

Compaction has been triggered

 

{{[2020-04-02 07:56:53,180] INFO Cleaner 0: Cleaning log reddyvel_13-0 (cleaning prior to Thu Apr 02 07:56:50 EDT 2020, discarding tombstones prior to Thu Apr 02 07:11:59 EDT 2020)... (kafka.l
og.LogCleaner)}}

Here, above log message says {{discarding tombstones prior to Thu Apr 02 07:11:59 EDT 2020}}. This timestamp is exactly equal to first segment closing timestamp ({{2020-04-02 07:12:09,908}})   -   {{delete.retention.ms}} (10 seconds) of my topic. I'm not able to figure out the link between these.

I want to understand at what time does kafka trigger deletion of tombstone messages. Can someone explain the tombstone deletion algorithm in simpler terms and the reasoning behind it?

 

It's not a bug but I need more information on this. I have posted this in other forums like stackoverflow but did not get any reply. The kafka official documentation doesn't have this information. If this is not the correct platform for this, kindly guide me to the relevant platform.

 

Thanks in advance.


> How kafka deletes tombstone messages?
> -------------------------------------
>
>                 Key: KAFKA-9879
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9879
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: VIkram
>            Priority: Major
>
> I was able to delete records in kafka using tombstone messages after few attempts. However the algorithm (or logic) that kafka uses to delete these tombstone messages is still unclear to me.
> From my observations, I could figure out that there is some relation between last modified time of a segment and deletion of tombstone messages. I have looked into this [https://stackoverflow.com/a/48325699/6940279] but it's a bit complex to understand.
> *Topic details*
>  
> {{Topic:reddyvel_13 PartitionCount:1 ReplicationFactor:3 Configs:cleanup.policy=compact,segment.bytes=2000000,delete.retention.ms=10000
>  Topic: reddyvel_13 Partition: 0 Leader: 1 Replicas: 1,5,2 Isr: 1,5,2}}
> I have set {{cleanup.policy=compact}}, {{segment.bytes=2000000}}, {{delete.retention.ms=10000}}
> *Timeline of events*
>  * First segment (baseOffset = 0) was closed at {{2020-04-02 07:12:09,908}}
> +*cleaner log*+
> {{[2020-04-02 07:12:09,908] INFO [ProducerStateManager partition=reddyvel_13-0] Writing producer snapshot at offset 16623 (kafka.log.ProducerStateManager)
>  [2020-04-02 07:12:09,908] INFO [Log partition=reddyvel_13-0, dir=/local/kafka/data] Rolled new log segment at offset 16623 in 1 ms. (kafka.log.Log)}}
> Compaction has been triggered immediately on this closed segment
> +*cleaner log*+ 
> {{[2020-04-02 07:12:22,989] INFO Cleaner 0: Cleaning log reddyvel_13-0 (cleaning prior to Thu Apr 02 07:12:09 EDT 2020, discarding tombstones prior to Wed Dec 31 19:00:00 EST 1969)... (kafka.l
>  og.LogCleaner)}}
>  
>  * Sent few more messages along with few tombstones (to delete messages present in first segment) and Second segment was closed at {{2020-04-02 07:56:50,405}}
>  +*cleaner log*+
> {{[2020-04-02 07:56:50,405] INFO [ProducerStateManager partition=reddyvel_13-0] Writing producer snapshot at offset 33868 (kafka.log.ProducerStateManager)
>  [2020-04-02 07:56:50,406] INFO [Log partition=reddyvel_13-0, dir=/local/kafka/data] Rolled new log segment at offset 33868 in 2 ms. (kafka.log.Log)}}
> Compaction has been triggered
> +*cleaner log*+ 
> {{[2020-04-02 07:56:53,180] INFO Cleaner 0: Cleaning log reddyvel_13-0 (cleaning prior to Thu Apr 02 07:56:50 EDT 2020, discarding tombstones prior to Thu Apr 02 07:11:59 EDT 2020)... (kafka.l
>  og.LogCleaner)}}
>  
> Here, above log message says {{discarding tombstones prior to Thu Apr 02 07:11:59 EDT 2020}}. This timestamp is exactly equal to first segment closing timestamp ({{2020-04-02 07:12:09,908}})   -   {{delete.retention.ms}} (10 seconds) of my topic. I'm not able to figure out the link between these.
> I want to understand at what time does kafka trigger deletion of tombstone messages. Can someone explain the tombstone deletion algorithm in simpler terms and the reasoning behind it?
>  
> It's not a bug but I need more information on this. I have posted this in other forums like stackoverflow but did not get any reply. The kafka official documentation doesn't have this information. If this is not the correct platform for this, kindly guide me to the relevant platform.
>  
> Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)