Posted to dev@kafka.apache.org by "Jason Gustafson (JIRA)" <ji...@apache.org> on 2017/06/21 18:55:00 UTC

[jira] [Created] (KAFKA-5490) Deletion of tombstones during cleaning should consider idempotent message retention

Jason Gustafson created KAFKA-5490:
--------------------------------------

             Summary: Deletion of tombstones during cleaning should consider idempotent message retention
                 Key: KAFKA-5490
                 URL: https://issues.apache.org/jira/browse/KAFKA-5490
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Jason Gustafson
            Assignee: Jason Gustafson
            Priority: Critical
             Fix For: 0.11.0.1


The LogCleaner always preserves the message containing the last sequence number from a given ProducerId when doing a round of cleaning. This is necessary to ensure that the producer is not prematurely evicted, which would cause an OutOfOrderSequenceException. The problem with this approach is that the preserved message won't be considered again for cleaning until a new message with the same key is written to the topic. In general this can lead to an accumulation of stale entries in the log, but the bigger problem is that the newer entry with the same key could be a tombstone. If we end up deleting this tombstone before a new record with the same key is written, then the old entry will resurface. For example, consider the following sequence of writes:

1. ProducerId=1, Key=A, Value=1
2. ProducerId=2, Key=A, Value=null (tombstone)

We will preserve the first entry until a new record with Key=A is written AND either ProducerId 1 has written a newer record with a larger sequence number or ProducerId 1 has expired. As long as the tombstone is preserved, there is no correctness violation: a consumer reading from the beginning will ignore the first entry after reading the tombstone. But it is possible that the tombstone will be removed from the log before a new record with Key=A is written. If that happens, a consumer reading from the beginning would incorrectly observe the overwritten value.
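
The scenario above can be reproduced with a small toy model. This is only a sketch and not the actual LogCleaner code: the names Record, clean and replay are hypothetical, and the drop_tombstones flag stands in for a tombstone having aged past its retention time. One extra write (ProducerId=2, Key=B) is added as an assumption so that the tombstone is not itself the last sequence from ProducerId 2 and is therefore eligible for deletion.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    offset: int
    producer_id: int
    sequence: int
    key: str
    value: Optional[str]  # None models a tombstone


def clean(log: List[Record], drop_tombstones: bool) -> List[Record]:
    """Toy cleaning pass over a log ordered by offset (hypothetical model)."""
    # Later entries overwrite earlier ones, so these hold the newest offsets.
    latest_offset_for_key = {r.key: r.offset for r in log}
    last_offset_for_producer = {r.producer_id: r.offset for r in log}
    kept = []
    for r in log:
        is_latest_for_key = latest_offset_for_key[r.key] == r.offset
        is_last_for_producer = last_offset_for_producer[r.producer_id] == r.offset
        if is_last_for_producer:
            # Always retained so the producer's last sequence is not lost.
            kept.append(r)
        elif is_latest_for_key and not (r.value is None and drop_tombstones):
            # Newest value for the key; expired tombstones are dropped.
            kept.append(r)
    return kept


def replay(log: List[Record]) -> Dict[str, str]:
    """What a consumer reading from the beginning would materialize."""
    state: Dict[str, str] = {}
    for r in log:
        if r.value is None:
            state.pop(r.key, None)
        else:
            state[r.key] = r.value
    return state


log = [
    Record(0, producer_id=1, sequence=0, key="A", value="1"),
    Record(1, producer_id=2, sequence=0, key="A", value=None),  # tombstone
    Record(2, producer_id=2, sequence=1, key="B", value="2"),   # assumed extra write
]

# First pass: the tombstone is still within its retention time.
first_pass = clean(log, drop_tombstones=False)
# Second pass: the tombstone has expired, but no new Key=A record was written.
second_pass = clean(first_pass, drop_tombstones=True)

print(replay(first_pass))   # Key=A is absent, as expected
print(replay(second_pass))  # Key=A reappears with the overwritten value
```

In this model the first pass keeps both the Key=A entry (as the last sequence from ProducerId 1) and the tombstone, so replay is still correct. The second pass removes the tombstone while the Key=A entry remains pinned, and the replayed state contains the overwritten value again.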


