Posted to dev@kafka.apache.org by "Fabien LD (JIRA)" <ji...@apache.org> on 2018/05/18 05:27:00 UTC

[jira] [Created] (KAFKA-6915) MirrorMaker: avoid duplicates when source cluster is unreachable for more than session.timeout.ms

Fabien LD created KAFKA-6915:
--------------------------------

             Summary: MirrorMaker: avoid duplicates when source cluster is unreachable for more than session.timeout.ms
                 Key: KAFKA-6915
                 URL: https://issues.apache.org/jira/browse/KAFKA-6915
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 1.1.0
            Reporter: Fabien LD


According to the documentation (see [https://kafka.apache.org/11/documentation.html#semantics]), exactly-once delivery can be achieved by storing the offsets in the same store as the produced data:
{quote}
When writing to an external system, the limitation is in the need to coordinate the consumer's position with what is actually stored as output. The classic way of achieving this would be to introduce a two-phase commit between the storage of the consumer position and the storage of the consumers output. But this can be handled more simply and generally by letting the consumer store its offset in the same place as its output
{quote}

With the current implementation, where the consumer stores its offsets in the source cluster, duplicates can occur if a network problem makes the source cluster unreachable for more than {{session.timeout.ms}}.
Once that amount of time has passed, the source cluster rebalances the consumer group. Later, when the network is back, the generation has changed and the consumers cannot commit the offsets for the last batches of records consumed (in practice, all records processed during the last {{auto.commit.interval.ms}}). All those records are therefore processed again when the consumers of the group come back.
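
The failure sequence described above can be illustrated with a small toy model. This is only a sketch and not MirrorMaker code: {{SourceCluster}}, {{mirror_with_outage}} and the {{outage_after_batch}} parameter are hypothetical names invented for the illustration, and the rebalance is simulated by bumping a generation counter instead of waiting for {{session.timeout.ms}} to expire.

```python
class SourceCluster:
    """Toy stand-in for the source cluster's group coordinator (not a Kafka API)."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0    # last offset committed for the consumer group
        self.generation = 1   # consumer group generation

    def poll(self, offset, max_records):
        return self.records[offset:offset + max_records]

    def commit(self, offset, generation):
        # A commit coming from a stale generation is rejected, as after a rebalance.
        if generation != self.generation:
            return False
        self.committed = offset
        return True

    def rebalance(self):
        self.generation += 1


def mirror_with_outage(records, batch_size, outage_after_batch):
    """Mirror records to a list, with one simulated outage before a commit."""
    source = SourceCluster(records)
    target = []
    generation = source.generation
    position = source.committed
    batch_no = 0
    while position < len(records):
        batch = source.poll(position, batch_size)
        target.extend(batch)          # records are already in the target cluster
        position += len(batch)
        batch_no += 1
        if batch_no == outage_after_batch:
            source.rebalance()        # the session timed out during the outage
        if not source.commit(position, generation):
            # Rejoin the group: adopt the new generation and resume from the
            # last offset the source cluster actually has for the group.
            generation = source.generation
            position = source.committed
    return target


mirrored = mirror_with_outage(["r0", "r1", "r2", "r3", "r4", "r5"], 2, 2)
print(mirrored)  # ['r0', 'r1', 'r2', 'r3', 'r2', 'r3', 'r4', 'r5']
```

In this model the second batch is written to the target, its commit is rejected because the generation changed, and the same batch is mirrored a second time.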

Storing the offsets in the target cluster would resolve this risk of duplicate records and would be a nice feature to have.
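
As a rough sketch of the idea, the toy model below keeps the consumer position next to the mirrored records. All names ({{TargetStore}}, {{write_batch}}, {{mirror_with_target_offsets}}, {{restart_after_batch}}) are hypothetical, and the write is simply assumed to be atomic; how that atomicity would be obtained against a real target cluster is not specified in this issue.

```python
class TargetStore:
    """Toy target keeping mirrored records and the source offset together."""

    def __init__(self):
        self.records = []
        self.source_offset = 0   # next source offset to read, stored with the data

    def write_batch(self, batch, next_offset):
        # Assumption: records and offset are updated together or not at all.
        self.records.extend(batch)
        self.source_offset = next_offset


def mirror_with_target_offsets(source_records, batch_size, restart_after_batch):
    """Mirror records, losing the in-memory position once to simulate a rejoin."""
    target = TargetStore()
    position = target.source_offset
    batch_no = 0
    while position < len(source_records):
        batch = source_records[position:position + batch_size]
        target.write_batch(batch, position + len(batch))
        batch_no += 1
        if batch_no == restart_after_batch:
            # Simulated rebalance: the position is re-read from the target,
            # which always matches what has actually been written there.
            position = target.source_offset
        else:
            position += len(batch)
    return target.records


mirrored = mirror_with_target_offsets(["r0", "r1", "r2", "r3", "r4", "r5"], 2, 2)
print(mirrored)  # ['r0', 'r1', 'r2', 'r3', 'r4', 'r5']
```

Because the position is recovered from the same place as the output, a generation change on the source cluster no longer causes a batch to be mirrored twice in this model.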



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)