Posted to issues@flink.apache.org by "Alex Sorokoumov (Jira)" <ji...@apache.org> on 2023/03/12 20:36:00 UTC

[jira] [Created] (FLINK-31408) Add EXACTLY_ONCE support to upsert-kafka

Alex Sorokoumov created FLINK-31408:
---------------------------------------

             Summary: Add EXACTLY_ONCE support to upsert-kafka
                 Key: FLINK-31408
                 URL: https://issues.apache.org/jira/browse/FLINK-31408
             Project: Flink
          Issue Type: New Feature
          Components: Connectors / Kafka
            Reporter: Alex Sorokoumov


The {{upsert-kafka}} connector should support optional {{EXACTLY_ONCE}} delivery semantics.

[upsert-kafka docs|https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#consistency-guarantees] suggest that the connector tolerates duplicate records produced under {{AT_LEAST_ONCE}} delivery. However, there are at least two reasons to configure the connector with {{EXACTLY_ONCE}}.
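For illustration, a hypothetical DDL with the proposed option might look as follows. The option names ({{sink.delivery-guarantee}}, {{sink.transactional-id-prefix}}) are borrowed from the regular {{kafka}} connector and are assumptions here, not an existing {{upsert-kafka}} API:

```sql
-- Hypothetical sketch: the last two options mirror the regular kafka
-- connector and are NOT currently supported by upsert-kafka.
CREATE TABLE pageviews_per_region (
  region STRING,
  view_count BIGINT,
  PRIMARY KEY (region) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'pageviews_per_region',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'avro',
  'value.format' = 'avro',
  -- proposed exactly-once options (assumed names):
  'sink.delivery-guarantee' = 'exactly-once',
  'sink.transactional-id-prefix' = 'pageviews-sink'
);
```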

First, the topic might have other, non-Flink consumers that would rather not receive duplicate records.

Second, multiple {{upsert-kafka}} producers might cause keys to roll back to previous values. Consider a scenario with two producing jobs, A and B, writing to the same topic with {{AT_LEAST_ONCE}}, and a consuming job reading from that topic. Both producers write unique, monotonically increasing sequences to the same key: job A writes {{x=a1,a2,a3,a4,a5,...}} and job B writes {{x=b1,b2,b3,b4,b5,...}}. With this setup, the following sequence is possible:
 # Job A produces x=a5.
 # Job B produces x=b5.
 # Job A produces the duplicate write x=a5.

The consuming job would observe {{x}} going to {{a5}}, then to {{b5}}, then back to {{a5}}. {{EXACTLY_ONCE}} delivery would prevent this behavior.
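The three steps above can be sketched with a minimal simulation (plain Python, not Flink code) that replays a topic log into a consumer's view, showing the rollback an {{AT_LEAST_ONCE}} duplicate retry causes:

```python
# Hypothetical simulation of the rollback scenario: two producers write to
# the same key, and an AT_LEAST_ONCE duplicate retry re-emits an old value.

def observe_key(log, key):
    """Replay an ordered topic log and record every value the consuming
    job observes for the given key, in order."""
    return [value for k, value in log if k == key]

# Topic log under AT_LEAST_ONCE: job A retries its last write after
# job B has already produced a newer value for the same key.
topic_log = [
    ("x", "a5"),  # 1. Job A produces x=a5
    ("x", "b5"),  # 2. Job B produces x=b5
    ("x", "a5"),  # 3. Job A's duplicate write of x=a5
]

history = observe_key(topic_log, "x")
print(history)      # ['a5', 'b5', 'a5'] -- x rolls back from b5 to a5
print(history[-1])  # 'a5', even though b5 was produced later
```

With transactional (exactly-once) writes, step 3 would be deduplicated and the consumer's final value for {{x}} would remain {{b5}}.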



--
This message was sent by Atlassian Jira
(v8.20.10#820010)