Posted to issues@flink.apache.org by "Martijn Visser (Jira)" <ji...@apache.org> on 2024/01/26 08:34:00 UTC

[jira] [Updated] (FLINK-32019) EARLIEST offset strategy for partitions discovered later based on FLIP-288

     [ https://issues.apache.org/jira/browse/FLINK-32019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn Visser updated FLINK-32019:
-----------------------------------
    Fix Version/s: kafka-3.1.0
                       (was: kafka-4.0.0)

> EARLIEST offset strategy for partitions discovered later based on FLIP-288
> --------------------------------------------------------------------------
>
>                 Key: FLINK-32019
>                 URL: https://issues.apache.org/jira/browse/FLINK-32019
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Kafka
>    Affects Versions: kafka-3.0.0
>            Reporter: Hongshun Wang
>            Assignee: Hongshun Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: kafka-3.1.0
>
>
> As described in [FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source], the strategy used for new partitions is the same as the initial offset strategy, which is not reasonable.
> According to the semantics, if the startup strategy is latest, the consumed data should include all data from the moment of startup onward, which also includes all messages from newly created partitions. However, the latest strategy may currently be used for new partitions as well, leading to the loss of some data: a new partition may be created and only discovered by the Kafka source several minutes later, and the messages produced into the partition within that gap are dropped if, for example, "latest" is used as the initial offset strategy. If the data from all new partitions is not read, the user's expectations are not met.
> For other problems, see the final section of [FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source]: {{User specifies OffsetsInitializer for new partition}}.
> Therefore, it’s better to provide an *EARLIEST* strategy for later discovered partitions.
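
The gap described above can be illustrated with a small, self-contained simulation. This is only a sketch and does not use the Kafka connector's API: the class LateDiscoveryDemo, the enum OffsetStrategy, and the methods startingOffset and skippedRecords are hypothetical names introduced for illustration. It assumes a new partition whose offsets start at 0 and that already holds some records by the time the source discovers it.

```java
public class LateDiscoveryDemo {

    // Hypothetical stand-in for the two offset strategies discussed in the issue.
    enum OffsetStrategy { EARLIEST, LATEST }

    /**
     * Offset at which reading starts for a partition that already holds
     * recordsInLog records (offsets 0 .. recordsInLog - 1) when it is discovered.
     */
    static long startingOffset(OffsetStrategy strategy, long recordsInLog) {
        return strategy == OffsetStrategy.EARLIEST ? 0L : recordsInLog;
    }

    /**
     * Number of records written between partition creation and discovery
     * that the reader never sees under the given strategy.
     */
    static long skippedRecords(OffsetStrategy strategy, long recordsBeforeDiscovery) {
        // Everything below the starting offset is skipped.
        return startingOffset(strategy, recordsBeforeDiscovery);
    }

    public static void main(String[] args) {
        // Assumption: 5 records were produced before the source discovered the partition.
        long recordsBeforeDiscovery = 5;
        for (OffsetStrategy strategy : OffsetStrategy.values()) {
            System.out.println(strategy + " skips "
                    + skippedRecords(strategy, recordsBeforeDiscovery) + " record(s)");
        }
    }
}
```

Under these assumptions, starting a late-discovered partition at LATEST skips every record produced during the discovery gap, while EARLIEST skips none, which is the behavior the issue asks for.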



--
This message was sent by Atlassian Jira
(v8.20.10#820010)