You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Martijn Visser (Jira)" <ji...@apache.org> on 2022/03/10 14:03:00 UTC

[jira] [Updated] (FLINK-26018) Unnecessary late events when using the new KafkaSource

     [ https://issues.apache.org/jira/browse/FLINK-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn Visser updated FLINK-26018:
-----------------------------------
    Affects Version/s: 1.14.4

> Unnecessary late events when using the new KafkaSource
> ------------------------------------------------------
>
>                 Key: FLINK-26018
>                 URL: https://issues.apache.org/jira/browse/FLINK-26018
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Common
>    Affects Versions: 1.14.3, 1.14.4
>            Reporter: Jun Qin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.15.0, 1.14.5
>
>         Attachments: message in kafka.txt, taskmanager_10.28.0.131_33249-b3370c_log
>
>
> There is an issue with the new KafkaSource connector in Flink 1.14: when one task consumes messages from multiple topic partitions (statically created, timestamp are in order), it may start with one partition and advances watermarks before the data from other partitions come. In this case, the early messages in other partitions may unnecessarily be considered  as late ones.
> I discussed with [~renqs], it seems that the new KafkaSource only adds a partition into {{WatermarkMultiplexer}} when it receives data from that partition. In contrast, FlinkKafkaConsumer adds all known partition before it fetch any data. 
> Attached two files: the messages in Kafka and the corresponding TM logs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)