Posted to issues@flink.apache.org by "Eron Wright (JIRA)" <ji...@apache.org> on 2017/08/25 02:10:00 UTC

[jira] [Commented] (FLINK-5479) Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions

    [ https://issues.apache.org/jira/browse/FLINK-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141068#comment-16141068 ] 

Eron Wright commented on FLINK-5479:
------------------------------------

Thinking out loud here: consider using Kafka's `max.message.time.difference.ms` as the basis for the idle timeout.
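
To make the idea concrete, here is a rough sketch of how a subtask might compute its minimum watermark while skipping partitions that have been silent for longer than an idle timeout. This is not the actual {{AbstractFetcher}} code: the class, field, and method names are hypothetical, the timeout is passed in as a plain number (how it would be derived from the Kafka setting is left open), and the 60000 ms value in `main` is arbitrary.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; names do not come from Flink's AbstractFetcher. */
final class IdleAwareWatermarks {

    /** Illustrative per-partition state. */
    static final class PartitionState {
        long watermark = Long.MIN_VALUE; // no record seen yet
        long lastRecordAtMillis;         // processing time of the last record

        PartitionState(long createdAtMillis) {
            this.lastRecordAtMillis = createdAtMillis;
        }

        void onRecord(long eventTimestamp, long nowMillis) {
            watermark = Math.max(watermark, eventTimestamp);
            lastRecordAtMillis = nowMillis;
        }

        boolean isIdle(long nowMillis, long idleTimeoutMillis) {
            return nowMillis - lastRecordAtMillis >= idleTimeoutMillis;
        }
    }

    /** Behaviour described in the issue: plain minimum over all partitions. */
    static long minWatermark(List<PartitionState> partitions) {
        long min = Long.MAX_VALUE; // returned as-is for an empty list
        for (PartitionState p : partitions) {
            min = Math.min(min, p.watermark);
        }
        return min;
    }

    /** Minimum over non-idle partitions; Long.MIN_VALUE if every partition is idle. */
    static long minWatermarkIgnoringIdle(
            List<PartitionState> partitions, long nowMillis, long idleTimeoutMillis) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (PartitionState p : partitions) {
            if (p.isIdle(nowMillis, idleTimeoutMillis)) {
                continue; // an idle partition no longer holds the watermark back
            }
            anyActive = true;
            min = Math.min(min, p.watermark);
        }
        return anyActive ? min : Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long idleTimeoutMillis = 60_000L; // arbitrary value for illustration
        PartitionState active = new PartitionState(0L);
        PartitionState silent = new PartitionState(0L);
        List<PartitionState> partitions = Arrays.asList(active, silent);

        active.onRecord(1_000L, 50L); // only one partition ever receives data

        System.out.println(minWatermark(partitions));
        System.out.println(minWatermarkIgnoringIdle(partitions, 100L, idleTimeoutMillis));
        System.out.println(minWatermarkIgnoringIdle(partitions, 60_000L, idleTimeoutMillis));
    }
}
```

In this sketch the silent partition keeps the minimum at `Long.MIN_VALUE` until the timeout elapses; after that, only the active partition's watermark counts. Returning `Long.MIN_VALUE` when every partition is idle is one possible choice, not a settled design.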

> Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-5479
>                 URL: https://issues.apache.org/jira/browse/FLINK-5479
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>            Reporter: Tzu-Li (Gordon) Tai
>             Fix For: 1.4.0
>
>
> Reported in ML: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html
> Similar to idle sources blocking watermark progression in downstream operators (see FLINK-5017), the per-partition watermark mechanism in {{FlinkKafkaConsumer}} is also blocked from advancing watermarks when a partition is idle. The watermark of an idle partition is always {{Long.MIN_VALUE}}, so the overall minimum watermark across all partitions of a consumer subtask never advances.
> Kafka partitions that produce no data are not a common case, but it would be good to handle them as well. I think we should have a localized solution, similar to FLINK-5017, for the per-partition watermarks in {{AbstractFetcher}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)