You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Martijn Visser (Jira)" <ji...@apache.org> on 2024/04/04 07:42:00 UTC

[jira] [Commented] (FLINK-34995) flink kafka connector source stuck when partition leader invalid

    [ https://issues.apache.org/jira/browse/FLINK-34995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833850#comment-17833850 ] 

Martijn Visser commented on FLINK-34995:
----------------------------------------

Why do you think this is a Flink bug, and not an issue on the Kafka side because you have no Kafka leader elected? 

> flink kafka connector source stuck when partition leader invalid
> ----------------------------------------------------------------
>
>                 Key: FLINK-34995
>                 URL: https://issues.apache.org/jira/browse/FLINK-34995
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka
>    Affects Versions: 1.17.0, 1.19.0, 1.18.1
>            Reporter: yansuopeng
>            Priority: Major
>
> when partition leader invalid(leader=-1),  the flink streaming job using KafkaSource can't restart or start a new instance with a new groupid,  it will stuck and got following exception:
> "{*}org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition aaa-1 could be determined{*}"
> when leader=-1,  kafka api like KafkaConsumer.position() will block until either the position could be determined or an unrecoverable error is encountered 
> infact,  leader=-1 not easy to avoid,  even replica=3, three disk offline together will trigger the problem, especially when the cluster size is relatively large.    it rely on kafka administrator to fix in time,  but it take risk when in kafka cluster peak period.
> I have solve this problem, and want to create a PR. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)