Posted to issues@flink.apache.org by "Alexis Sarda-Espinosa (Jira)" <ji...@apache.org> on 2024/03/04 09:32:00 UTC

[jira] [Commented] (FLINK-34400) Kafka sources with watermark alignment sporadically stop consuming

    [ https://issues.apache.org/jira/browse/FLINK-34400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823102#comment-17823102 ] 

Alexis Sarda-Espinosa commented on FLINK-34400:
-----------------------------------------------

Hi [~fanrui], sorry for the late reply; it slipped my mind. I did try both approaches back then (with and without idleness). My point was that the setup with idleness disabled behaved strangely: I also expect the quick topic to be blocked by the slow (empty) topic, but in my experiments this didn't happen consistently, so consumption was sometimes unblocked for unknown reasons.

In any case, I imagine this will be treated as an unsupported configuration with somewhat undefined behavior.
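
For readers following the thread, the expectation that the quick topic gets blocked by the empty one can be illustrated with a toy model of the alignment rule. This is only a sketch under stated assumptions, not Flink's actual implementation: the class AlignmentSketch, its methods, and the NO_WATERMARK constant are hypothetical names. It assumes that a source which has never seen an event reports the lowest possible watermark, that without idleness this source still counts toward the group minimum, and that a source is paused once it is more than the maximum allowed drift ahead of that minimum.

```java
import java.util.List;

public class AlignmentSketch {
    // Assumption: a source that has never seen an event reports the lowest possible watermark.
    static final long NO_WATERMARK = Long.MIN_VALUE;

    /** Minimum watermark over every source in the group; nothing is excluded as idle. */
    static long groupMin(List<Long> reportedWatermarks) {
        long min = Long.MAX_VALUE;
        for (long w : reportedWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }

    /** True if a source is more than maxDriftMs (non-negative) ahead of the group minimum. */
    static boolean isPaused(long ownWatermark, long groupMin, long maxDriftMs) {
        // Saturate instead of overflowing when groupMin is close to Long.MAX_VALUE.
        long limit = groupMin > Long.MAX_VALUE - maxDriftMs
                ? Long.MAX_VALUE
                : groupMin + maxDriftMs;
        return ownWatermark > limit;
    }

    public static void main(String[] args) {
        long maxDriftMs = 20_000L;          // hypothetical max allowed drift
        long active = 1_709_544_720_000L;   // example event-time watermark in epoch millis

        // Two active subtasks plus the empty source, which never advances.
        long minWithEmpty = groupMin(List.of(active, active - 500, NO_WATERMARK));
        System.out.println("paused with empty source in group: "
                + isPaused(active, minWithEmpty, maxDriftMs));

        // Same two subtasks when the empty source is left out of the minimum.
        long minWithoutEmpty = groupMin(List.of(active, active - 500));
        System.out.println("paused without empty source: "
                + isPaused(active, minWithoutEmpty, maxDriftMs));
    }
}
```

Under these assumptions the active subtasks would stay paused for as long as the empty source counts toward the minimum, which is why the sporadic unblocking described above looks surprising.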

> Kafka sources with watermark alignment sporadically stop consuming
> ------------------------------------------------------------------
>
>                 Key: FLINK-34400
>                 URL: https://issues.apache.org/jira/browse/FLINK-34400
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.18.1
>            Reporter: Alexis Sarda-Espinosa
>            Priority: Major
>         Attachments: alignment_lags.png, logs.txt
>
>
> I have 2 Kafka sources that read from different topics. I have assigned them to the same watermark alignment group, and I have _not_ enabled idleness explicitly in their watermark strategies. One topic remains pretty much empty most of the time, while the other receives a few events per second all the time. Parallelism of the active source is 2, for the other one it's 1, and checkpoints are once every minute.
> This works correctly for some time (10-15 minutes in my case), but then one of the active sources stops consuming, which causes lag to increase. Weirdly, after another 15 minutes or so, all the backlog is consumed at once, and then everything stops again.
> I'm attaching some logs from the Task Manager where the issue appears. You will notice that the Kafka network client reports disconnections (a long time after the deserializer stopped reporting that events were being consumed); I'm not sure if this is related.


