You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Adam Venger <ve...@gmail.com> on 2020/09/03 13:28:42 UTC

Combined streams backpressure

Hi.
I'm thinking about a solution to a problem I have. I need to create keyed
session windows from multiple streams of data. Combining streams is done by
watermarks. The problem is, one of the streams can be slower. This opens
too many windows that wait for the stream to catch up, which wastes
resources, slows down checkpointing and access. I was thinking about some
way to detect this and stop reading from the faster streams which should
create backpressure. Is it something that is possible now? If not, would
there be any interest in adding this feature by me? If it would be a more
complex problem, I would like to make a master thesis form it.

Adam

Re: Combined streams backpressure

Posted by Piotr Nowojski <pn...@apache.org>.
Hi,

This is a known problem. As of recently, there was no way to solve this
problem generically, for every source. This is changing now, as one of the
motivations behind FLIP-27, was to actually address this issue [1]. Note,
as of now, there are no FLIP-27 sources yet in the code base, but for Flink
1.12 we are planning to change this.

For older Flink versions some users are monitoring this even time spread
and either back pressuring or slowing down the faster sources. Either via
some sleeping mapper, or via changing the source function itself (the
latter is better, but more complicated).

If you would like to help this effort, I would first suggest to take a look
at the FLIP-27 document and then try to get in touch with the people
involved with it to align the efforts. It would be a much welcomed feature.

Best,
Piotrek

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface#FLIP27:RefactorSourceInterface-EventTimeAlignment

czw., 3 wrz 2020 o 15:28 Adam Venger <ve...@gmail.com> napisaƂ(a):

> Hi.
> I'm thinking about a solution to a problem I have. I need to create keyed
> session windows from multiple streams of data. Combining streams is done by
> watermarks. The problem is, one of the streams can be slower. This opens
> too many windows that wait for the stream to catch up, which wastes
> resources, slows down checkpointing and access. I was thinking about some
> way to detect this and stop reading from the faster streams which should
> create backpressure. Is it something that is possible now? If not, would
> there be any interest in adding this feature by me? If it would be a more
> complex problem, I would like to make a master thesis form it.
>
> Adam
>