You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Fabian Hueske <fh...@gmail.com> on 2018/05/02 09:43:06 UTC

Re: coordinate watermarks between jobs?

Hi Tao,

The watermarks of operators that consume from two (or more) streams are
always synced to the lowest watermark.
This behavior guarantees that data won't be late (unless it was late when
watermarks were assigned). However, the operator will most likely need to
buffer more events from the "faster" streams.

Right now, it is not possible to throttle faster streams to the pace of the
slowest stream.

Best, Fabian

2018-04-27 1:05 GMT+02:00 Tao Xia <ta...@udacity.com>:

> Hi All,
>   I am trying to reply events from 3 different sources and hopefully in
> time sequence, say Stream1, Stream2, Stream3. Since their size vary a lot,
> the watermarks on one stream is much faster than other streams.  Is there
> any way to coordinate the watermarks between different input streams.
> Thanks,
> Tao
>

Re: coordinate watermarks between jobs?

Posted by Eron Wright <er...@gmail.com>.
It might be possible to apply backpressure to the channels that are
significantly ahead in event time.  Tao, it would not be trivial, but if
you'd like to investigate more deeply, take a look at the Flink runtime's
`StatusWatermarkValve` and the associated stream input processors to see
how an operator integrates incoming watermarks.   A key challenge would be
to apply backpressure to the upstream channel for reasons other than the
availability of network buffers.  Take a look at FLINK-7282 which
introduced a credit system that may be useful here.

On Fri, May 4, 2018 at 10:07 AM, Tao Xia <ta...@udacity.com> wrote:

> Without throttle, it will eventually ran out of memory.
> I think this is a very common use case for Flink users during stream
> replay or re-process.
> Do we have anything feature planed for it? Would like to contribute on the
> initiative.
>
> On Wed, May 2, 2018 at 2:43 AM, Fabian Hueske <fh...@gmail.com> wrote:
>
>> Hi Tao,
>>
>> The watermarks of operators that consume from two (or more) streams are
>> always synced to the lowest watermark.
>> This behavior guarantees that data won't be late (unless it was late when
>> watermarks were assigned). However, the operator will most likely need to
>> buffer more events from the "faster" streams.
>>
>> Right now, it is not possible to throttle faster streams to the pace of
>> the slowest stream.
>>
>> Best, Fabian
>>
>> 2018-04-27 1:05 GMT+02:00 Tao Xia <ta...@udacity.com>:
>>
>>> Hi All,
>>>   I am trying to reply events from 3 different sources and hopefully in
>>> time sequence, say Stream1, Stream2, Stream3. Since their size vary a lot,
>>> the watermarks on one stream is much faster than other streams.  Is there
>>> any way to coordinate the watermarks between different input streams.
>>> Thanks,
>>> Tao
>>>
>>
>>
>

Re: coordinate watermarks between jobs?

Posted by Tao Xia <ta...@udacity.com>.
Without throttle, it will eventually ran out of memory.
I think this is a very common use case for Flink users during stream replay
or re-process.
Do we have anything feature planed for it? Would like to contribute on the
initiative.

On Wed, May 2, 2018 at 2:43 AM, Fabian Hueske <fh...@gmail.com> wrote:

> Hi Tao,
>
> The watermarks of operators that consume from two (or more) streams are
> always synced to the lowest watermark.
> This behavior guarantees that data won't be late (unless it was late when
> watermarks were assigned). However, the operator will most likely need to
> buffer more events from the "faster" streams.
>
> Right now, it is not possible to throttle faster streams to the pace of
> the slowest stream.
>
> Best, Fabian
>
> 2018-04-27 1:05 GMT+02:00 Tao Xia <ta...@udacity.com>:
>
>> Hi All,
>>   I am trying to reply events from 3 different sources and hopefully in
>> time sequence, say Stream1, Stream2, Stream3. Since their size vary a lot,
>> the watermarks on one stream is much faster than other streams.  Is there
>> any way to coordinate the watermarks between different input streams.
>> Thanks,
>> Tao
>>
>
>