You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Marco Villalobos <mv...@kineteque.com> on 2020/08/15 05:45:24 UTC

coordination of sinks

Given a source that goes into a tumbling window with a process function
that yields two side outputs, in addition to the main data stream, is it
possible to coordinate the order of completion
of sink 1, sink 2, and sink 3 as data leaves the tumbling window?

source -> tumbling window -> process function -> side output tag 1 -> sink
1                                               \-> side output tag 2 ->
sink 2
                                             \-> main stream -> sink 3


sink 1 will create partitions in PostgreSQL for me.
sink 2 will insert data into the partitioned table
sink 3 can happen in any order
but all of them need to finish before the next window fires.

Any advice will help.

Re: coordination of sinks

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Marco,

You cannot really synchronize data that is being emitted via different
streams (without bringing them together in an operator).

I see two options:

1) emit the event to create the partition and the data to be written into
the partition to the same stream. Flink guarantees that records do not
overtake records in the same partition. However, you need to ensure that
all records remain in the same partition, for example by partitioning on
the same ke.
2) emit the records to two different streams but have a CoProcessFunction
that processes the create partition and data events. The processing
function would just buffer the data events (in state) until it observes the
create partition event for which it creates the partitions (in a
synchronous fashion). Once the partition is created, it forwards all
buffered data and the remaining data.

Hope this helps,
Fabian

Am Sa., 15. Aug. 2020 um 07:45 Uhr schrieb Marco Villalobos <
mvillalobos@kineteque.com>:

> Given a source that goes into a tumbling window with a process function
> that yields two side outputs, in addition to the main data stream, is it
> possible to coordinate the order of completion
> of sink 1, sink 2, and sink 3 as data leaves the tumbling window?
>
> source -> tumbling window -> process function -> side output tag 1 ->
> sink 1                                               \-> side output tag 2
> -> sink 2
>                                              \-> main stream -> sink 3
>
>
> sink 1 will create partitions in PostgreSQL for me.
> sink 2 will insert data into the partitioned table
> sink 3 can happen in any order
> but all of them need to finish before the next window fires.
>
> Any advice will help.
>