Posted to users@kafka.apache.org by Mike Kaplinskiy <mi...@ladderlife.com.INVALID> on 2019/06/08 19:44:10 UTC

Parallel computation of windows in Flink

Hi everyone,

I’m using a Kafka source with a lot of watermark skew (i.e. new partitions
were added to the topic over time). The sink is
FileIO.Write().withNumShards(1), to get roughly one file per day, with an
early trigger so each file holds at most 40,000 records. Unfortunately it
looks like a single thread is writing the files for all of the days, instead
of writing multiple days' files in parallel. Is there anything I could do
here to parallelize the writes? All of this is with the Flink runner.
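
For reference, the pipeline is shaped roughly like the sketch below (the
topic, bootstrap servers, output path, and the use of TextIO.sink() are
placeholders, not the real values):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class DailyFiles {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    p.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("kafka:9092")            // placeholder
            .withTopic("events")                           // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
     .apply(Values.create())
     // Daily windows, with an early firing every 40,000 elements so that a
     // pane (and therefore a file) never holds more than 40,000 records.
     .apply(Window.<String>into(FixedWindows.of(Duration.standardDays(1)))
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterPane.elementCountAtLeast(40_000)))
            .withAllowedLateness(Duration.ZERO)
            .discardingFiredPanes())
     // withNumShards(1) keeps each pane in a single file, which seems to be
     // what serializes the per-day writes onto one thread.
     .apply(FileIO.<String>write()
            .via(TextIO.sink())
            .to("/path/to/output")                         // placeholder
            .withNumShards(1));

    p.run().waitUntilFinish();
  }
}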

Mike.

Ladder <http://bit.ly/1VRtWfS>. The smart, modern way to insure your life.

Re: Parallel computation of windows in Flink

Posted by Mike Kaplinskiy <mi...@ladderlife.com.INVALID>.
Sorry about that - this is definitely the wrong list; I meant to send it to
the Apache Beam list.

Ladder <http://bit.ly/1VRtWfS>. The smart, modern way to insure your life.

