You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Arti Pande <pa...@gmail.com> on 2020/09/14 15:33:56 UTC

Parallelism of Keyed Process Function

Hi,

Here is a question related to parallelism of keyed-process-function that is
applied to the KeyedStream. For some code that looks like this

myStream.keyBy(...)
    .process(new MyKeyedProcessFunction())

    .process(<someOtherProcessFunction>).setParallelism(10)


On a Flink cluster with 5 TM nodes each with 10 task slots, and Job
parallelism = 5, the 5 subtasks of MyKeyedProcessFunction() do not get
distributed across all the 5 TM nodes evenly. Those typically get assigned
to one single TM node. However the 50 subtasks of <someOtherProcessFunction>
are always spread evenly across the 5 TMs.

Am I missing something? How can I get those MyKeyedProcessFunction()
subtasks distributed across all TM nodes evenly?

Thanks
Arti

Re: Parallelism of Keyed Process Function

Posted by Arvid Heise <ar...@ververica.com>.
Hi Arti,

This is nothing specific to KeyedProcessFunction, but the general way Flink
distributes subtasks. The general idea is to use as few task managers as
possible such that they are available for cluster downsizing or other
concurrent jobs.

You can change this behavior through cluster.evenly-spread-out-slots
configuration [1].

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#cluster-evenly-spread-out-slots



On Mon, Sep 14, 2020 at 5:33 PM Arti Pande <pa...@gmail.com> wrote:

> Hi,
>
> Here is a question related to parallelism of keyed-process-function that
> is applied to the KeyedStream. For some code that looks like this
>
> myStream.keyBy(...)
>     .process(new MyKeyedProcessFunction())
>
>     .process(<someOtherProcessFunction>).setParallelism(10)
>
>
> On a Flink cluster with 5 TM nodes each with 10 task slots, and Job
> parallelism = 5, the 5 subtasks of MyKeyedProcessFunction() do not get
> distributed across all the 5 TM nodes evenly. Those typically get assigned
> to one single TM node. However the 50 subtasks of <
> someOtherProcessFunction> are always spread evenly across the 5 TMs.
>
> Am I missing something? How can I get those MyKeyedProcessFunction()
> subtasks distributed across all TM nodes evenly?
>
> Thanks
> Arti
>


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng