Posted to user@flink.apache.org by Marco Villalobos <mv...@kineteque.com> on 2021/02/05 11:06:01 UTC

threading and distribution

As data flows from a source through a pipeline of operators and finally
into sinks, is there a way to control how many threads are used within an
operator, and how an operator is distributed across the network?

Where can I read up on these types of details specifically?

Re: threading and distribution

Posted by Matthias Pohl <ma...@ververica.com>.
Hi Marco,
sorry for the late reply. The documentation you found [1] is already a good
start. You can define how many subtasks of an operator run in parallel
using the operator's parallelism configuration [2]. The parallel subtasks
of one operator each run in a separate task slot. Slot sharing, described
in [3], lets Flink run subtasks of different operators of the same job in
the same slot, so a TaskManager can run an entire pipeline in a single
slot.
The maximum parallelism of your job is bounded by the number of available
task slots in the Flink cluster, which is determined by the number of
slots per TaskManager [4][5] and the number of TaskManagers running in your
Flink cluster (taskmanager.numberOfTaskSlots * #taskmanagers = maximum
possible parallelism for an operator/pipeline).
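
To make the slot arithmetic above concrete, here is a minimal sketch in
plain Java with no Flink dependency. The class name, method names and the
numbers are hypothetical and only illustrate the formula; the sketch also
assumes the default slot sharing behaviour, where a job needs as many slots
as the highest parallelism among its operators.

```java
import java.util.List;

// Hypothetical helper, not part of Flink: models the slot arithmetic only.
final class SlotMath {

    // taskmanager.numberOfTaskSlots * #taskmanagers
    public static int availableSlots(int slotsPerTaskManager, int taskManagers) {
        return slotsPerTaskManager * taskManagers;
    }

    // Assumption: all operators share one slot sharing group (the default),
    // so the job needs as many slots as its highest operator parallelism.
    public static int requiredSlots(List<Integer> operatorParallelisms) {
        return operatorParallelisms.stream()
                .mapToInt(Integer::intValue)
                .max()
                .orElse(0);
    }

    // True if the cluster has enough slots to run the job.
    public static boolean fits(List<Integer> operatorParallelisms,
                               int slotsPerTaskManager,
                               int taskManagers) {
        return requiredSlots(operatorParallelisms)
                <= availableSlots(slotsPerTaskManager, taskManagers);
    }

    public static void main(String[] args) {
        // Made-up job: source, map and sink with different parallelism.
        List<Integer> job = List.of(2, 12, 6);
        System.out.println("available slots: " + availableSlots(4, 3));
        System.out.println("required slots:  " + requiredSlots(job));
        System.out.println("job fits:        " + fits(job, 4, 3));
    }
}
```

In this sketch, adding TaskManagers or raising the number of slots per
TaskManager both raise the upper limit for operator parallelism.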

I hope this was still helpful.

Best,
Matthias

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/flink-architecture.html
[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html#:~:text=A%20Flink%20program%20consists%20of,task%20is%20called%20its%20parallelism
[3] https://ci.apache.org/projects/flink/flink-docs-stable/concepts/flink-architecture.html#task-slots-and-resources
[4] https://ci.apache.org/projects/flink/flink-docs-stable/deployment/config.html
[5] https://ci.apache.org/projects/flink/flink-docs-stable/deployment/config.html#taskmanager-numberoftaskslots

On Fri, Feb 5, 2021 at 12:22 PM Marco Villalobos <mv...@kineteque.com>
wrote:

> Okay, I am following up to my question. I see information regarding the
> threading and distribution model on the documentation about the
> architecture.
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/flink-architecture.html
>
> Next, I want to read up on what I have control over.
>
> On Fri, Feb 5, 2021 at 3:06 AM Marco Villalobos <mv...@kineteque.com>
> wrote:
>
>> as data flows from a source through a pipeline of operators and finally
>> sinks, is there a means to control how many threads are used within an
>> operator, and how an operator is distributed across the network?
>>
>> Where can I read up on these types of details specifically?
>>
>

Re: threading and distribution

Posted by Marco Villalobos <mv...@kineteque.com>.
Okay, I am following up on my question. I found information about the
threading and distribution model in the architecture documentation.

https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/flink-architecture.html

Next, I want to read up on what I have control over.
