Posted to user@flink.apache.org by Jan Nitschke <ni...@web.de> on 2021/02/28 12:09:37 UTC

Independence of task parallelism

Hello,

We are working on a project where we want to gather information about the job performance across different task level parallelism settings.
Essentially, we want to see how the throughput of a single task varies across different parallelism settings, e.g. for a job of 5 tasks: 1-1-1-1-1 vs. 1-2-1-1-1 vs. 2-2-2-2-2.

We are running Flink on Kubernetes with a job of 5 tasks. Slot sharing is enabled, operator chaining is disabled, and each task manager has one slot.

So the number of task managers always equals the highest parallelism in the job, and we can fit the entire job into one task manager slot.

We then run the job with multiple parallelism configs (such as those above), collect the relevant metrics, and try to get some useful information out of them.

We are now wondering how independent our results are from one another. More specifically, if we look at the parallelism of the second task, is its performance independent of the parallelism of the other tasks? In other words, will the second task perform the same in (1-2-1-1-1) as in (2-2-2-2-2)?

Our take on it is the following: With our setup, (1-2-1-1-1) should result in one task manager holding the entire job and a second task manager that only runs the second task. (2-2-2-2-2) will run two task managers, each with the entire job. So, theoretically, the second task should have far more resources available in the first setup, as it has the entire resources of that task manager at its disposal. Does that assumption hold, or will Flink assign a certain amount of resources to a task in a task manager no matter how many other tasks are running in that same task manager slot?
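
The placement described above can be sketched in a few lines of Python. This is only a toy model, not Flink's scheduler: it assumes that subtask i of every task ends up in slot i (a single slot sharing group) and that each task manager offers exactly one slot. The function name `place_subtasks` is hypothetical.

```python
def place_subtasks(parallelism):
    """Return {slot_index: [(task_number, subtask_index), ...]}.

    Assumption (not guaranteed by Flink): subtask i of every task shares
    slot i, and each task manager provides exactly one slot.
    """
    slots = {slot: [] for slot in range(max(parallelism))}
    for task_number, task_parallelism in enumerate(parallelism, start=1):
        for subtask_index in range(task_parallelism):
            slots[subtask_index].append((task_number, subtask_index))
    return slots


if __name__ == "__main__":
    for config in ([1, 2, 1, 1, 1], [2, 2, 2, 2, 2]):
        for slot, subtasks in place_subtasks(config).items():
            print(config, "task manager", slot, "runs", subtasks)
```

Under these assumptions, (1-2-1-1-1) puts five subtasks on the first task manager and a single subtask of task 2 on the second, while (2-2-2-2-2) puts five subtasks on each.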

We would highly appreciate any help.

Best,
Jan

Re: Re: Independence of task parallelism

Posted by Piotr Nowojski <pn...@apache.org>.

Yes, that might be the case. It is hard to tell for sure without looking at the
job, metrics etc. Just be mindful of what I described, and if you want to
fine-tune a job and set different parallelism values for different
operators, pay attention to where those operators are being distributed.
In practice there is usually little reason to choose (1-2-1-1-1) over
(2-2-2-2-2). Spreading the load of some operators more widely than you
need is usually not an issue. On the other hand, with (2-2-2-2-2) you will
spread the load more evenly across task managers, which makes it easier to
tune/analyse/optimise.

Best,
Piotrek

Aw: Re: Independence of task parallelism

Posted by Jan Nitschke <ni...@web.de> on 2021/03/05.

Hey Piotr,

thanks for your answer, that makes perfect sense. However, when looking at the
number of messages being processed, we can see that both subtasks of task 2
produce the same number of messages in the (1-2-1-1-1) scenario, even
with the first task hitting backpressure. We assume that this has to do with
the distribution of messages between tasks. As messages are being distributed
equally among subtasks in our case, would this be an explanation for that
behavior?
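
A small Python sketch of the effect we suspect. It assumes records are handed out in strict round-robin order and that the sender has to wait whenever the slowest receiver cannot keep up; both are assumptions about our setup rather than confirmed Flink behavior, and the function names are hypothetical.

```python
def round_robin_counts(num_records, num_subtasks):
    """Count how many records each downstream subtask receives."""
    counts = [0] * num_subtasks
    for record in range(num_records):
        counts[record % num_subtasks] += 1
    return counts


def throttled_rates(max_rates):
    """Rate at which each receiver is fed when the sender waits for the slowest.

    Assumption: strict round-robin with bounded buffers, so the sender
    cannot run ahead of the slowest receiver.
    """
    slowest = min(max_rates)
    return [slowest for _ in max_rates]


if __name__ == "__main__":
    # Two subtasks of task 2: one on the crowded task manager, one alone.
    print(round_robin_counts(1000, 2))
    print(throttled_rates([300.0, 1000.0]))
```

In this model both subtasks see the same number of records regardless of how much spare capacity the second task manager has, which would match what we observe.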



Best,
Jan

Re: Independence of task parallelism

Posted by Piotr Nowojski <pn...@apache.org> on 2021/03/03.

Hi Jan,

As far as I remember, Flink doesn't handle cases like (1-2-1-1-1) with two
Task Managers very well. There are no guarantees about how the
operators/subtasks are going to be scheduled, but most likely it will be as
you mentioned/observed: the first task manager will be handling all of the
operators, while the second task manager will only be running a single
instance of the second operator (for load balancing reasons it would be
better to spread the tasks across those two Task Managers more evenly).

No, Flink doesn't hold any resources (apart from network buffers) per task.
All of the available memory and CPU resources are shared across all of the
running tasks. So in the (1-2-1-1-1) case, if the first task manager is
overloaded (for example if it has very few CPU cores), the second task
will perform much better on the second task manager (which will be otherwise
empty), causing a throughput skew. From this perspective, (2-2-2-2-2) would
most likely perform better, as the load would be more evenly spread.
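
To make the skew concrete, here is a rough Python model, not a measurement. It assumes every subtask is single-threaded, always busy, and that the cores of a task manager are shared fairly among its subtasks; `cpu_share_per_subtask` is a hypothetical helper.

```python
def cpu_share_per_subtask(subtasks_per_tm, cores_per_tm):
    """Fraction of one core each busy subtask gets on every task manager.

    Assumption: fair sharing, and a subtask can never use more than one core.
    """
    return [min(1.0, cores_per_tm / count) for count in subtasks_per_tm]


if __name__ == "__main__":
    # (1-2-1-1-1): five subtasks on the first task manager, one on the second.
    print("skewed  ", cpu_share_per_subtask([5, 1], cores_per_tm=1.0))
    # (2-2-2-2-2): five subtasks on each task manager.
    print("balanced", cpu_share_per_subtask([5, 5], cores_per_tm=1.0))
    # With plenty of cores per task manager the skew disappears in this model.
    print("8 cores ", cpu_share_per_subtask([5, 1], cores_per_tm=8.0))
```

The model ignores memory, network and I/O, so treat it only as an illustration of why the lone subtask can look faster when CPU is scarce.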

Piotrek
