Posted to dev@flink.apache.org by Luis Alves <lm...@gmail.com> on 2017/04/22 17:42:24 UTC

Resource overallocation

Hi,

Regarding the Flink features that help optimize resource usage in the cluster (as well as latency, throughput, ...), i.e. slot sharing, task chaining, async I/O and dynamic scaling, I would like to ask the following questions (all in the stream processing context):

In which cases would someone want to have more slots in a task manager than CPU cores?

In which cases should we prefer splitting a pipeline of tasks across multiple slots (disabling slot sharing) over increasing the parallelism, so that an application can keep up with the incoming data rate?

Is it possible that, even when using all the features above, the resources reserved for a slot exceed what all the tasks in the slot require, leaving resources that are reserved for the slot but unused? Could such problems appear when an application has tasks with different latencies (or different parallelisms)? Or when we perform multiple aggregations (that cannot be optimised using folds or reduces) on the same window?

Thanks in advance,

Luís Alves

Re: Resource overallocation

Posted by Luis Alves <lm...@gmail.com>.
Hi Till,

Thanks for your answer. It did help me validate some of my thoughts.

Cheers,

Luís

On Sat, 29 Apr 2017 at 09:26 Till Rohrmann <tr...@apache.org> wrote:



Re: Resource overallocation

Posted by Till Rohrmann <tr...@apache.org>.
Hi Luis,

let me try to answer some of your questions:

Usually we recommend reserving at least one CPU core for each slot. One
reason why you would want more slots than cores is that you execute
blocking operations in your operators. That way your cores stay busy
while some of the tasks are blocked.
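A back-of-the-envelope sketch of the point above, assuming each task spends a fixed share of its wall-clock time on the CPU and the rest blocked (for example, waiting on an external service). The helper names and the percentage model are hypothetical illustrations, not Flink APIs; in Flink itself the slot count is configured per TaskManager with `taskmanager.numberOfTaskSlots`.

```python
def busy_cores(slots: int, cores: int, cpu_percent: int) -> float:
    """Expected number of busy cores if every slot runs one task that is on
    the CPU cpu_percent % of the time and blocked the rest of the time.
    Simplified model: ignores contention and scheduling overhead."""
    if not 1 <= cpu_percent <= 100:
        raise ValueError("cpu_percent must be between 1 and 100")
    return min(float(cores), slots * cpu_percent / 100)


def slots_to_saturate(cores: int, cpu_percent: int) -> int:
    """Smallest slot count for which busy_cores() reaches `cores`."""
    if not 1 <= cpu_percent <= 100:
        raise ValueError("cpu_percent must be between 1 and 100")
    # ceil(cores * 100 / cpu_percent) using integer arithmetic only
    return -(-cores * 100 // cpu_percent)


# Hypothetical task: 25 % on the CPU, 75 % waiting on a blocking call.
print(busy_cores(slots=4, cores=4, cpu_percent=25))   # 1.0 (three cores idle)
print(slots_to_saturate(cores=4, cpu_percent=25))     # 16
print(busy_cores(slots=16, cores=4, cpu_percent=25))  # 4.0
```

For purely CPU-bound tasks (`cpu_percent=100`) this model yields one slot per core, which matches the usual recommendation of at least one core per slot.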

If you observe that your application cannot keep up with the incoming data
rate, then it is usually best to increase the parallelism (provided that the
bottleneck is not an operator with parallelism 1 and that your data has
enough distinct keys).
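Both caveats can be illustrated with a toy model. This sketch assumes an operator's capacity is its parallelism times a fixed per-subtask rate, and that records are routed to subtasks by `key % parallelism`. Flink actually routes keys through key groups using a murmur hash of the key's hash code, so the modulo here is a simplification, and all names and numbers are hypothetical.

```python
from collections import Counter


def pipeline_capacity(operators: list[tuple[int, int]]) -> int:
    """Records/s the pipeline can sustain: the slowest operator bounds it.
    Each operator is (parallelism, records_per_second_per_subtask)."""
    return min(p * rate for p, rate in operators)


def max_subtask_load(records_per_key: dict[int, int], parallelism: int) -> int:
    """Largest number of records any single subtask receives when keys are
    assigned with key % parallelism (simplified partitioning)."""
    load = Counter()
    for key, count in records_per_key.items():
        load[key % parallelism] += count
    return max(load.values())


# A parallelism-1 source capped at 5000 rec/s limits the job however far
# the downstream operator is scaled out.
print(pipeline_capacity([(1, 5000), (4, 1000)]))  # 4000
print(pipeline_capacity([(1, 5000), (8, 1000)]))  # 5000, not 8000

# With only three distinct keys, parallelism beyond 3 cannot reduce the
# load on the hottest subtask.
few_keys = {k: 100 for k in range(3)}
print(max_subtask_load(few_keys, 2))  # 200
print(max_subtask_load(few_keys, 8))  # 100

many_keys = {k: 100 for k in range(64)}
print(max_subtask_load(many_keys, 8))   # 800
print(max_subtask_load(many_keys, 16))  # 400
```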

If you have multiple compute-intensive operators in one pipeline (maybe
even chained) and fewer cores per slot than such operators, then it
might make sense to split up the pipeline. That way these operators can
run concurrently on different cores.
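A simplified cost model of that trade-off. It assumes chained operators execute one after another in a single thread (which is how Flink runs an operator chain) and that split-out operators pipeline across however many cores they get. It ignores the serialization and buffering overhead that breaking a chain introduces. The per-record costs and function names are hypothetical; in the DataStream API the corresponding knobs would be `disableChaining()` and `slotSharingGroup(...)` on an operator.

```python
def chained_rate(costs_us: list[int]) -> float:
    """Records/s when all operators are chained into one task: a single
    thread pays the sum of the per-record costs (in microseconds)."""
    return 1_000_000 / sum(costs_us)


def split_rate(costs_us: list[int], cores: int) -> float:
    """Upper bound on records/s when the operators run as separate,
    pipelined tasks on `cores` cores: limited by the slowest operator and
    by the total CPU work spread over the available cores."""
    per_record_us = max(max(costs_us), sum(costs_us) / cores)
    return 1_000_000 / per_record_us


costs = [200, 300, 500]  # hypothetical per-record cost of three heavy operators
print(chained_rate(costs))         # 1000.0
print(split_rate(costs, cores=1))  # 1000.0: splitting alone buys nothing
print(split_rate(costs, cores=3))  # 2000.0: now bounded by the 500 us operator
```

The model makes the condition in the paragraph above explicit: splitting only pays off when the extra tasks actually land on additional cores.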

Theoretically, you can assign more resources to a slot than are
actually needed, e.g. if each slot runs a single operator but has
multiple cores assigned to it. Also, if operators have different
parallelisms, some slots might get more subtasks assigned than others.
One thing you can always do is monitor the execution of your job to
detect under- and over-provisioning.
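A sketch of the uneven placement described above. It assumes slot sharing places subtask i of every operator in slot i, so the job needs as many slots as its largest parallelism. Flink's scheduler does not promise this exact index-based placement, and the model and names are hypothetical. The second helper shows the kind of check that monitoring data could feed: flagging slots whose measured CPU usage is well below what was reserved (the usage figures are made up).

```python
def subtasks_per_slot(parallelisms: list[int]) -> list[int]:
    """Number of subtasks co-located in each slot when subtask i of every
    operator shares slot i (simplified slot-sharing model)."""
    slots = max(parallelisms)
    return [sum(1 for p in parallelisms if i < p) for i in range(slots)]


def underused_slots(cpu_usage: list[float], reserved_cores: float,
                    threshold: float = 0.5) -> list[int]:
    """Indices of slots using less than `threshold` of their reserved cores."""
    return [i for i, used in enumerate(cpu_usage)
            if used < threshold * reserved_cores]


# Source (parallelism 4) -> map (parallelism 2) -> sink (parallelism 1)
print(subtasks_per_slot([4, 2, 1]))  # [3, 2, 1, 1]

# Hypothetical measured cores in use per slot, with 2 cores reserved per slot
print(underused_slots([1.8, 1.2, 0.4, 0.3], reserved_cores=2.0))  # [2, 3]
```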

I hope this helps to answer some of your questions.

Cheers,
Till
