Posted to user@flink.apache.org by Vijayendra Yadav <co...@gmail.com> on 2020/09/02 00:54:00 UTC

Task Chaining slots performance

Hi Team,

https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#task-chaining-and-resource-groups



*Flink is chaining my tasks, which look like this: stream.map().filter().map()*

*I think here the entire chain runs in the same slot.*

*The documentation says Flink does chaining for better performance, but are
there any scenarios where we should disable chaining or start a new chain,
mainly for the purpose of better performance?*

StreamExecutionEnvironment.disableOperatorChaining()

someStream.filter(...).map(...).startNewChain().map(...)

someStream.map(...).disableChaining()

Re: Task Chaining slots performance

Posted by Vijayendra Yadav <co...@gmail.com>.
Thanks for the information, Till.

Regards,
Vijay

> 
> On Sep 2, 2020, at 2:21 AM, Till Rohrmann <tr...@apache.org> wrote:

Re: Task Chaining slots performance

Posted by Till Rohrmann <tr...@apache.org>.
Hi Vijayendra,

In the general case, I believe that chaining will almost always give you
better performance, since you consume fewer resources, avoid context
switches between threads and, if object reuse is enabled, even avoid
serialization when records are passed from one operator to another.

The only scenario I can think of where disabling chaining might be
beneficial is when you have a pipeline of operators where each operator
performs a blocking operation (e.g. interacts with some external system).
If these operators are chained, then the processing time of a single record
would be n * the time of one blocking operation, where n is the number of
chained operators. If you disabled chaining in this scenario, then these
waiting times could overlap between different records (the first operator
can already start processing the next record while the second operator
waits for the external operation to finish for the first record). That way,
once the pipeline is completely filled, the processing time of a single
record would be the time of the longest blocking operation.
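
To make the arithmetic above concrete, here is a rough back-of-the-envelope
model in plain Java. It is only a sketch and does not use any Flink API: the
class name, method names and blocking times are made up for illustration, and
it assumes every record spends the same fixed time in each operator while
ignoring the network and serialization overhead that unchaining adds.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Flink: a simplified timing model only.
public class ChainingLatencyModel {

    // Chained: one thread runs every operator for a record before it takes
    // the next record, so each record costs the sum of all blocking times.
    static long chainedTotalMillis(long[] blockingMillis, int records) {
        long perRecord = Arrays.stream(blockingMillis).sum();
        return perRecord * records;
    }

    // Unchained: each operator runs in its own thread, so the waits of
    // different records overlap. The first record pays the full sum (the
    // pipeline fills up); after that, one record finishes every
    // "slowest operator" interval.
    static long unchainedTotalMillis(long[] blockingMillis, int records) {
        if (records <= 0) {
            return 0;
        }
        long fillTime = Arrays.stream(blockingMillis).sum();
        long slowest = Arrays.stream(blockingMillis).max().orElse(0);
        return fillTime + slowest * (records - 1);
    }

    public static void main(String[] args) {
        // Assumed blocking time per operator in milliseconds (illustrative).
        long[] blockingMillis = {10, 30, 20};
        int records = 1000;

        System.out.println("chained total ms:   "
                + chainedTotalMillis(blockingMillis, records));
        System.out.println("unchained total ms: "
                + unchainedTotalMillis(blockingMillis, records));
    }
}
```

With these assumed numbers the chained variant needs 60 ms per record
(60000 ms in total), while the unchained variant approaches 30 ms per record,
the time of the slowest operator (30030 ms in total). Whether a real job
sees a gain like this depends on how much overhead the extra threads and
record exchange add.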

Cheers,
Till
