Posted to user@flink.apache.org by Vijayendra Yadav <co...@gmail.com> on 2020/09/02 00:54:00 UTC
Task Chaining slots performance
Hi Team,
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#task-chaining-and-resource-groups
*Flink is chaining my tasks, which look like this: stream.map().filter().map()*
*I think the entire chain runs in the same slot here.*
*The documentation says Flink chains operators for better performance, but are
there any scenarios where we should disable chaining or start a new chain
for the sake of better performance?*
StreamExecutionEnvironment.disableOperatorChaining()
someStream.filter(...).map(...).startNewChain().map(...)
someStream.map(...).disableChaining()
Re: Task Chaining slots performance
Posted by Vijayendra Yadav <co...@gmail.com>.
Thanks for the information, Till.
Regards,
Vijay
>
> On Sep 2, 2020, at 2:21 AM, Till Rohrmann <tr...@apache.org> wrote:
>
>
> Hi Vijayendra,
>
> In the general case, I believe that chaining will almost always give you better performance, since you consume fewer resources, avoid context switches between threads, and, if object reuse is enabled, even avoid serialization when records are passed from one operator to another.
>
> The only scenario I can think of where disabling chaining might be beneficial is when you have a pipeline of operators where each operator performs a blocking operation (e.g. interacts with some external system). If these operators are chained, then the processing time of a single record would be n * the time of a blocking operation. If you disabled chaining in this scenario, then these waiting times could overlap between different records (the first operator can already start processing the next record while the second operator waits for the external operation to finish for the first record). That way, once the pipeline is fully filled, the processing time of a single record would be the time of the longest blocking operation.
>
> Cheers,
> Till
Re: Task Chaining slots performance
Posted by Till Rohrmann <tr...@apache.org>.
Hi Vijayendra,
In the general case, I believe that chaining will almost always give you
better performance, since you consume fewer resources, avoid context
switches between threads, and, if object reuse is enabled, even avoid
serialization when records are passed from one operator to another.
The only scenario I can think of where disabling chaining might be
beneficial is when you have a pipeline of operators where each operator
performs a blocking operation (e.g. interacts with some external system).
If these operators are chained, then the processing time of a single record
would be n * the time of a blocking operation. If you disabled chaining in
this scenario, then these waiting times could overlap between different
records (the first operator can already start processing the next record
while the second operator waits for the external operation to finish for
the first record). That way, once the pipeline is fully filled, the
processing time of a single record would be the time of the longest
blocking operation.
Cheers,
Till
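
The arithmetic in the reply above can be checked with a small model. What
follows is a minimal sketch in plain Java, not Flink code: the class name
ChainingModel and both method names are hypothetical, and the model assumes
unbounded buffers between operators and zero hand-off cost. In a real job,
unchained operators pay for serialization and thread switches, so treat the
numbers as an upper bound on the benefit.

```java
// Hypothetical model of the blocking-operator scenario; not part of Flink.
class ChainingModel {

    // Chained: one thread runs every operator for a record before taking
    // the next record, so each record costs the sum of all blocking times.
    static long chainedTotalMillis(long[] stageMillis, int records) {
        long perRecord = 0;
        for (long t : stageMillis) {
            perRecord += t;
        }
        return perRecord * records;
    }

    // Unchained: each operator has its own thread. An operator starts a
    // record once (a) it has finished the previous record and (b) the
    // upstream operator has finished the current one.
    // Assumption: unbounded buffers and no hand-off cost between operators.
    static long pipelinedTotalMillis(long[] stageMillis, int records) {
        // finish[s] = time at which operator s finished its latest record
        long[] finish = new long[stageMillis.length];
        for (int r = 0; r < records; r++) {
            long upstreamDone = 0;
            for (int s = 0; s < stageMillis.length; s++) {
                long start = Math.max(finish[s], upstreamDone);
                finish[s] = start + stageMillis[s];
                upstreamDone = finish[s];
            }
        }
        return stageMillis.length == 0 ? 0 : finish[stageMillis.length - 1];
    }

    public static void main(String[] args) {
        long[] stages = {100, 100, 100}; // three blocking calls, 100 ms each
        int records = 10;
        System.out.println("chained:   " + chainedTotalMillis(stages, records) + " ms");
        System.out.println("pipelined: " + pipelinedTotalMillis(stages, records) + " ms");
    }
}
```

With three 100 ms operators and ten records, the chained case costs
n * 100 ms per record. In the pipelined case, once the pipeline is full, a
record completes every 100 ms, which is the time of the longest blocking
operation.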