Posted to dev@flink.apache.org by Marius Melzer <ma...@rasumi.net> on 2016/06/24 16:52:52 UTC

Understanding Operator Chaining

Hi,

I'm currently familiarizing myself with the Flink code and I have a
question regarding operator chaining. Maybe somebody could explain this
to me?

The default chaining strategy is HEAD (meaning that the operator will not
be chained to its predecessor, so it may be the start of a chain but
never in the middle or at the end). Almost all operators that I looked at
override this with ALWAYS, allowing the operator to be chained without
limits. A counterexample is StreamGroupedFold, which uses the default
HEAD. My question is: why does StreamGroupedFold "interrupt" chaining,
and what is the general rule for which operators may be chained without
limit and which may not? Also: will all consecutive operators that have
ALWAYS chaining always end up in the same chain/task, or may they be
split when the chain gets too long/complex?

Thanks,
Marius

Re: Understanding Operator Chaining

Posted by Maximilian Michels <mx...@apache.org>.
Hi Marius,

The current chaining code assumes that chains will never become too
complex. That is probably true for most jobs but could become an issue
in some cases.

If you want to make sure the chain is broken, you can start a new
chain using `startNewChain()` on all single output operators.
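
To illustrate the effect, here is a small stand-alone sketch. It is not
Flink code: the Op class and buildChains() are hypothetical stand-ins, and
the sketch assumes a linear pipeline with forward connections and equal
parallelism everywhere. It models startNewChain() as switching the
operator's chaining strategy to HEAD, which to my knowledge is what the
real method does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone model of how a HEAD operator splits a linear pipeline.
// All types here are hypothetical; they only mimic the names used in Flink.
final class ChainSplitSketch {

    enum ChainingStrategy { ALWAYS, HEAD, NEVER }

    /** Hypothetical stand-in for a single-output operator. */
    static final class Op {
        final String name;
        ChainingStrategy strategy = ChainingStrategy.ALWAYS;

        Op(String name) {
            this.name = name;
        }

        /** Modeled after startNewChain(): this operator becomes a chain head. */
        Op startNewChain() {
            this.strategy = ChainingStrategy.HEAD;
            return this;
        }
    }

    /** Groups a linear pipeline into chains; there is no limit on chain length. */
    static List<List<String>> buildChains(List<Op> pipeline) {
        List<List<String>> chains = new ArrayList<>();
        Op previous = null;
        for (Op op : pipeline) {
            // An operator joins the current chain only if it allows a chained
            // predecessor (ALWAYS) and the predecessor allows a chained successor.
            boolean chainable = previous != null
                    && op.strategy == ChainingStrategy.ALWAYS
                    && previous.strategy != ChainingStrategy.NEVER;
            if (!chainable) {
                chains.add(new ArrayList<>());
            }
            chains.get(chains.size() - 1).add(op.name);
            previous = op;
        }
        return chains;
    }

    public static void main(String[] args) {
        List<Op> pipeline = Arrays.asList(
                new Op("source"),
                new Op("map"),
                new Op("filter").startNewChain(),
                new Op("sink"));
        // prints [[source, map], [filter, sink]]
        System.out.println(buildChains(pipeline));
    }
}
```

The filter operator starts a second chain, and the sink is still chained
to it because HEAD only forbids chaining to the predecessor.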

Cheers,
Max

On Mon, Jun 27, 2016 at 11:19 AM, Aljoscha Krettek <al...@apache.org> wrote:
> Hi Marius,
> The chaining code is still somewhat fragile and some things in there are
> leftovers. For example, StreamGroupedFold can only be used on a
> KeyedStream, which means that it can never be within a chain because the
> shuffle always breaks a chain. Specifying HEAD here is therefore redundant.
> I think by now all operators should have ALWAYS as the chaining strategy
> and we might even be able to get rid of the field in operators.
>
> The decision whether to chain or not is made
> in StreamingJobGraphGenerator.isChainable(). Chains are also never broken
> if they become too long/complex.
>
> Cheers,
> Aljoscha

Re: Understanding Operator Chaining

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Marius,
The chaining code is still somewhat fragile and some things in there are
leftovers. For example, StreamGroupedFold can only be used on a
KeyedStream, which means that it can never be within a chain because the
shuffle always breaks a chain. Specifying HEAD here is therefore redundant.
I think by now all operators should have ALWAYS as the chaining strategy
and we might even be able to get rid of the field in operators.

The decision whether to chain or not is made
in StreamingJobGraphGenerator.isChainable(). Chains are also never broken
if they become too long/complex.
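
As a rough illustration of the kind of per-edge check involved, here is a
simplified stand-alone model. It is not the actual isChainable() code: the
Node class and the forwardEdge flag are hypothetical stand-ins, and the
real method looks at more conditions. Note that nothing in it looks at how
long the chain already is.

```java
// Simplified, stand-alone model of a per-edge chaining decision.
// Node and the forwardEdge flag are hypothetical stand-ins, not Flink classes.
final class ChainingSketch {

    enum ChainingStrategy { ALWAYS, HEAD, NEVER }

    /** Hypothetical stand-in for an operator node in the stream graph. */
    static final class Node {
        final String name;
        final ChainingStrategy strategy;
        final int parallelism;
        final int inputCount;

        Node(String name, ChainingStrategy strategy, int parallelism, int inputCount) {
            this.name = name;
            this.strategy = strategy;
            this.parallelism = parallelism;
            this.inputCount = inputCount;
        }
    }

    /**
     * Returns true if the downstream node may join the upstream node's chain.
     * forwardEdge is false when the connection repartitions data, for example
     * the shuffle in front of an operator on a KeyedStream.
     */
    static boolean isChainable(Node up, Node down, boolean forwardEdge) {
        return down.inputCount == 1
                && down.strategy == ChainingStrategy.ALWAYS
                && up.strategy != ChainingStrategy.NEVER
                && forwardEdge
                && up.parallelism == down.parallelism;
    }

    public static void main(String[] args) {
        Node map = new Node("map", ChainingStrategy.ALWAYS, 4, 1);
        Node filter = new Node("filter", ChainingStrategy.ALWAYS, 4, 1);
        Node fold = new Node("fold", ChainingStrategy.ALWAYS, 4, 1);

        // Forward connection between two ALWAYS operators.
        System.out.println(isChainable(map, filter, true));
        // Keyed connection: the shuffle breaks the chain regardless of strategy.
        System.out.println(isChainable(filter, fold, false));
    }
}
```

In this model the keyed edge in front of the fold is rejected even though
the fold itself is ALWAYS, which is why HEAD adds nothing there.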

Cheers,
Aljoscha
