Posted to user@flink.apache.org by Richard Deurwaarder <ri...@xeli.eu> on 2019/02/25 10:45:08 UTC

Share broadcast state between multiple operators

Hi All,

Due to the way our code is structured, we would like to use the broadcast
state at multiple points of our pipeline: not only sharing it between
multiple instances of the same operator, but also between multiple
operators. See the image below for a simplified example.

Flink does not seem to have any problems with this at runtime but I wonder:

   - Is this a good pattern and was it designed with something like this in
   mind?
   - If we use the same MapStateDescriptor in both operators, does the
   state only get stored once? And does it also only get written once?


[image: broadcast-state.png]

Thanks!

Re: Share broadcast state between multiple operators

Posted by Till Rohrmann <tr...@apache.org>.
On Tue, Feb 26, 2019 at 3:10 PM Richard Deurwaarder <ri...@xeli.eu> wrote:

> Hello Till,
>
> So if I understand correctly, when messages get broadcast to multiple
> operators, each operator will execute the processBroadcast() function and
> store the state under a sort of operator scope? Even if they use the same
> MapStateDescriptor?
>
Yes.


> And if it replicates the state between operators, what makes the
> broadcast state different from an operator state with union redistribution?
>
The union redistribution is relevant in case of a restart, where every
operator receives the state from all other operators. The individual
operator states can be different. In the case of broadcast state, every
operator's state will be the same, so no union redistribution is
needed.

Cheers,
Till
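
The difference on restore described above can be modelled in plain Java.
This is only a sketch of the idea, not Flink code: the class RestoreSketch
and its methods restoreUnion and restoreBroadcast are hypothetical names,
and the round-robin choice of which snapshot to copy is an assumption made
for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, NOT Flink API: every name here is hypothetical.
final class RestoreSketch {

    // Union redistribution: per-instance lists may differ, so on restore
    // every instance receives the concatenation of all snapshotted lists.
    static <T> List<List<T>> restoreUnion(List<List<T>> snapshots, int newParallelism) {
        List<T> union = new ArrayList<>();
        for (List<T> snapshot : snapshots) {
            union.addAll(snapshot);
        }
        List<List<T>> restored = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            restored.add(new ArrayList<>(union));
        }
        return restored;
    }

    // Broadcast state: all snapshots are assumed to be identical, so each
    // restored instance just copies one of them and nothing is merged.
    static <K, V> List<Map<K, V>> restoreBroadcast(List<Map<K, V>> snapshots, int newParallelism) {
        List<Map<K, V>> restored = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            restored.add(new HashMap<>(snapshots.get(i % snapshots.size())));
        }
        return restored;
    }

    public static void main(String[] args) {
        // Two instances with different list state, restored with parallelism 3.
        List<List<Integer>> union = restoreUnion(List.of(List.of(1, 2), List.of(3)), 3);
        System.out.println(union.get(0)); // prints [1, 2, 3]

        // Two instances with identical broadcast state, restored with parallelism 3.
        Map<String, String> rules = Map.of("rule-1", "allow");
        List<Map<String, String>> broadcast = restoreBroadcast(List.of(rules, rules), 3);
        System.out.println(broadcast.get(2)); // prints {rule-1=allow}
    }
}
```

In the union case each restored instance sees every entry and has to decide
what to keep; in the broadcast case any single snapshot is already complete.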



Re: Share broadcast state between multiple operators

Posted by Richard Deurwaarder <ri...@xeli.eu>.
Hello Till,

So if I understand correctly, when messages get broadcast to multiple
operators, each operator will execute the processBroadcast() function and
store the state under a sort of operator scope? Even if they use the same
MapStateDescriptor?

And if it replicates the state between operators, what makes the
broadcast state different from an operator state with union redistribution?

Thanks for any clarification, very interesting to learn about :)

Richard


Re: Share broadcast state between multiple operators

Posted by Till Rohrmann <tr...@apache.org>.
Hi Richard,

Flink does not support sharing state between multiple operators.
Technically, the broadcast state is also not shared but replicated between
subtasks belonging to the same operator. So what you can do is send the
broadcast input to different operators, but they will each keep their own
copy of the state.

Cheers,
Till
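
To make the "own copy" point concrete, here is a small plain-Java model. It
is a sketch only and does not use the Flink API: BroadcastCopiesSketch,
Operator and broadcast are hypothetical names, and the descriptor is
reduced to a plain string. The method processBroadcastElement only loosely
mirrors the callback of the same name on Flink's broadcast process
functions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, NOT Flink API: every name here is hypothetical.
final class BroadcastCopiesSketch {

    /** Stands in for one operator that consumes the broadcast stream. */
    static final class Operator {
        final String name;
        // State lives inside the operator, looked up by descriptor name.
        private final Map<String, Map<String, String>> stateByDescriptor = new HashMap<>();

        Operator(String name) {
            this.name = name;
        }

        // Each operator updates its own copy when a broadcast element arrives.
        void processBroadcastElement(String descriptorName, String key, String value) {
            stateByDescriptor
                .computeIfAbsent(descriptorName, d -> new HashMap<>())
                .put(key, value);
        }

        Map<String, String> state(String descriptorName) {
            return stateByDescriptor.getOrDefault(descriptorName, Map.of());
        }
    }

    /** Delivers one broadcast element to every operator; returns the number of writes. */
    static int broadcast(List<Operator> operators, String descriptorName, String key, String value) {
        int writes = 0;
        for (Operator operator : operators) {
            operator.processBroadcastElement(descriptorName, key, value);
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        Operator enrich = new Operator("enrich");
        Operator filter = new Operator("filter");

        // Both operators use the same descriptor name, "rules".
        int writes = broadcast(List.of(enrich, filter), "rules", "rule-1", "allow");

        System.out.println("writes: " + writes); // prints writes: 2
        System.out.println("equal contents: " + enrich.state("rules").equals(filter.state("rules"))); // true
        System.out.println("same object: " + (enrich.state("rules") == filter.state("rules"))); // false
    }
}
```

In this model, reusing the descriptor name gives both operators equal
contents, but the element is written once per operator and stored once per
operator, which matches the answer above.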
