Posted to user@flink.apache.org by Tamara Mendt <ta...@gmail.com> on 2015/08/25 11:15:19 UTC

Broadcasting sets in Flink Streaming

Hello,

I have been trying to use the function withBroadcastSet on a
SingleOutputStreamOperator (map) the same way I would on a MapOperator for
a DataSet. From what I see, this cannot be done. I wonder if there is some
way to broadcast a DataSet to the tasks that are performing transformations
on a DataStream?

I am basically pre-calculating some things with Flink which I later need
for the transformations on the incoming data from the stream. So I want to
broadcast the resulting datasets from the pre-calculations.

Any ideas on how to best approach this?

Thanks, cheers

Tamara.

Re: Broadcasting sets in Flink Streaming

Posted by Tamara Mendt <ta...@gmail.com>.
Ok, I'll try that. Thanks a lot!

On Tue, Aug 25, 2015 at 4:19 PM, Stephan Ewen <se...@apache.org> wrote:

> You can get something very similar to broadcast sets like this:
>
> Use a Co-Map function (CoMapFunction) on two connected streams: connect
> your main data set regularly ("forward" partitioning) to one input and
> your broadcast set via "broadcast" to the other input. You can then handle
> the data from the two inputs separately, in the function's two map methods
> (map1 and map2).
>
> This approach does not guarantee that the broadcast data arrives
> completely before the non-broadcast data (you may receive events from the
> main data set before all broadcast data has been received), but maybe you
> can work around that...


-- 
Tamara Mendt

Re: Broadcasting sets in Flink Streaming

Posted by Stephan Ewen <se...@apache.org>.
You can get something very similar to broadcast sets like this:

Use a Co-Map function (CoMapFunction) on two connected streams: connect your
main data set regularly ("forward" partitioning) to one input and your
broadcast set via "broadcast" to the other input. You can then handle the
data from the two inputs separately, in the function's two map methods (map1
and map2).
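
To make the wiring concrete, here is a minimal plain-Java model of it. It is
deliberately not Flink code, so it runs without any dependency. The names
TwoInputMapper and ParallelTasks and the lookup-table use case are made up for
illustration; in Flink the corresponding pieces would be connect(), broadcast()
on the second stream, and a CoMapFunction with its map1/map2 methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a two-input map function: one method per input. */
class TwoInputMapper {
    // Local copy of the pre-computed data, filled from the broadcast input.
    private final Map<String, Integer> lookup = new HashMap<>();

    /** Main input: enrich the event with whatever broadcast data has arrived so far. */
    String map1(String event) {
        // The value is null if the matching broadcast entry has not arrived yet.
        return event + "=" + lookup.get(event);
    }

    /** Broadcast input: store one pre-computed entry locally. */
    void map2(String key, int value) {
        lookup.put(key, value);
    }
}

/** Models several parallel instances of the mapper and the two ways of routing data to them. */
class ParallelTasks {
    private final List<TwoInputMapper> tasks = new ArrayList<>();
    private int next = 0;

    ParallelTasks(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            tasks.add(new TwoInputMapper());
        }
    }

    /** "Broadcast": every instance receives its own copy of the element. */
    void broadcast(String key, int value) {
        for (TwoInputMapper task : tasks) {
            task.map2(key, value);
        }
    }

    /** Regular routing: exactly one instance receives the element (picked round-robin here). */
    String send(String event) {
        TwoInputMapper task = tasks.get(next);
        next = (next + 1) % tasks.size();
        return task.map1(event);
    }
}

class CoMapModelDemo {
    public static void main(String[] args) {
        ParallelTasks tasks = new ParallelTasks(3);
        tasks.broadcast("a", 1);
        tasks.broadcast("b", 2);
        System.out.println(tasks.send("a")); // a=1
        System.out.println(tasks.send("b")); // b=2
        System.out.println(tasks.send("a")); // a=1
    }
}
```

Each of the three instances answers correctly because the broadcast entries
were delivered to all of them, while each main event went to only one.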

This approach does not guarantee that the broadcast data arrives completely
before the non-broadcast data (you may receive events from the main data set
before all broadcast data has been received), but maybe you can work around
that...
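
One possible workaround, sketched in plain Java rather than against the Flink
API: hold back main-input events until the broadcast side has signalled that
it is complete, then flush them. The class name BufferingMapper and the
end-of-broadcast marker (broadcastDone) are assumptions for illustration; the
sketch assumes the broadcast source emits such a marker itself. Because one
input event can produce zero or many outputs here, the Flink counterpart would
be a CoFlatMapFunction rather than a CoMapFunction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-input function that buffers main events until the broadcast data is complete. */
class BufferingMapper {
    private final Map<String, Integer> lookup = new HashMap<>();
    private final List<String> pending = new ArrayList<>();
    private boolean broadcastComplete = false;

    /** Main input: emit immediately once the broadcast data is complete, otherwise buffer. */
    List<String> onEvent(String event) {
        List<String> out = new ArrayList<>();
        if (broadcastComplete) {
            out.add(enrich(event));
        } else {
            pending.add(event);
        }
        return out;
    }

    /** Broadcast input: store one pre-computed entry. */
    void onBroadcast(String key, int value) {
        lookup.put(key, value);
    }

    /** Assumed end-of-broadcast marker: flush everything held back, in arrival order. */
    List<String> broadcastDone() {
        broadcastComplete = true;
        List<String> out = new ArrayList<>();
        for (String event : pending) {
            out.add(enrich(event));
        }
        pending.clear();
        return out;
    }

    private String enrich(String event) {
        return event + "=" + lookup.get(event);
    }
}
```

Note that the buffer in this sketch is unbounded and lives only in memory, so
a real job would need to bound it or make it part of its fault-tolerant state.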

On Tue, Aug 25, 2015 at 2:45 PM, Till Rohrmann <tr...@apache.org> wrote:

> Hi Tamara,
>
> I think this is not officially supported by Flink yet. However, I think
> that Gyula once had an example where he did something comparable. Maybe he
> can chime in here.
>
> Cheers,
> Till

Re: Broadcasting sets in Flink Streaming

Posted by Till Rohrmann <tr...@apache.org>.
Hi Tamara,

I think this is not officially supported by Flink yet. However, I think
that Gyula once had an example where he did something comparable. Maybe he
can chime in here.

Cheers,
Till
