You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Martin Neumann <mn...@spotify.com> on 2014/10/06 11:57:03 UTC

load broadcast set into searchable set

Hej,

I have a Flink job with with a filter step. I now have a list of exceptions
where I need to do some extra work (300k data). I thought I just use a
boradcast set and then for each like compare if its in the exception set.

What is the best way to implement this in Flink? Is there an efficient way
of checking if a certain element is in the Broadcastset? Or can I somehow
dump the Broadcastset into a set?

cheers Martin

Re: load broadcast set into searchable set

Posted by Stephan Ewen <se...@apache.org>.
You actually can check directly in the broadcast set. But since it is a
list, searching in is slower than in a hash set. That is all...

On Mon, Oct 6, 2014 at 1:09 PM, Martin Neumann <mn...@spotify.com> wrote:

> Yes, thanks.
>
> I was wondering if I can check directly in the broadcast set but since I
> have to get it local anyway it should be not to much overhead.
>
> cheers Martin
>
> On Mon, Oct 6, 2014 at 12:34 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > Hej!
> >
> > Yes, the "getRuntimeEnvironment().getBroadcastVariable() returns a list,
> > which you can add to set:
> >
> >
> > // in the function:
> >
> > private Set<T> specials;
> >
> > public void open(Configuration conf) {
> >     List<T> bc =
> > getRuntimeContect().getBroadcastVariable("the-bc-var-name");
> >    specials = new HashSet<T>(bc);
> > }
> >
> >
> >
> > Is that what you had in mind?
> >
> > Stephan
> >
> >
> > On Mon, Oct 6, 2014 at 11:57 AM, Martin Neumann <mn...@spotify.com>
> > wrote:
> >
> > > Hej,
> > >
> > > I have a Flink job with with a filter step. I now have a list of
> > exceptions
> > > where I need to do some extra work (300k data). I thought I just use a
> > > boradcast set and then for each like compare if its in the exception
> set.
> > >
> > > What is the best way to implement this in Flink? Is there an efficient
> > way
> > > of checking if a certain element is in the Broadcastset? Or can I
> somehow
> > > dump the Broadcastset into a set?
> > >
> > > cheers Martin
> > >
> >
>

Re: load broadcast set into searchable set

Posted by Martin Neumann <mn...@spotify.com>.
Yes, thanks.

I was wondering if I can check directly in the broadcast set but since I
have to get it local anyway it should be not to much overhead.

cheers Martin

On Mon, Oct 6, 2014 at 12:34 PM, Stephan Ewen <se...@apache.org> wrote:

> Hej!
>
> Yes, the "getRuntimeEnvironment().getBroadcastVariable() returns a list,
> which you can add to set:
>
>
> // in the function:
>
> private Set<T> specials;
>
> public void open(Configuration conf) {
>     List<T> bc =
> getRuntimeContect().getBroadcastVariable("the-bc-var-name");
>    specials = new HashSet<T>(bc);
> }
>
>
>
> Is that what you had in mind?
>
> Stephan
>
>
> On Mon, Oct 6, 2014 at 11:57 AM, Martin Neumann <mn...@spotify.com>
> wrote:
>
> > Hej,
> >
> > I have a Flink job with with a filter step. I now have a list of
> exceptions
> > where I need to do some extra work (300k data). I thought I just use a
> > boradcast set and then for each like compare if its in the exception set.
> >
> > What is the best way to implement this in Flink? Is there an efficient
> way
> > of checking if a certain element is in the Broadcastset? Or can I somehow
> > dump the Broadcastset into a set?
> >
> > cheers Martin
> >
>

Re: load broadcast set into searchable set

Posted by Stephan Ewen <se...@apache.org>.
Hej!

Yes, the "getRuntimeEnvironment().getBroadcastVariable() returns a list,
which you can add to set:


// in the function:

private Set<T> specials;

public void open(Configuration conf) {
    List<T> bc =
getRuntimeContect().getBroadcastVariable("the-bc-var-name");
   specials = new HashSet<T>(bc);
}



Is that what you had in mind?

Stephan


On Mon, Oct 6, 2014 at 11:57 AM, Martin Neumann <mn...@spotify.com>
wrote:

> Hej,
>
> I have a Flink job with with a filter step. I now have a list of exceptions
> where I need to do some extra work (300k data). I thought I just use a
> boradcast set and then for each like compare if its in the exception set.
>
> What is the best way to implement this in Flink? Is there an efficient way
> of checking if a certain element is in the Broadcastset? Or can I somehow
> dump the Broadcastset into a set?
>
> cheers Martin
>