You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by subash basnet <ya...@gmail.com> on 2016/05/03 12:35:59 UTC

How to perform Broadcast and groupBy in DataStream like DataSet

Hello all,

How could we perform *withBroadcastSet* and *groupBy* in DataStream like
that of DataSet in the below KMeans code:

DataSet<Centroid> newCentroids = points
.map(new SelectNearestCenter()).*withBroadcastSet*(loop, "centroids")
.map(new CountAppender()).*groupBy*(0).reduce(new CentroidAccumulator())
.map(new CentroidAverager());


DataStream<Centroid> newCentroids = points.map(new
SelectNearestCenter()).???


Best Regards,
Subash Basnet

Re: How to perform Broadcast and groupBy in DataStream like DataSet

Posted by subash basnet <ya...@gmail.com>.
Hello Stefano,

Thank you, I found out that just sometime ago that I could use keyBy, but I
couldn't find how to set and getBroadcastVariable in datastream like that
of dataset.
For example in below code we get collection of *centroids* via broadcast.

Eg: In KMeans.java
class X extends MapFunctions<>{
  private Collection<Centroid> *centroids*;
  public void open(Configuration parameters) throws Exception {
this.*centroids* = getRuntimeContext().getBroadcastVariable("centroids");
  }
  for (Centroid cent : *centroids*) {
  }
}


Best Regards,
Subash Basnet

On Tue, May 3, 2016 at 4:04 PM, Stefano Baghino <
stefano.baghino@radicalbit.io> wrote:

> I'm not sure in regards of "withBroadcastSet", but in the DataStream you
> "keyBy" instead of "groupBy".
>
> On Tue, May 3, 2016 at 12:35 PM, subash basnet <ya...@gmail.com> wrote:
>
>> Hello all,
>>
>> How could we perform *withBroadcastSet* and *groupBy* in DataStream like
>> that of DataSet in the below KMeans code:
>>
>> DataSet<Centroid> newCentroids = points
>> .map(new SelectNearestCenter()).*withBroadcastSet*(loop, "centroids")
>> .map(new CountAppender()).*groupBy*(0).reduce(new CentroidAccumulator())
>> .map(new CentroidAverager());
>>
>>
>> DataStream<Centroid> newCentroids = points.map(new
>> SelectNearestCenter()).???
>>
>>
>> Best Regards,
>> Subash Basnet
>>
>
>
>
> --
> BR,
> Stefano Baghino
>
> Software Engineer @ Radicalbit
>

Re: How to perform Broadcast and groupBy in DataStream like DataSet

Posted by Stefano Baghino <st...@radicalbit.io>.
I'm not sure in regards of "withBroadcastSet", but in the DataStream you
"keyBy" instead of "groupBy".

On Tue, May 3, 2016 at 12:35 PM, subash basnet <ya...@gmail.com> wrote:

> Hello all,
>
> How could we perform *withBroadcastSet* and *groupBy* in DataStream like
> that of DataSet in the below KMeans code:
>
> DataSet<Centroid> newCentroids = points
> .map(new SelectNearestCenter()).*withBroadcastSet*(loop, "centroids")
> .map(new CountAppender()).*groupBy*(0).reduce(new CentroidAccumulator())
> .map(new CentroidAverager());
>
>
> DataStream<Centroid> newCentroids = points.map(new
> SelectNearestCenter()).???
>
>
> Best Regards,
> Subash Basnet
>



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit