
Posted to user@flink.apache.org by Michele Bertoni <mi...@mail.polimi.it> on 2015/07/21 07:27:37 UTC

sorted cogroup

Hi everybody, I need to execute a coGroup on sorted groups.
To explain it better: I have two datasets of (key, value) pairs. I want to coGroup them on key and have both iterators sorted by value.
How can I do that?
I know the iterators could be collected and then sorted, but I want to avoid that. What happens if I partition the datasets separately by key, then sort each partition, and finally coGroup by key? Can I assume the order is kept for each key?

What is the drawback of doing this?
I expect two data shuffles: one for the partitioning and one for the coGroup.


thanks

Best
michele

Re: sorted cogroup

Posted by Till Rohrmann <tr...@apache.org>.

Hi Michele,

Flink supports coGroups on sorted inputs. If you have ds1 = DataSet[(Key,
Value1)] and ds2 = DataSet[(Key, Value2)], you can obtain a sorted coGroup,
for example, with:

ds1.coGroup(ds2).where(0).equalTo(0)
  .sortFirstGroup(1, Order.ASCENDING)
  .sortSecondGroup(1, Order.DESCENDING)
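
For readers who want to try this, here is a minimal sketch of a complete
program around that call, assuming the Scala DataSet API. The object name, the
sample data, and the String/Int element types are made up for illustration and
are not from the original question. The coGroup function walks both iterators
in the order Flink delivers them, without collecting and sorting them itself.

```scala
import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.scala._

// Hypothetical example object; names and data are illustrative assumptions.
object SortedCoGroupSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Two small (key, value) datasets, deliberately unsorted.
    val ds1: DataSet[(String, Int)] =
      env.fromElements(("a", 3), ("a", 1), ("b", 2))
    val ds2: DataSet[(String, Int)] =
      env.fromElements(("a", 10), ("a", 30), ("c", 5))

    val result: DataSet[(String, String, String)] =
      ds1.coGroup(ds2).where(0).equalTo(0)
        .sortFirstGroup(1, Order.ASCENDING)   // first side: ascending by value
        .sortSecondGroup(1, Order.DESCENDING) // second side: descending by value
        .apply { (first: Iterator[(String, Int)], second: Iterator[(String, Int)]) =>
          // Buffer only so the key can be peeked; either side may be empty,
          // but never both for the same key.
          val f = first.buffered
          val s = second.buffered
          val key = if (f.hasNext) f.head._1 else s.head._1
          // Stream through each side once, in the order it arrives.
          (key, f.map(_._2).mkString(","), s.map(_._2).mkString(","))
        }

    // The order in which the groups themselves are printed is not guaranteed.
    result.print()
  }
}
```

With this sample data, the group for key "a" should see the first side as 1, 3
and the second side as 30, 10. Keys "b" and "c" each have one empty side.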

Cheers,
Till

