Posted to user@flink.apache.org by Jiewen Shao <fi...@gmail.com> on 2017/11/14 23:13:01 UTC

[Flink] merge-sort for a DataStream

In Flink, I have a DataStream<List<comparable_pojo>> in which each list is
individually pre-sorted. What I need to do is persist everything in one
shot, in global sort order. Any ideas on the best way to do this? I hope
that makes sense.

Thanks in advance!

Re: [Flink] merge-sort for a DataStream

Posted by Kien Truong <du...@gmail.com>.
Hi Jiewen,

Since a DataStream can have an infinite number of elements, you can't
globally sort all of them.

If the number of elements is finite, you can use the DataSet API, which
will look something like this:


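    DataSet<List<comparable_pojo>> a;

    // Flatten each pre-sorted list into its individual elements.
    DataSet<comparable_pojo> aFlatten = a.flatMap(..);

    // Range-partition on the sort key, then sort within each partition.
    DataSet<comparable_pojo> aSorted =
        aFlatten.partitionByRange(...).sortPartition(...);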
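
If the finite input also fits in memory on one machine, the fact that each list is already sorted can be exploited directly with a k-way merge. The sketch below is plain Java and uses no Flink API. The class name KWayMerge, the helper Cursor, and the method mergeSorted are made up for illustration, and the element type is only assumed to implement Comparable, as the comparable_pojo in the question suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {

    // Cursor into one pre-sorted list: the list plus the index of its next unread element.
    private static final class Cursor<T extends Comparable<T>> implements Comparable<Cursor<T>> {
        final List<T> list;
        int index;

        Cursor(List<T> list) {
            this.list = list;
        }

        T head() {
            return list.get(index);
        }

        // Cursors are ordered by the element they currently point at.
        @Override
        public int compareTo(Cursor<T> other) {
            return head().compareTo(other.head());
        }
    }

    // Merges lists that are each sorted ascending into one globally sorted list.
    public static <T extends Comparable<T>> List<T> mergeSorted(List<List<T>> lists) {
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>();
        for (List<T> list : lists) {
            if (!list.isEmpty()) {
                heap.add(new Cursor<>(list));
            }
        }

        List<T> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            // The cursor with the smallest head holds the next element in global order.
            Cursor<T> smallest = heap.poll();
            merged.add(smallest.head());
            smallest.index++;
            // Re-insert the cursor only if its list still has unread elements.
            if (smallest.index < smallest.list.size()) {
                heap.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> input = Arrays.asList(
                Arrays.asList(1, 4, 9),
                Arrays.asList(2, 3, 10),
                Arrays.asList(5));
        System.out.println(mergeSorted(input)); // prints [1, 2, 3, 4, 5, 9, 10]
    }
}
```

With k lists and n elements in total, this costs O(n log k) comparisons, whereas flattening and re-sorting costs O(n log n). It does not distribute the work, so for data that exceeds a single machine the range-partitioned DataSet approach above is the better fit.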


Best regards.

Kien
