Posted to user@spark.apache.org by Ameet Kini <am...@gmail.com> on 2014/03/20 20:20:22 UTC

sort order after reduceByKey / groupByKey

val rdd2 = rdd.partitionBy(myPartitioner).reduceByKey(someFunction)

I see that rdd2's partitions are not internally sorted. Can someone confirm
that this is expected behavior? If so, is the only way to get the partitions
internally sorted to follow it with something like this?

val rdd2 = rdd.partitionBy(myPartitioner).reduceByKey(someFunction)
  .mapPartitions(p => sort(p))

Thanks,
Ameet
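
For readers following along, here is a rough sketch of what the per-partition sort in the question could look like. It is not Spark code: plain Scala collections stand in for the RDD's partitions so it runs without a cluster, and the names sortPartition and partitionAndReduce are made up for illustration (sortPartition plays the role of the sort(p) placeholder above).

```scala
// Sketch only: plain Scala collections stand in for an RDD's partitions,
// so no Spark dependency is needed to run it.
object IntraPartitionSort {

  // Hypothetical stand-in for the `sort(p)` helper in the post: sorts one
  // partition's records by key. It materializes the whole partition in
  // memory, so each partition must fit on the heap.
  def sortPartition[K: Ordering, V](p: Iterator[(K, V)]): Iterator[(K, V)] =
    p.toVector.sortBy(_._1).iterator

  // Simulates partitionBy + reduceByKey: records are bucketed by a
  // hash-style partitioner, then values sharing a key are combined with f.
  // The resulting order inside each bucket is unspecified.
  def partitionAndReduce[K, V](records: Seq[(K, V)], numPartitions: Int)(
      f: (V, V) => V): Vector[Vector[(K, V)]] = {
    def bucket(k: K): Int =
      ((k.hashCode % numPartitions) + numPartitions) % numPartitions
    Vector.tabulate(numPartitions) { i =>
      records
        .filter { case (k, _) => bucket(k) == i }
        .groupBy(_._1)
        .map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
        .toVector
    }
  }

  def main(args: Array[String]): Unit = {
    val records = Seq(5 -> 1, 2 -> 1, 4 -> 1, 1 -> 1, 3 -> 1, 2 -> 1, 5 -> 1)
    val reduced = partitionAndReduce(records, numPartitions = 2)(_ + _)
    // Sort each partition independently; no ordering across partitions.
    val sorted = reduced.map(p => sortPartition(p.iterator).toVector)
    sorted.zipWithIndex.foreach { case (p, i) =>
      println(s"partition $i: ${p.mkString(", ")}")
    }
  }
}
```

Keys come out ascending inside each partition, but the partitions are not ordered relative to each other, which is the intra-partition (not global) sort being asked about. In Spark itself the same helper would presumably be passed along the lines of mapPartitions(p => sortPartition(p), preservesPartitioning = true), so that the partitioner set by partitionBy is kept.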

Re: sort order after reduceByKey / groupByKey

Posted by Ameet Kini <am...@gmail.com>.
I saw that but I don't need a global sort, only intra-partition sort.

Ameet


On Thu, Mar 20, 2014 at 3:26 PM, Mayur Rustagi <ma...@gmail.com> wrote:

> That's expected. I think sortByKey is an option too, and probably a better one.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
>  @mayur_rustagi <https://twitter.com/mayur_rustagi>

Re: sort order after reduceByKey / groupByKey

Posted by Mayur Rustagi <ma...@gmail.com>.
That's expected. I think sortByKey is an option too, and probably a better one.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


