Posted to user@spark.apache.org by Ameet Kini <am...@gmail.com> on 2014/03/20 20:20:22 UTC
sort order after reduceByKey / groupByKey
val rdd2 = rdd.partitionBy(myPartitioner).reduceByKey(someFunction)
I see that rdd2's partitions are not internally sorted. Can someone confirm
that this is expected behavior? If so, is the only way to get the partitions
internally sorted to follow it with something like this?
val rdd2 = rdd.partitionBy(myPartitioner).reduceByKey(someFunction)
  .mapPartitions(p => sort(p))
Thanks,
Ameet
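A minimal sketch of the intra-partition sort asked about above. It is not taken from the thread: it assumes Spark 1.3 or later (so the pair-RDD implicits need no extra import), a local master, and a HashPartitioner standing in for the custom partitioner. The names IntraPartitionSort and sortPartition are hypothetical; sortPartition plays the role of the undefined sort(p).

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object IntraPartitionSort {
  // Hypothetical helper standing in for the thread's sort(p): buffers one
  // partition in memory and returns its records ordered by key.
  def sortPartition[K: Ordering, V](p: Iterator[(K, V)]): Iterator[(K, V)] =
    p.toVector.sortBy(_._1).iterator

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("intra-partition-sort").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5)))
      val rdd2 = rdd
        .partitionBy(new HashPartitioner(2)) // assumption: stand-in partitioner
        .reduceByKey(_ + _)
        // preservesPartitioning = true keeps the partitioner on the result,
        // which is safe here because sorting never changes a record's key.
        .mapPartitions(p => sortPartition(p), preservesPartitioning = true)

      // glom() turns each partition into an array so the order is visible.
      rdd2.glom().collect().foreach(part => println(part.mkString(", ")))
    } finally {
      sc.stop()
    }
  }
}
```

Note that this helper materializes a whole partition in memory before sorting, so it only suits partitions that fit on one executor's heap.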
Re: sort order after reduceByKey / groupByKey
Posted by Ameet Kini <am...@gmail.com>.
I saw that, but I don't need a global sort, only an intra-partition sort.
Ameet
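For readers on Spark releases newer than this thread: repartitionAndSortWithinPartitions (available from Spark 1.2, as far as I know) sorts records by key inside each partition without imposing a global order. The sketch below is an illustration under assumptions, not something proposed in the thread: HashPartitioner stands in for the custom partitioner, the object name is hypothetical, and whether the call triggers an additional shuffle when the data is already partitioned this way may depend on the Spark version.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object SortWithinPartitions {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sort-within-partitions").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val partitioner = new HashPartitioner(2) // assumption: stand-in partitioner
      val reduced = sc
        .parallelize(Seq(("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5)))
        .reduceByKey(partitioner, (a: Int, b: Int) => a + b)

      // Keys stay in the partitions chosen by `partitioner`; only the order
      // of records inside each partition changes.
      val sortedWithin = reduced.repartitionAndSortWithinPartitions(partitioner)

      sortedWithin.glom().collect().foreach { part =>
        val keys = part.map(_._1).toSeq
        assert(keys == keys.sorted) // each partition is ordered by key
        println(part.mkString(", "))
      }
    } finally {
      sc.stop()
    }
  }
}
```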
On Thu, Mar 20, 2014 at 3:26 PM, Mayur Rustagi <ma...@gmail.com> wrote:
> That's expected. I think sortByKey is an option too, and probably a better one.
Re: sort order after reduceByKey / groupByKey
Posted by Mayur Rustagi <ma...@gmail.com>.
That's expected. I think sortByKey is an option too, and probably a better one.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>
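A small sketch of what the sortByKey suggestion does, assuming Spark 1.3 or later and a local master; the object name and sample data are hypothetical. sortByKey produces a globally sorted RDD by shuffling the data into key ranges, so the result no longer uses the partitioner passed to partitionBy.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object GlobalSortByKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("global-sort-by-key").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val reduced = sc
        .parallelize(Seq(("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5)))
        .partitionBy(new HashPartitioner(2)) // assumption: stand-in partitioner
        .reduceByKey((a: Int, b: Int) => a + b)

      // sortByKey re-shuffles the records into key ranges.
      val sorted = reduced.sortByKey()

      // Report the partitioner class before and after the sort.
      println(reduced.partitioner.map(_.getClass.getSimpleName))
      println(sorted.partitioner.map(_.getClass.getSimpleName))

      // Collecting in partition order yields keys in ascending order.
      val keys = sorted.collect().map(_._1).toSeq
      assert(keys == keys.sorted)
    } finally {
      sc.stop()
    }
  }
}
```

If the custom partitioning has to survive, a global sort is therefore the wrong tool, which matches the follow-up in this thread.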