Posted to issues@spark.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2014/10/21 03:55:36 UTC

[jira] [Commented] (SPARK-3655) Secondary sort

    [ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177824#comment-14177824 ] 

Matei Zaharia commented on SPARK-3655:
--------------------------------------

I believe you can build this on top of sortByKey with mapPartitions. The values for each key are guaranteed to go to the same node (though we should document that). Or are you looking to partition the keys by one function and have the values sorted by another? In that case we added this weird repartitionAndSortWithinPartitions function to OrderedRDDFunctions that would do the trick (it was added to make it easier to port apps from MapReduce).
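
To make the second option concrete, here is a minimal sketch in plain Python rather than Spark itself. It only imitates the idea behind repartitionAndSortWithinPartitions: route each record by the primary key alone, sort each partition by the full (primary, secondary) key, then walk the sorted partition to get the values of each key in secondary order. The function names, the composite-key layout, and the list-of-lists "partitions" are all assumptions made for illustration; none of this is Spark's API or its implementation.

```python
from itertools import groupby
from operator import itemgetter


def repartition_and_sort_within_partitions(records, num_partitions, partition_func):
    """Plain-Python stand-in for the shuffle step (hypothetical, not Spark's API).

    records: iterable of ((primary, secondary), value) pairs.
    Each record is routed by partition_func(composite_key), then every
    partition is sorted by the full composite key.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_func(key) % num_partitions].append((key, value))
    for part in partitions:
        part.sort(key=itemgetter(0))
    return partitions


def values_per_key(partition):
    """Yield (primary, values ordered by secondary) from one sorted partition."""
    for primary, group in groupby(partition, key=lambda kv: kv[0][0]):
        yield primary, [value for _, value in group]


def secondary_sort(records, num_partitions=2):
    # Partition on the primary key only, so all values of a key land in the
    # same partition; the sort still uses (primary, secondary).
    parts = repartition_and_sort_within_partitions(
        records, num_partitions, lambda key: hash(key[0])
    )
    return [pair for part in parts for pair in values_per_key(part)]
```

In this sketch values_per_key plays the role that a mapPartitions pass would play in Spark: because the partition is already sorted, grouping consecutive records is enough and no per-key buffering or re-sorting is needed beyond collecting each group.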

> Secondary sort
> --------------
>
>                 Key: SPARK-3655
>                 URL: https://issues.apache.org/jira/browse/SPARK-3655
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: koert kuipers
>            Priority: Minor
>
> Now that Spark has a sort-based shuffle, can we expect secondary sort support soon? There are some use cases where getting a sorted iterator of values per key is helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
