Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2014/09/03 02:47:51 UTC
[jira] [Updated] (SPARK-2978) Provide an MR-style shuffle transformation
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-2978:
-------------------------------
Target Version/s: 1.2.0
> Provide an MR-style shuffle transformation
> ------------------------------------------
>
> Key: SPARK-2978
> URL: https://issues.apache.org/jira/browse/SPARK-2978
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Sandy Ryza
>
> For Hive on Spark joins in particular, and for running legacy MapReduce code in general, I think it would be useful to provide a transformation with the semantics of the Hadoop MapReduce shuffle, i.e., one that:
> * groups by key, providing (Key, Iterator[Value]) pairs
> * within each partition, provides keys in sorted order
> A couple of ways this could be exposed:
> * Add a new operator, perhaps named "groupAndSortByKey", "groupByKeyAndSortWithinPartition", or "hadoopStyleShuffle".
> * Allow groupByKey to take an ordering parameter for keys within each partition.
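To make the requested semantics concrete, here is a minimal local sketch in plain Python (no Spark involved). It assumes hash partitioning by `hash(key) % num_partitions`, similar in spirit to Hadoop's default hash partitioning. The function name `mr_style_shuffle` and its `sort_key` parameter are hypothetical: `sort_key` stands in for the ordering parameter suggested in the second option above. This is not a Spark API.

```python
from itertools import groupby
from typing import Any, Callable, Hashable, Iterable, Iterator, List, Optional, Tuple

# One shuffled partition: (key, iterator over that key's values), keys sorted.
Partition = List[Tuple[Any, Iterator[Any]]]


def mr_style_shuffle(
    records: Iterable[Tuple[Hashable, Any]],
    num_partitions: int,
    sort_key: Optional[Callable[[Any], Any]] = None,
) -> List[Partition]:
    """Hypothetical local simulation of Hadoop MR shuffle semantics.

    1. Route each (key, value) record to partition hash(key) % num_partitions.
    2. Within each partition, sort records by key (via sort_key if given).
       sort_key must order keys consistently with key equality, otherwise
       equal keys may not end up adjacent.
    3. Group adjacent equal keys into (key, iterator over values).
    """
    buckets: List[List[Tuple[Any, Any]]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[hash(key) % num_partitions].append((key, value))

    order = sort_key if sort_key is not None else (lambda k: k)
    shuffled: List[Partition] = []
    for bucket in buckets:
        # list.sort is stable, so each key's values keep their arrival order.
        bucket.sort(key=lambda kv: order(kv[0]))
        groups: Partition = []
        for key, run in groupby(bucket, key=lambda kv: kv[0]):
            # Materialize each run: groupby's sub-iterators become invalid
            # once the outer iterator advances.
            groups.append((key, iter([v for _, v in run])))
        shuffled.append(groups)
    return shuffled


if __name__ == "__main__":
    # Small integer keys hash to themselves in CPython, so this output is stable.
    records = [(3, "c1"), (1, "a1"), (4, "d1"), (1, "a2"), (3, "c2"), (2, "b1")]
    for i, part in enumerate(mr_style_shuffle(records, num_partitions=2)):
        print(i, [(k, list(vs)) for k, vs in part])
    # 0 [(2, ['b1']), (4, ['d1'])]
    # 1 [(1, ['a1', 'a2']), (3, ['c1', 'c2'])]
```

A real MR-style shuffle would stream each key's values lazily and spill to disk; this sketch materializes every group in memory purely to keep the semantics easy to inspect. Passing, say, `sort_key=lambda k: -k` yields descending key order within each partition, which is the kind of control an ordering parameter on groupByKey would give.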
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)