Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:34:09 UTC

[jira] [Resolved] (SPARK-8137) Improve treeAggregate to combine all data on one machine first

     [ https://issues.apache.org/jira/browse/SPARK-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-8137.
---------------------------------
    Resolution: Incomplete

> Improve treeAggregate to combine all data on one machine first
> --------------------------------------------------------------
>
>                 Key: SPARK-8137
>                 URL: https://issues.apache.org/jira/browse/SPARK-8137
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: Shivaram Venkataraman
>            Priority: Major
>              Labels: bulk-closed
>
> Right now, if multiple partitions reside on the same machine, treeAggregate shuffles them without first aggregating them locally. Once shuffle locality is supported, we can get this for free by using the executorIds as the aggregation keys. An example implementation is available at https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/util/Utils.scala#L96
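
The idea in the description can be illustrated with a minimal sketch in plain Python. It is not Spark code and not the linked ml-matrix implementation: partitions are modeled as (executor_id, values) pairs, and every name below (aggregate_per_executor, tree_combine, locality_aware_tree_aggregate, fan_in) is hypothetical. It only shows the order of operations: fold each partition, merge the partial results that share an executor id, then tree-combine one partial per executor.

```python
from collections import defaultdict
from functools import reduce


def aggregate_per_executor(partitions, zero, seq_op, comb_op):
    """Fold each partition, then merge partials that share an executor id.

    `partitions` is a list of (executor_id, values) pairs. This stands in
    for RDD partitions and is an assumption of this sketch.
    """
    per_executor = defaultdict(list)
    for executor_id, values in partitions:
        # Per-partition fold, starting from the zero value.
        per_executor[executor_id].append(reduce(seq_op, values, zero))
    # Local combine: keep a single partial result per executor.
    return {ex: reduce(comb_op, parts) for ex, parts in per_executor.items()}


def tree_combine(partials, comb_op, fan_in=2):
    """Merge partial results level by level, `fan_in` at a time."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        level = [
            reduce(comb_op, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def locality_aware_tree_aggregate(partitions, zero, seq_op, comb_op, fan_in=2):
    """Combine per executor first, then tree-combine across executors."""
    local = aggregate_per_executor(partitions, zero, seq_op, comb_op)
    if not local:
        return zero
    return tree_combine(list(local.values()), comb_op, fan_in)
```

With five partitions spread over three executors, only three partial results enter the tree stage instead of five, which is the saving the description is after. In real Spark the executor id would have to come from the runtime, and whether the local merge avoids network traffic depends on the shuffle locality support the description mentions.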



