Posted to issues@spark.apache.org by "Shivaram Venkataraman (JIRA)" <ji...@apache.org> on 2015/06/04 09:15:38 UTC

[jira] [Commented] (SPARK-2774) Set preferred locations for reduce tasks

    [ https://issues.apache.org/jira/browse/SPARK-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572302#comment-14572302 ] 

Shivaram Venkataraman commented on SPARK-2774:
----------------------------------------------

[~rxin] [~joshrosen] is there still interest in this change? I have seen ~2x wins for some of the ML / matrix workloads that [~kayousterhout] also benchmarked as network bound.

> Set preferred locations for reduce tasks
> ----------------------------------------
>
>                 Key: SPARK-2774
>                 URL: https://issues.apache.org/jira/browse/SPARK-2774
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Shivaram Venkataraman
>            Assignee: Shivaram Venkataraman
>
> Currently we do not set preferred locations for reduce tasks in Spark. This patch proposes setting preferred locations based on the map output sizes and locations tracked by the MapOutputTracker. This is useful in two cases:
> 1. When you have a small job in a large cluster, it can be useful to co-locate map and reduce tasks to avoid going over the network.
> 2. If there is a lot of data skew in the map stage outputs, then it is beneficial to place the reducer close to the largest output.
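
To make the idea in the description concrete, here is a minimal sketch in Python of how preferred locations for one reducer could be derived from map output sizes. It is not the patch itself and does not use Spark's API: the function name, the (host, sizes) input shape, and the min_fraction threshold are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def preferred_reduce_locations(map_outputs, reducer_id, min_fraction=0.2):
    """Return hosts that hold a large share of one reducer's input.

    map_outputs: list of (host, sizes) pairs, one per map task, where
        sizes[r] is the number of bytes that map task wrote for reducer r.
        (Hypothetical shape; the real MapOutputTracker differs.)
    min_fraction: hypothetical cutoff; a host is preferred only if it holds
        at least this fraction of the reducer's total input bytes.
    """
    # Sum the bytes destined for this reducer on each host.
    bytes_by_host = defaultdict(int)
    for host, sizes in map_outputs:
        bytes_by_host[host] += sizes[reducer_id]

    total = sum(bytes_by_host.values())
    if total == 0:
        return []  # No input, so no locality preference.

    # Largest holders first; ties are broken by host name for determinism.
    ranked = sorted(bytes_by_host.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, size in ranked if size / total >= min_fraction]


# Example: reducer 0 reads mostly from host-a, reducer 1 mostly from host-c.
outputs = [
    ("host-a", [100, 10]),
    ("host-b", [10, 10]),
    ("host-a", [50, 0]),
    ("host-c", [5, 80]),
]
print(preferred_reduce_locations(outputs, 0))
print(preferred_reduce_locations(outputs, 1))
```

The cutoff covers both cases listed above: in a small job most of the output sits on a few hosts, so those hosts clear the threshold, and with skewed output only the host holding the largest share does. When the output is spread evenly across many hosts, no host qualifies and the reducer gets no preference.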



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org