Posted to issues@spark.apache.org by "Patrick Wendell (JIRA)" <ji...@apache.org> on 2014/06/11 02:06:02 UTC

[jira] [Updated] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

     [ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-2089:
-----------------------------------

    Target Version/s: 1.0.0, 1.0.1

> With YARN, preferredNodeLocalityData isn't honored 
> ---------------------------------------------------
>
>                 Key: SPARK-2089
>                 URL: https://issues.apache.org/jira/browse/SPARK-2089
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>
> When running in YARN cluster mode, apps can pass preferred locality data when constructing a SparkContext. This data dictates where executor containers are requested.
> This is currently broken because of a race condition. The Spark-YARN code runs the user class and waits for it to start up a SparkContext. During its initialization, the SparkContext will create a YarnClusterScheduler, which notifies a monitor in the Spark-YARN code. The Spark-YARN code then immediately fetches the preferredNodeLocationData from the SparkContext and uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData, setting preferredNodeLocationData comes after the rest of the initialization, so, if the Spark-YARN code comes around quickly enough after being notified, the data that's fetched is the empty, unset version. This occurred during all of my runs.
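
The ordering problem described above can be illustrated with a small standalone sketch. This is not Spark code: FakeContext, Monitor, and run are hypothetical names, and the `fetched` event is an assumption added only to force the unlucky interleaving deterministically (in the real system it depends on timing). It models a constructor that notifies a waiting thread before assigning the locality data, and the same constructor with the assignment moved ahead of the notification.

```python
import threading


class Monitor:
    """Hypothetical stand-in for the monitor the Spark-YARN code waits on."""

    def __init__(self):
        self.context = None
        self.initialized = threading.Event()  # set when the "scheduler" is created
        self.fetched = threading.Event()      # set once the waiter has read the data
        self.seen = None


class FakeContext:
    """Hypothetical stand-in for SparkContext; not Spark's real API."""

    def __init__(self, preferred, monitor, set_before_notify):
        self.preferred_node_location_data = {}  # empty, unset default
        if set_before_notify:
            # Alternative ordering: publish the data before waking the waiter.
            self.preferred_node_location_data = preferred
        # Models scheduler creation notifying the waiting side.
        monitor.context = self
        monitor.initialized.set()
        # Assumption: force the worst case, where the notified thread
        # reads the data before the constructor gets any further.
        monitor.fetched.wait(timeout=5)
        if not set_before_notify:
            # Ordering from the report: the data is assigned last.
            self.preferred_node_location_data = preferred


def run(set_before_notify):
    """Return the locality data the waiting thread observed."""
    monitor = Monitor()
    preferred = {"host1": ["split0"]}

    def allocator():
        monitor.initialized.wait(timeout=5)
        # Copy whatever is visible at the moment of notification.
        monitor.seen = dict(monitor.context.preferred_node_location_data)
        monitor.fetched.set()

    waiter = threading.Thread(target=allocator)
    waiter.start()
    FakeContext(preferred, monitor, set_before_notify)
    waiter.join()
    return monitor.seen


if __name__ == "__main__":
    print("data assigned after notify: ", run(set_before_notify=False))
    print("data assigned before notify:", run(set_before_notify=True))
```

With the assignment after the notification, the waiting thread observes the empty default. With the assignment first, it observes the host map. Whether the actual fix should reorder the constructor or change how the Spark-YARN code obtains the data is not stated in the report, so the second ordering is only one possible remedy.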



--
This message was sent by Atlassian JIRA
(v6.2#6252)