You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Patrick Wendell (JIRA)" <ji...@apache.org> on 2014/06/11 02:06:02 UTC
[jira] [Updated] (SPARK-2089) With YARN, preferredNodeLocalityData
isn't honored
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-2089:
-----------------------------------
Target Version/s: 1.0.0, 1.0.1
> With YARN, preferredNodeLocalityData isn't honored
> ---------------------------------------------------
>
> Key: SPARK-2089
> URL: https://issues.apache.org/jira/browse/SPARK-2089
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.0.0
> Reporter: Sandy Ryza
>
> When running in YARN cluster mode, apps can pass preferred locality data when constructing a Spark context that will dictate where to request executor containers.
> This is currently broken because of a race condition. The Spark-YARN code runs the user class and waits for it to start up a SparkContext. During its initialization, the SparkContext will create a YarnClusterScheduler, which notifies a monitor in the Spark-YARN code that . The Spark-Yarn code then immediately fetches the preferredNodeLocationData from the SparkContext and uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData, setting preferredNodeLocationData comes after the rest of the initialization, so, if the Spark-YARN code comes around quickly enough after being notified, the data that's fetched is the empty unset version. The occurred during all of my runs.
--
This message was sent by Atlassian JIRA
(v6.2#6252)