You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/04/03 00:27:58 UTC
[jira] [Assigned] (SPARK-2669) Hadoop configuration is not localised when submitting job in yarn-cluster mode

     [ https://issues.apache.org/jira/browse/SPARK-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-2669:
-----------------------------------

    Assignee: Apache Spark

> Hadoop configuration is not localised when submitting job in yarn-cluster mode
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-2669
>                 URL: https://issues.apache.org/jira/browse/SPARK-2669
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Maxim Ivanov
>            Assignee: Apache Spark
>
> I'd like to propose a fix for a problem when Hadoop configuration is not localized when job is submitted in yarn-cluster mode. Here is a description from github pull request https://github.com/apache/spark/pull/1574
> This patch fixes a problem when Spark driver is run in the container
> managed by YARN ResourceManager it inherits configuration from a
> NodeManager process, which can be different from the Hadoop
> configuration present on the client (submitting machine). Problem is
> most vivid when fs.defaultFS property differs between these two.
> Hadoop MR solves it by serializing client's Hadoop configuration into
> job.xml in application staging directory and then making Application
> Master to use it. That guarantees that regardless of execution nodes
> configurations all application containers use same config identical to
> one on the client side.
> This patch uses similar approach. YARN ClientBase serializes
> configuration and adds it to ClientDistributedCacheManager under
> "job.xml" link name. ClientDistributedCacheManager is then utilizes
> Hadoop localizer to deliver it to whatever container is started by this
> application, including the one running Spark driver.
> YARN ClientBase also adds "SPARK_LOCAL_HADOOPCONF" env variable to AM
> container request which is then used by SparkHadoopUtil.newConfiguration
> to trigger new behavior when machine-wide hadoop configuration is merged
> with application specific job.xml (exactly how it is done in Hadoop MR).
> SparkContext is then follows same approach, adding
> SPARK_LOCAL_HADOOPCONF env to all spawned containers to make them use
> client-side Hadopo configuration.
> Also all the references to "new Configuration()" which might be executed
> on YARN cluster side are changed to use SparkHadoopUtil.get.conf
> Please note that it fixes only core Spark, the part which I am
> comfortable to test and verify the result. I didn't descend into
> steaming/shark directories, so things might need to be changed there too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org