Posted to issues@spark.apache.org by "Andrew Or (JIRA)" <ji...@apache.org> on 2015/09/03 22:54:45 UTC

[jira] [Resolved] (SPARK-9672) Drivers run in cluster mode on mesos may not have spark-env variables available

     [ https://issues.apache.org/jira/browse/SPARK-9672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-9672.
------------------------------
          Resolution: Fixed
       Fix Version/s: 1.6.0
    Target Version/s: 1.6.0

> Drivers run in cluster mode on mesos may not have spark-env variables available
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-9672
>                 URL: https://issues.apache.org/jira/browse/SPARK-9672
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Submit
>    Affects Versions: 1.4.1
>         Environment: Ubuntu 14.04
> Mesos 0.23 (compiled from source following the instructions on the Mesos site)
> Spark 1.4 prebuilt for Hadoop 2.6
> Test setup was a two-node Mesos cluster: one dedicated master and one dedicated slave. Spark submissions occurred on the master and were directed at a Mesos dispatcher running on the master.
>            Reporter: Patrick Shields
>            Priority: Minor
>             Fix For: 1.6.0
>
>
> This issue definitely affects Mesos mode, but may affect complex standalone topologies as well.
> When running spark-submit with {noformat}--deploy-mode cluster{noformat}, environment variables set in {{spark-env.sh}} that are not prefixed with {{SPARK_}} are not available in the driver process. The behavior I expect is that any variable set in {{spark-env.sh}} is available on the driver and all executors, as illustrated below.
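> For illustration, a minimal {{spark-env.sh}} showing the two cases might look like this ({{MY_APP_DB_URL}} is a hypothetical name, chosen just for this example):
> {noformat}
> # conf/spark-env.sh
> # Propagated: spark-submit forwards environment variables carrying the SPARK_ prefix.
> export SPARK_LOCAL_DIRS=/mnt/spark
> # Not propagated in cluster mode: no SPARK_ prefix, and spark-env.sh is never
> # sourced on the driver, so the driver process never sees this value.
> export MY_APP_DB_URL="jdbc:postgresql://db.example.com/prod"
> {noformat}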
> {{spark-env.sh}} is executed by {{load-spark-env.sh}}, which uses the environment variable {{SPARK_ENV_LOADED}} [[code|https://github.com/apache/spark/blob/master/bin/load-spark-env.sh#L25]] to ensure that it is only run once. When using the {{RestSubmissionClient}}, spark-submit propagates all environment variables prefixed with {{SPARK_}} [[code|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala#L400]] to the {{MesosRestServer}}, where they are used to initialize the driver [[code|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L155]]. Because running spark-submit has already caused {{load-spark-env.sh}} to run on the submitter's machine, {{SPARK_ENV_LOADED}} is among the variables propagated to the new driver process [[code|https://github.com/apache/spark/blob/d86bbb4e286f16f77ba125452b07827684eafeed/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L371]]. So when {{load-spark-env.sh}} is later called by the {{MesosClusterScheduler}}, {{SPARK_ENV_LOADED}} is already set and {{spark-env.sh}} is never sourced.
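> The guard in {{load-spark-env.sh}} that produces this behavior looks roughly like the following (a paraphrased sketch of the linked code, not a verbatim copy):
> {noformat}
> # bin/load-spark-env.sh (simplified)
> if [ -z "$SPARK_ENV_LOADED" ]; then
>   export SPARK_ENV_LOADED=1
>   # Source the user's spark-env.sh, exporting every variable it sets.
>   if [ -f "${SPARK_CONF_DIR:-$SPARK_HOME/conf}/spark-env.sh" ]; then
>     set -a
>     . "${SPARK_CONF_DIR:-$SPARK_HOME/conf}/spark-env.sh"
>     set +a
>   fi
> fi
> # Since SPARK_ENV_LOADED itself carries the SPARK_ prefix, it is forwarded to
> # the driver's environment, and the branch above is skipped on the driver.
> {noformat}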
> [This gist|https://gist.github.com/pashields/9fe662d6ec5c079bdf70] shows the testing setup I used while investigating this issue. An example invocation looked like {noformat}spark-1.5.0-SNAPSHOT-bin-custom-spark/bin/spark-submit --deploy-mode cluster --master mesos://172.31.34.154:7077 --class Test spark-env-var-test_2.10-0.1-SNAPSHOT.jar{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org