Posted to commits@airflow.apache.org by "Kaxil Naik (Jira)" <ji...@apache.org> on 2019/12/18 03:16:00 UTC
[jira] [Resolved] (AIRFLOW-5744) Environment variables not
correctly set in Spark submit operator
[ https://issues.apache.org/jira/browse/AIRFLOW-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kaxil Naik resolved AIRFLOW-5744.
---------------------------------
Resolution: Fixed
> Environment variables not correctly set in Spark submit operator
> ----------------------------------------------------------------
>
> Key: AIRFLOW-5744
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5744
> Project: Apache Airflow
> Issue Type: Bug
> Components: contrib, operators
> Affects Versions: 1.10.5
> Reporter: Joseph McCartin
> Assignee: Joseph McCartin
> Priority: Trivial
> Fix For: 1.10.7
>
>
> AIRFLOW-2380 added support for setting environment variables at runtime for the SparkSubmitOperator. The intention was to allow for dynamic configuration paths (such as HADOOP_CONF_DIR). The pull request, however, only set these env vars at runtime when a standalone cluster with client deploy mode was chosen. For kubernetes and yarn modes, the env vars were instead sent to the driver via the spark configuration property _spark.yarn.appMasterEnv_ (and its k8s equivalent).
> If one wishes to dynamically set the yarn master address (via a _yarn-site.xml_ file), then one or more environment variables need to be present in the spark-submit process's environment at runtime, and this is not currently done.
> The SparkSubmitHook instance attribute `_env` is assigned from the SparkSubmitOperator's `_env_vars` in the `_build_spark_submit_command` method. When running in YARN mode, however, `_env` is never set as it should be, and therefore it is not passed to the Popen process that launches spark-submit.
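To illustrate the mechanism the report describes, here is a minimal, hypothetical sketch (not the actual SparkSubmitHook code) of how user-supplied env vars can be merged into the inherited environment and handed to Popen regardless of deploy mode; the helper name `submit_with_env` and its signature are illustrative assumptions:

```python
import os
import subprocess

def submit_with_env(spark_submit_cmd, env_vars=None):
    """Launch a command, merging extra env vars into the inherited environment.

    This mirrors the intended fix: whatever the deploy mode, env_vars
    (e.g. {"HADOOP_CONF_DIR": "/path/to/conf"}) should be visible to the
    spark-submit process itself, not only forwarded to the driver.
    """
    env = None  # env=None makes Popen inherit the parent's environment
    if env_vars:
        env = os.environ.copy()  # start from the inherited environment
        env.update(env_vars)     # overlay the user-supplied variables
    return subprocess.Popen(
        spark_submit_cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
```

With this pattern, a dynamically chosen HADOOP_CONF_DIR is present in the subprocess's environment at launch, which is what the yarn-site.xml lookup requires.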
--
This message was sent by Atlassian Jira
(v8.3.4#803005)