Posted to issues@spark.apache.org by "Aaron Glahe (JIRA)" <ji...@apache.org> on 2015/07/22 02:12:04 UTC

[jira] [Created] (SPARK-9235) PYSPARK_DRIVER_PYTHON env variable is not set on the YARN NodeManager acting as the driver in yarn-cluster mode

Aaron Glahe created SPARK-9235:
----------------------------------

             Summary: PYSPARK_DRIVER_PYTHON env variable is not set on the YARN NodeManager acting as the driver in yarn-cluster mode
                 Key: SPARK-9235
                 URL: https://issues.apache.org/jira/browse/SPARK-9235
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.4.1, 1.5.0
         Environment: CentOS 6.6, python 2.7, Spark 1.4.1 tagged version, YARN Cluster Manager, CDH 5.4.1 (Hadoop 2.6.0++), Java 1.7
            Reporter: Aaron Glahe
            Priority: Minor


Relates to SPARK-9229

Env:  Spark on YARN, Java 1.7, Centos 6.6, CDH 5.4.1 (Hadoop 2.6.0++), Anaconda Python 2.7.10 "installed" in /srv/software directory

On the client/submitting machine, we set the PYSPARK_DRIVER_PYTHON env var in spark-env.sh to point to the Anaconda Python executable, which is installed on every YARN node:

export PYSPARK_DRIVER_PYTHON='/srv/software/anaconda/bin/python'

Side note: export PYSPARK_PYTHON='/srv/software/anaconda/bin/python' was also set in spark-env.sh.

Run the command (note that spark-submit options must precede the application file; anything after test.py would be passed to the application itself as arguments):
spark-submit --master yarn --deploy-mode cluster test.py

It appears that the NodeManager hosting the driver does not use the interpreter named by PYSPARK_DRIVER_PYTHON, and instead falls back to the CentOS system default (Python 2.6 in this case).

The workaround appears to be setting the Python path via SPARK_YARN_USER_ENV, as sketched below.
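
A minimal sketch of that workaround, assuming the Anaconda path from this report and that spark-env.sh is read on the submitting machine (the exact variable list is an assumption, not a verified fix):

# spark-env.sh on the client: forward the interpreter paths to the YARN containers
# (SPARK_YARN_USER_ENV takes a comma-separated list of NAME=value pairs)
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/srv/software/anaconda/bin/python,PYSPARK_DRIVER_PYTHON=/srv/software/anaconda/bin/python"

Alternatively, the same environment can be set per job on the YARN application master (which hosts the driver in cluster mode) via the spark.yarn.appMasterEnv.* configuration, e.g.:

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/srv/software/anaconda/bin/python \
  test.py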



