Posted to issues@spark.apache.org by "Eric Kimbrel (JIRA)" <ji...@apache.org> on 2015/07/21 21:53:05 UTC

[jira] [Created] (SPARK-9229) pyspark yarn-cluster PYSPARK_PYTHON not set

Eric Kimbrel created SPARK-9229:
-----------------------------------

             Summary: pyspark yarn-cluster PYSPARK_PYTHON not set
                 Key: SPARK-9229
                 URL: https://issues.apache.org/jira/browse/SPARK-9229
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.5.0
         Environment: CentOS
            Reporter: Eric Kimbrel


PYSPARK_PYTHON is set in spark-env.sh to use an alternative python installation.
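
For reference, the relevant line in spark-env.sh looks like this (the install path here is just an example):

export PYSPARK_PYTHON=/opt/python-2.7/bin/python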

Use spark-submit to run a pyspark job on YARN in cluster deploy mode.

PYSPARK_PYTHON is not set in the cluster environment, and the system default python is used instead of the intended one.

test code: (simple.py)

from pyspark import SparkConf, SparkContext
import os
import sys

conf = SparkConf()
sc = SparkContext(conf=conf)

# Record the python version and environment seen by the driver
# (which runs inside the cluster in yarn-cluster mode).
out = [('PYTHON VERSION', str(sys.version))]
out.extend(os.environ.items())

# Write everything to a single file on HDFS for inspection.
rdd = sc.parallelize(out)
rdd.coalesce(1).saveAsTextFile("hdfs://namenode/tmp/env")

submit command:

spark-submit --master yarn --deploy-mode cluster --num-executors 1 simple.py 
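
For reference, one way to check which python actually ran is to read back the output file (part-00000 is the expected file name after coalesce(1)):

hdfs dfs -cat /tmp/env/part-00000 | grep 'PYTHON VERSION'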

I've also tried setting PYSPARK_PYTHON on the command line with no effect.

It seems like there is no way to specify an alternative python executable in yarn-cluster mode.
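
For what it's worth, the documented YARN environment-propagation properties (spark.yarn.appMasterEnv.* and spark.executorEnv.*) look like the natural candidates, though I haven't confirmed whether they help here; a sketch, with a hypothetical python path:

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/python-2.7/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=/opt/python-2.7/bin/python \
  --num-executors 1 simple.py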


