Posted to issues@spark.apache.org by "Eric Kimbrel (JIRA)" <ji...@apache.org> on 2015/07/21 21:53:05 UTC
[jira] [Created] (SPARK-9229) pyspark yarn-cluster PYSPARK_PYTHON not set
Eric Kimbrel created SPARK-9229:
-----------------------------------
Summary: pyspark yarn-cluster PYSPARK_PYTHON not set
Key: SPARK-9229
URL: https://issues.apache.org/jira/browse/SPARK-9229
Project: Spark
Issue Type: Bug
Affects Versions: 1.5.0
Environment: centos
Reporter: Eric Kimbrel
PYSPARK_PYTHON is set in spark-env.sh to use an alternative python installation.
Use spark-submit to run a pyspark job in yarn with cluster deploy mode.
PYSPARK_PYTHON is not set in the cluster environment, and the system default Python is used instead of the intended interpreter.
test code: (simple.py)

import os
import sys

from pyspark import SparkConf, SparkContext

conf = SparkConf()
sc = SparkContext(conf=conf)

# Record the Python version this job is running under, followed by the
# full environment, so the effective PYSPARK_PYTHON can be inspected.
out = [('PYTHON VERSION', str(sys.version))]
out.extend(os.environ.items())

rdd = sc.parallelize(out)
rdd.coalesce(1).saveAsTextFile("hdfs://namenode/tmp/env")
submit command:
spark-submit --master yarn --deploy-mode cluster --num-executors 1 simple.py
I've also tried setting PYSPARK_PYTHON on the command line with no effect.
It seems like there is no way to specify an alternative python executable in yarn-cluster mode.
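A possible workaround (not verified against this report) is to bypass spark-env.sh and forward the variable through Spark's YARN environment properties, since spark.yarn.appMasterEnv.* reaches the application master (which hosts the driver in cluster mode) and spark.executorEnv.* reaches the executor containers. The interpreter path below is a placeholder for illustration:

```shell
# Sketch of a workaround: pass PYSPARK_PYTHON explicitly to both the
# YARN application master and the executors via --conf.
# /opt/python27/bin/python is a placeholder; substitute the real path.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 1 \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/python27/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=/opt/python27/bin/python \
  simple.py
```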
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)