Posted to issues@spark.apache.org by "Qi Shao (JIRA)" <ji...@apache.org> on 2018/11/30 18:42:00 UTC

[jira] [Created] (SPARK-26237) [K8s] Unable to switch python version in executor when running pyspark shell.

Qi Shao created SPARK-26237:
-------------------------------

             Summary: [K8s] Unable to switch python version in executor when running pyspark shell.
                 Key: SPARK-26237
                 URL: https://issues.apache.org/jira/browse/SPARK-26237
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.0
         Environment: Spark 2.4.0 on Google Kubernetes Engine
            Reporter: Qi Shao


Error message:
{code:java}
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.{code}
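For context, this message is raised when the worker's Python "major.minor" string does not match the one reported by the driver. Below is a minimal sketch of that comparison, not PySpark's actual code: the function name check_worker_version and its parameters are hypothetical, and it assumes the driver's version arrives as a string such as "3.6".

```python
import sys


def check_worker_version(driver_version, worker_version_info=sys.version_info):
    """Compare the worker's major.minor version with the driver's version string.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    # Only major and minor are compared, so 3.6.5 and 3.6.8 are compatible.
    worker_version = "%d.%d" % tuple(worker_version_info[:2])
    if worker_version != driver_version:
        raise Exception(
            "Python in worker has different version %s than that in driver %s"
            % (worker_version, driver_version)
        )
    return worker_version
```

Calling check_worker_version("3.6", (2, 7, 15)) raises, which corresponds to the situation reported here: the driver runs Python 3.6 while the executor starts its worker with Python 2.7.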
Neither
{code:java}
spark.kubernetes.pyspark.pythonVersion{code}
nor 
{code:java}
spark.executorEnv.PYSPARK_MAJOR_PYTHON_VERSION {code}
works.

This happens both when I run a notebook with PySpark and Python 3 and when I run the pyspark shell in a pod that has PySpark and Python 3.

For the notebook, the code is:
{code:java}
from __future__ import print_function
import sys
from random import random
from operator import add
from pyspark.sql import SparkSession
spark = SparkSession.builder\
 .master("k8s://https://kubernetes.default.svc")\
 .appName("PySpark Testout")\
 .config("spark.submit.deployMode","client")\
 .config("spark.executor.instances", "2")\
 .config("spark.kubernetes.container.image","azureq/pantheon:pyspark-2.4")\
 .config("spark.driver.host","jupyter-notebook-headless")\
 .config("spark.driver.pod.name","jupyter-notebook-headless")\
 .config("spark.kubernetes.authenticate.driver.serviceAccountName","spark")\
 .config("spark.kubernetes.pyspark.pythonVersion","3")\
 .config("spark.executorEnv.PYSPARK_MAJOR_PYTHON_VERSION","3")\
 .getOrCreate()
partitions = 2  # assumed value; partitions was left undefined in the original snippet
n = 100000
def f(_):
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0
count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
{code}
For the pyspark shell, the command is:
{code:java}
$SPARK_HOME/bin/pyspark \
 --master k8s://https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT_HTTPS \
 --deploy-mode client \
 --conf spark.executor.instances=5 \
 --conf spark.kubernetes.container.image=azureq/pantheon:pyspark-2.4 \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.driver.host=spark-client-mode-headless \
 --conf spark.kubernetes.pyspark.pythonVersion=3 \
 --conf spark.driver.pod.name=spark-client-mode-headless{code}