Posted to user@spark.apache.org by Javier Domingo Cansino <ja...@gmail.com> on 2015/07/30 15:11:24 UTC

Python version collision

Hi,

I find the documentation about the configuration options rather confusing. There are a lot of files, and it is not always clear which one to modify for a given setting. For example, spark-env.sh vs. spark-defaults.conf.

I am getting an error caused by a Python version collision:

  File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in
main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver
3.4, PySpark cannot run with different minor versions

        at
org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
        at
org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at
org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:315)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
        at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


But my conf/spark-env.sh contains:

PYSPARK_PYTHON=python3.4
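
Assuming the workers actually pick this up, I would expect both sides to report 3.4. As a sanity check for once things run at all (my own sketch, nothing official; sc is an already-created SparkContext):

import sys

# Compare the driver's interpreter with what the executors report.
print("driver:", sys.version)
print("worker:", sc.parallelize([0]).map(lambda _: sys.version).first())

Right now, of course, any Python job just dies with the exception above.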

Also, I am not sure about the shebang line at the top of spark-env.sh: if the file were executed as a script instead of being sourced, the env vars would only be defined in a subprocess, so I removed it. Either way, I am still having the same problem.
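
Should the variable also be exported explicitly so that child processes inherit it? I.e. conf/spark-env.sh reduced to just this (guessing on my side):

# no shebang; the file is meant to be sourced
export PYSPARK_PYTHON=python3.4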

Does anyone have experience using Python 3? And with Python 3 in a virtualenv?

Also, as a matter of feedback, I find it rather difficult to deploy and develop apps: even with an IPython notebook available, I haven't found a way to include pyspark in my environment alongside the rest of the virtualenv libraries.
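
The closest workaround I can think of is putting Spark's Python libs on sys.path by hand before importing pyspark. A rough, untested sketch (the /root/spark path matches my install from the traceback above, and the py4j zip name is a guess; use whatever actually sits under python/lib):

import os
import sys

# Assumed install location; matches the path in the traceback above.
os.environ.setdefault("SPARK_HOME", "/root/spark")
SPARK_HOME = os.environ["SPARK_HOME"]

# Make pyspark and its bundled py4j importable from inside the virtualenv.
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib", "py4j-0.8.2.1-src.zip"))

from pyspark import SparkContext

sc = SparkContext("local[2]", "virtualenv-test")
print(sc.parallelize(list(range(10))).sum())  # should print 45

If there is a cleaner, supported way to do this, I would love to hear about it.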

Thanks!