Posted to user@spark.apache.org by Javier Domingo Cansino <ja...@gmail.com> on 2015/07/30 15:11:24 UTC
Python version collision
Hi,
I find the documentation about the configuration options rather confusing. Many of the files are not clear about where modifications should go — for example, spark-env vs. spark-defaults.
I am getting an error from a Python version collision:
File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.4, PySpark cannot run with different minor versions
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:315)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
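For context, the check in worker.py that raises this compares the worker's major.minor version against the one the driver reported. A rough sketch of that logic (reconstructed from the traceback, not the verbatim Spark source):

```python
import sys

def check_python_version(driver_version):
    # The worker compares its own "major.minor" string against the
    # version string sent over by the driver; any mismatch is fatal.
    worker_version = "%d.%d" % sys.version_info[:2]
    if worker_version != driver_version:
        raise Exception(
            "Python in worker has different version %s than that in "
            "driver %s, PySpark cannot run with different minor versions"
            % (worker_version, driver_version))
    return worker_version
```

So the fix has to make every worker launch the same interpreter as the driver, not just change one side.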
But I have conf/spark-env.sh with:
PYSPARK_PYTHON=python3.4
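For what it's worth, a configuration commonly reported to work is to export the variable (rather than just assign it) and to set the driver-side counterpart too, on every node, not only the driver — a sketch, assuming PYSPARK_DRIVER_PYTHON is honored by your Spark version:

```shell
# conf/spark-env.sh -- must be identical on every node in the cluster.
# Both variables point at the same minor version to avoid the mismatch.
export PYSPARK_PYTHON=python3.4
export PYSPARK_DRIVER_PYTHON=python3.4
```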
Also, I am not sure about the shebang line at the top of spark-env.sh: sourcing the file would define the env vars in a subprocess, so I removed it, but I am still having the same problem.
Does anyone have experience using Python 3? And with Python 3 in a virtualenv?
Also, as a matter of feedback, I find it rather difficult to deploy and develop apps: even with an IPython notebook available, I haven't found a way to include pyspark in my environment alongside the rest of the virtualenv libraries.
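The workaround I have seen suggested for the virtualenv problem is to put the pyspark package and its bundled py4j zip on sys.path by hand from inside the notebook — a sketch, where the SPARK_HOME default is a guess based on the path in my traceback:

```python
import glob
import os
import sys

# Hypothetical location; point SPARK_HOME at your actual installation.
spark_home = os.environ.get("SPARK_HOME", "/root/spark")

# Make the pyspark package visible to the virtualenv interpreter.
sys.path.insert(0, os.path.join(spark_home, "python"))

# pyspark ships its own py4j under python/lib; add whichever zip is there.
sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")))
```

After this, `import pyspark` should resolve without installing anything into the virtualenv itself, though I would be happy to hear of a cleaner approach.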
Thanks!