Posted to user@spark.apache.org by Javier Domingo Cansino <ja...@fon.com> on 2015/08/11 11:02:45 UTC

Python3 Spark execution problems

Hi,

I have been trying to use Spark to process some logs, and I have run into
several difficulties along the way. I managed to overcome most of them, but
I am really stuck on the last one.

I would really like to know how Spark is supposed to be deployed. For now,
I have an SSH key on the master that can log in to any worker, and
start-master.sh and start-slaves.sh both work.

According to the docs, I crafted the following command:
 ~/projects/bigdata/spark/spark/bin/spark-submit \
   --py-files /home/javier/projects/bigdata/bdml/dist/bdml-0.0.1.zip \
   --master='spark://10.0.0.71:7077' \
   ml/spark_pipeline.py /srv/bdml/raw2json/json-logs.gz

First, deploying my project turned out to be nearly impossible. I kept
getting module import errors:
Traceback (most recent call last):
  File "/home/javier/projects/bigdata/bdml/ml/spark_pipeline.py", line 10, in <module>
    from .files import get_interesting_files

I tried everything I could think of, and at one point I even had to dig
into the Scala code to trace that error. In the end I just merged all the
functions of the project into one file.
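
For context, --py-files places the listed archives on the Python path of the application, so modules inside them can be imported by absolute name, whereas a script launched directly as the entry point has no parent package against which a relative import such as "from .files import ..." can resolve. Below is a minimal sketch of that behaviour outside Spark. The package name demo_pkg and its contents are hypothetical stand-ins, not the real bdml layout.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_demo_zip(directory):
    """Write a tiny zip holding a hypothetical package named demo_pkg."""
    zip_path = os.path.join(directory, "demo_pkg-0.0.1.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("demo_pkg/__init__.py", "")
        archive.writestr(
            "demo_pkg/files.py",
            "def get_interesting_files():\n    return ['a.log', 'b.log']\n",
        )
    return zip_path


def import_from_zip(zip_path):
    """Put the zip on sys.path and import the module by its absolute name."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    importlib.invalidate_caches()
    return importlib.import_module("demo_pkg.files")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        module = import_from_zip(build_demo_zip(tmp))
        print(module.get_interesting_files())
```

In the real project the equivalent would be an absolute import of the package shipped in the zip. Whether that resolves the error above is an assumption, since the traceback is cut off before the exception message.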

Then I started to get the following error:
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.0.0.73): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.4, PySpark cannot run with different minor versions
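
The message above suggests that the worker compares its own major.minor version with the one reported by the driver. Below is an illustrative sketch of such a comparison. It is not the actual code in pyspark/worker.py, and the function names are hypothetical.

```python
import sys


def version_tag(version_info):
    """Format a (major, minor, ...) sequence as 'major.minor'."""
    return "%d.%d" % tuple(version_info[:2])


def check_worker_version(driver_version, worker_version_info=sys.version_info):
    """Raise if the worker's major.minor differs from the driver's.

    driver_version is a string such as '3.4'. The wording of the error
    mirrors the message quoted in the traceback above.
    """
    worker_version = version_tag(worker_version_info)
    if worker_version != driver_version:
        raise Exception(
            "Python in worker has different version %s than that in driver %s, "
            "PySpark cannot run with different minor versions"
            % (worker_version, driver_version)
        )
    return worker_version


if __name__ == "__main__":
    # Passes only when the driver string matches the running interpreter.
    print(check_worker_version(version_tag(sys.version_info)))
```

With a driver on 3.4 and a worker interpreter on 2.7, this comparison fails in the same way as the traceback: the interpreter that actually starts on 10.0.0.73 is a 2.7 one.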

I have specified #!/usr/bin/env python3 at the top of the file, and my
spark-env.sh on each worker contains the following lines:
SPARK_MASTER_IP=10.0.0.71
export PYSPARK_PYTHON=python3.4
PYSPARK_PYTHON=python3.4
export PYTHONHASHSEED=123
PYTHONHASHSEED=123
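
One way to see which interpreter a name like python3.4 actually resolves to on a given machine is a small check that can be run on the driver host and on each worker. This is a sketch under assumptions: the helper name is hypothetical, and the fallback to "python" when PYSPARK_PYTHON is unset is an assumption about the default.

```python
import os
import shutil
import subprocess
import sys


def describe_interpreter(name):
    """Return (path, 'major.minor') for an interpreter name, or None if not found."""
    path = shutil.which(name)
    if path is None:
        return None
    # Ask the resolved interpreter itself for its version.
    output = subprocess.check_output(
        [path, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"]
    )
    return path, output.decode().strip()


if __name__ == "__main__":
    # Assumption: fall back to "python" when PYSPARK_PYTHON is not set.
    name = os.environ.get("PYSPARK_PYTHON", "python")
    print(name, "->", describe_interpreter(name))
```

If the result differs between the shell where spark-submit is launched and the environment the worker processes run in, the variable is probably not reaching the process that starts the Python workers.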

I had to specify the PYTHONHASHSEED because it wasn't propagating to the
workers.
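
As background on why the seed matters: Python 3 randomizes string hashes per process unless PYTHONHASHSEED is fixed, so two interpreters can disagree on hash("some key") and place the same key in different partitions. A small sketch that computes a string hash in fresh interpreters, with and without a seed. The helper name is hypothetical.

```python
import os
import subprocess
import sys


def string_hash_in_subprocess(text, seed=None):
    """Return hash(text) computed in a fresh interpreter.

    When seed is None, PYTHONHASHSEED is removed from the child environment,
    so the interpreter's default behaviour applies.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(output.decode().strip())


if __name__ == "__main__":
    print("seeded:  ", string_hash_in_subprocess("spark", 123),
          string_hash_in_subprocess("spark", 123))
    print("unseeded:", string_hash_in_subprocess("spark"),
          string_hash_in_subprocess("spark"))
```

The two seeded values agree. On Python 3 the two unseeded values will almost always differ, which is why every worker needs the same seed.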

I hope you can help me,

Javier Domingo Cansino
Research & Development Engineer
Fon <http://www.fon.com/>
+34 946545847
Skype: javier.domingo.fon
All information in this email is confidential <http://corp.fon.com/legal/email-disclaimer>