Posted to user@spark.apache.org by 李奇平 <qi...@alibaba-inc.com> on 2014/06/10 15:35:17 UTC

Can't find pyspark when using PySpark on YARN

Dear all,

When I submit a pyspark application using this command:
./bin/spark-submit --master yarn-client examples/src/main/python/wordcount.py "hdfs://..."
I get the following exception:
Error from python worker:
Traceback (most recent call last):
File "/usr/ali/lib/python2.5/runpy.py", line 85, in run_module
loader = get_loader(mod_name)
File "/usr/ali/lib/python2.5/pkgutil.py", line 456, in get_loader
return find_loader(fullname)
File "/usr/ali/lib/python2.5/pkgutil.py", line 466, in find_loader
for importer in iter_importers(fullname):
File "/usr/ali/lib/python2.5/pkgutil.py", line 422, in iter_importers
__import__(pkg)
ImportError: No module named pyspark
PYTHONPATH was:
/home/xxx/spark/python:/home/xxx/spark_on_yarn/python/lib/py4j-0.8.1-src.zip:/disk11/mapred/tmp/usercache/xxxx/filecache/11/spark-assembly-1.0.0-hadoop2.0.0-ydh2.0.0.jar
Maybe `pyspark/python` and `py4j-0.8.1-src.zip` are not included on the YARN workers. How can I distribute these files with my application? Can I use `--py-files python.zip,py4j-0.8.1-src.zip`? Or how can I package the pyspark modules into a .egg file?
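
For reference, this is the kind of invocation I have in mind (a sketch; the zip file name and paths are my own placeholders, nothing I have verified):

cd $SPARK_HOME/python && zip -r /tmp/pyspark.zip pyspark   # bundle the pyspark package
./bin/spark-submit --master yarn-client \
  --py-files /tmp/pyspark.zip,$SPARK_HOME/python/lib/py4j-0.8.1-src.zip \
  examples/src/main/python/wordcount.py "hdfs://..."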


Re: Can't find pyspark when using PySpark on YARN

Posted by Andrew Or <an...@databricks.com>.
Hi Qi Ping,

You don't have to distribute these files; they are automatically packaged
in the assembly jar, which is already shipped to the worker nodes.
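
One quick way to double-check that (a sketch; the jar name is taken from the PYTHONPATH in your error output) is to list the assembly's contents and look for the pyspark package at the jar root:

jar tf spark-assembly-1.0.0-hadoop2.0.0-ydh2.0.0.jar | grep '^pyspark/'

If nothing shows up, the assembly was built without the Python files, or Python cannot read the jar (see below).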

Other people have run into the same issue. See if the instructions here are
of any help:
http://mail-archives.apache.org/mod_mbox/spark-user/201406.mbox/%3cCAMJOb8mr1+ias-SLDz_RfRKe_nA2UUbNmHraC4NUKqYqNUNHuQ@mail.gmail.com%3e

As described in the link, the last resort is to rebuild your assembly
jar with JAVA_HOME pointing at Java 6. This usually fixes the problem: as I
understand it, assemblies built with Java 7 can end up in an extended zip
format once the jar has very many entries, and Python's zipimport cannot
read that format, so the pyspark package inside the jar is never found
(more details in the link provided).
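
Roughly, the rebuild would look like this (a sketch; the JDK path and the Hadoop version are placeholders for your setup):

export JAVA_HOME=/path/to/jdk1.6   # point the build at Java 6
SPARK_HADOOP_VERSION=2.0.0 SPARK_YARN=true sbt/sbt clean assembly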

Cheers,
Andrew

