Posted to user@spark.apache.org by zenglong chen <cz...@gmail.com> on 2019/07/16 11:15:24 UTC

spark python script importError problem

Hi all,
      When I run a Python script with spark-submit, it works fine in
local[*] mode, but not in standalone or YARN mode. The error is below:

Caused by: org.apache.spark.api.python.PythonException: Traceback (most
recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pyspark/worker.py", line
364, in main
    func, profiler, deserializer, serializer = read_command(pickleSer,
infile)
  File "/usr/local/lib/python2.7/dist-packages/pyspark/worker.py", line 69,
in read_command
    command = serializer._read_with_length(file)
  File "/usr/local/lib/python2.7/dist-packages/pyspark/serializers.py",
line 172, in _read_with_length
    return self.loads(obj)
  File "/usr/local/lib/python2.7/dist-packages/pyspark/serializers.py",
line 583, in loads
    return pickle.loads(obj)
ImportError: No module named feature.user.user_feature

The script also runs fine when the cluster is started with
"sbin/start-master.sh" and "sbin/start-slave.sh", but it hits the same
ImportError when started with "sbin/start-master.sh" and
"sbin/start-slaves.sh". The conf/slaves file contains only 'localhost'.

What should I do to solve this import problem? Thanks!

Re: spark python script importError problem

Posted by Patrick McCarthy <pm...@dstillery.com.INVALID>.
Your module 'feature' isn't available on the YARN workers, so you'll need
to either install it on them if you have access, or else ship it to the
workers at runtime using --py-files or similar.



-- 


*Patrick McCarthy  *

Senior Data Scientist, Machine Learning Engineering

Dstillery

470 Park Ave South, 17th Floor, NYC 10016