Posted to user@spark.apache.org by Harry Jamison <ha...@yahoo.com.INVALID> on 2023/09/05 05:08:31 UTC

pyspark.ml.recommendation is using the wrong python version

I am using Python 3.7 and Spark 2.4.7.
I am trying to figure out why my job is using the wrong Python version.
This is how it is starting up, and the logs confirm that I am using Python 3.7. But I later see an error message showing that it is trying to use 3.8, and I am not sure where it is picking that up.

SPARK_HOME = /usr/local/lib/python3.7/dist-packages/pyspark
Here is my command
sudo --preserve-env -u spark pyspark --deploy-mode client --jars /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar  --verbose --py-files pullhttp/base_http_pull.py --master yarn

Python 3.7.17 (default, Jun  6 2023, 20:10:10)
[GCC 9.4.0] on linux


And when I try to run als.fit on my training data I get this
>>> model = als.fit(training)
[Stage 0:>                                                          (0 + 1) / 1]
23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, datanode1, executor 2): org.apache.spark.SparkException: Error from python worker:
  Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
      mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
    File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
      __import__(pkg_name)
    File "<frozen importlib._bootstrap>", line 991, in _find_and_load
    File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
    File "<frozen zipimport>", line 259, in load_module
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/__init__.py", line 51, in <module>

....

    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code  TypeError: an integer is required (got type bytes)PYTHONPATH was:  /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/py4j-0.10.7-src.ziporg.apache.spark.SparkException: No port number in pyspark.daemon's stdout


Re: pyspark.ml.recommendation is using the wrong python version

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,

Have you set the Python environment variables PYSPARK_PYTHON and
PYSPARK_DRIVER_PYTHON correctly?

You can print the environment variables within your PySpark script to
verify this:

import os
print("PYTHONPATH:", os.environ.get("PYTHONPATH"))
print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON"))
print("PYSPARK_DRIIVER_PYTHON:", os.environ.get("PYSPARK_DRIVER_PYTHON"))

You can set these in your .bashrc or in $SPARK_HOME/conf/spark-env.sh.
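
For example, to pin both driver and executors to the same interpreter in
spark-env.sh (the path below is an assumption; point it at wherever
python3.7 actually lives on your nodes):

export PYSPARK_PYTHON=/usr/bin/python3.7         # assumed path, adjust
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.7  # assumed path, adjust

The same can be passed per job on YARN, e.g.:

pyspark --master yarn \
  --conf spark.pyspark.python=/usr/bin/python3.7 \
  --conf spark.executorEnv.PYSPARK_PYTHON=/usr/bin/python3.7

The interpreter must exist at that same path on every NodeManager host,
otherwise the python workers will not start.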

HTH

Mich Talebzadeh,
Solutions Architect & Engineer
London
United Kingdom

Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 5 Sept 2023 at 06:12, Harry Jamison
<ha...@yahoo.com.invalid> wrote:

> That did not paste well, let me try again
>
>
> I am using Python 3.7 and Spark 2.4.7.
>
> I am trying to figure out why my job is using the wrong Python version.
>
> This is how it is starting up, and the logs confirm that I am using
> Python 3.7. But I later see an error message showing that it is trying to
> use 3.8, and I am not sure where it is picking that up.
>
>
> SPARK_HOME = /usr/local/lib/python3.7/dist-packages/pyspark
>
> Here is my command
> sudo --preserve-env -u spark pyspark --deploy-mode client --jars
> /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar
> --verbose --py-files pullhttp/base_http_pull.py --master yarn
>
> Python 3.7.17 (default, Jun  6 2023, 20:10:10)
>
> [GCC 9.4.0] on linux
>
>
>
> And when I try to run als.fit on my training data I get this
>
> >>> model = als.fit(training)
> [Stage 0:>                                                          (0 +
> 1) / 1]23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in
> stage 0.0 (TID 0, datanode1, executor 2): org.apache.spark.SparkException:
> Error from python worker:
>   Traceback (most recent call last):
>     File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
>       mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
>     File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
>       __import__(pkg_name)
>     File "<frozen importlib._bootstrap>", line 991, in _find_and_load
>     File "<frozen importlib._bootstrap>", line 975, in
> _find_and_load_unlocked
>     File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
>     File "<frozen importlib._bootstrap>", line 618, in
> _load_backward_compatible
>     File "<frozen zipimport>", line 259, in load_module
>     File
> "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/__init__.py",
> line 51, in <module>
>
>
> ....
>
>
>     File
> "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py",
> line 145, in <module>
>     File
> "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py",
> line 126, in _make_cell_set_template_code
>   TypeError: an integer is required (got type bytes)
> PYTHONPATH was:
>
> /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/py4j-0.10.7-src.zip
> org.apache.spark.SparkException: No port number in pyspark.daemon's stdout
>

Re: pyspark.ml.recommendation is using the wrong python version

Posted by Harry Jamison <ha...@yahoo.com.INVALID>.
That did not paste well, let me try again

I am using Python 3.7 and Spark 2.4.7.
I am trying to figure out why my job is using the wrong Python version.
This is how it is starting up, and the logs confirm that I am using Python 3.7. But I later see an error message showing that it is trying to use 3.8, and I am not sure where it is picking that up.

SPARK_HOME = /usr/local/lib/python3.7/dist-packages/pyspark
Here is my command
sudo --preserve-env -u spark pyspark --deploy-mode client --jars /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar --verbose --py-files pullhttp/base_http_pull.py --master yarn
Python 3.7.17 (default, Jun  6 2023, 20:10:10) 
[GCC 9.4.0] on linux


And when I try to run als.fit on my training data I get this
>>> model = als.fit(training)
[Stage 0:>                                                          (0 + 1) / 1]
23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, datanode1, executor 2): org.apache.spark.SparkException: Error from python worker:
  Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
      mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
    File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
      __import__(pkg_name)
    File "<frozen importlib._bootstrap>", line 991, in _find_and_load
    File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
    File "<frozen zipimport>", line 259, in load_module
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/__init__.py", line 51, in <module>

....

    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code  TypeError: an integer is required (got type bytes)PYTHONPATH was:  /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/py4j-0.10.7-src.ziporg.apache.spark.SparkException: No port number in pyspark.daemon's stdout
