Posted to issues@spark.apache.org by "Willi Raschkowski (Jira)" <ji...@apache.org> on 2022/07/02 02:49:00 UTC
[jira] [Created] (SPARK-39659) Add environment bin folder to R/Python subprocess PATH
Willi Raschkowski created SPARK-39659:
-----------------------------------------
Summary: Add environment bin folder to R/Python subprocess PATH
Key: SPARK-39659
URL: https://issues.apache.org/jira/browse/SPARK-39659
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 3.3.0
Reporter: Willi Raschkowski
Some Python packages rely on non-Python executables which are usually made available on the {{PATH}} through something like {{conda activate}}.
When using Spark with conda-pack environments added via {{spark.archives}}, Python packages aren't able to find conda-installed executables because Spark doesn't update {{PATH}}.
E.g.
{code:python|title=test.py}
import plotly.express as px

# This only works if kaleido-python can find the conda-installed executable
fig = px.scatter(px.data.iris(), x="sepal_length", y="sepal_width", color="species")
fig.write_image("figure.png", engine="kaleido")
{code}
and
{code:bash}
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives environment.tar.gz#environment \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
  test.py
{code}
will throw
{code:java}
Traceback (most recent call last):
File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/kaleido-test.py", line 7, in <module>
fig.write_image("figure.png", engine="kaleido")
File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/basedatatypes.py", line 3829, in write_image
return pio.write_image(self, *args, **kwargs)
File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/io/_kaleido.py", line 267, in write_image
img_data = to_image(
File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/io/_kaleido.py", line 144, in to_image
img_bytes = scope.transform(
File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/plotly.py", line 153, in transform
response = self._perform_transform(
File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 293, in _perform_transform
self._ensure_kaleido()
File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 176, in _ensure_kaleido
proc_args = self._build_proc_args()
File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 123, in _build_proc_args
proc_args = [self.executable_path(), self.scope_name]
File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 99, in executable_path
raise ValueError(
ValueError:
The kaleido executable is required by the kaleido Python library, but it was not included
in the Python package and it could not be found on the system PATH.
Searched for included kaleido executable at:
/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/executable/kaleido
Searched for executable 'kaleido' on the following system PATH:
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin
{code}
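Until Spark updates {{PATH}} itself, a job could approximate the requested behavior from inside the Python process. The following is only a minimal sketch of that idea, not a proposed patch: the helper name {{prepend_env_bin_to_path}} is hypothetical, and it assumes the interpreter selected via {{PYSPARK_PYTHON}} lives in the unpacked environment's {{bin}} folder, next to the conda-installed executables.

```python
import os
import sys


def prepend_env_bin_to_path(python_executable=None, environ=None):
    """Put the folder containing the Python interpreter at the front of PATH.

    Hypothetical helper (not part of Spark, plotly, or kaleido). Assumes the
    interpreter sits in the conda-pack environment's bin folder, e.g.
    ./environment/bin/python as in the spark-submit command above.
    """
    if python_executable is None:
        python_executable = sys.executable
    if environ is None:
        environ = os.environ

    # Folder that holds the interpreter, made absolute so it still resolves
    # if the working directory changes later.
    bin_dir = os.path.dirname(os.path.abspath(python_executable))

    # Split the current PATH, dropping empty entries.
    entries = [e for e in environ.get("PATH", "").split(os.pathsep) if e]

    # Prepend only once so repeated calls don't grow PATH.
    if bin_dir not in entries:
        entries.insert(0, bin_dir)

    environ["PATH"] = os.pathsep.join(entries)
    return bin_dir
```

Calling {{prepend_env_bin_to_path()}} at the top of {{test.py}}, before {{fig.write_image(...)}}, should let libraries that search {{PATH}} at call time see the environment's {{bin}} folder. It only changes the process that calls it, so code running on executors would need to call it as well, which is part of why handling this in Spark would be preferable.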