Posted to issues@spark.apache.org by "Willi Raschkowski (Jira)" <ji...@apache.org> on 2022/07/02 02:49:00 UTC

[jira] [Created] (SPARK-39659) Add environment bin folder to R/Python subprocess PATH

Willi Raschkowski created SPARK-39659:
-----------------------------------------

             Summary: Add environment bin folder to R/Python subprocess PATH
                 Key: SPARK-39659
                 URL: https://issues.apache.org/jira/browse/SPARK-39659
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: Willi Raschkowski


Some Python packages rely on non-Python executables that are usually made available on the {{PATH}} through something like {{conda activate}}.

When using Spark with conda-pack environments added via {{spark.archives}}, Python packages aren't able to find conda-installed executables because Spark doesn't add the environment's {{bin}} folder to {{PATH}}.

E.g.
{code:python|title=test.py}
import plotly.express as px

# This only works if kaleido-python can find the conda-installed executable
fig = px.scatter(px.data.iris(), x="sepal_length", y="sepal_width", color="species")
fig.write_image("figure.png", engine="kaleido")
{code}
and
{code:java}
./bin/spark-submit --master yarn --deploy-mode cluster --archives environment.tar.gz#environment --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python test.py
{code}
will throw
{code:java}
Traceback (most recent call last):
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/kaleido-test.py", line 7, in <module>
    fig.write_image("figure.png", engine="kaleido")
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/basedatatypes.py", line 3829, in write_image
    return pio.write_image(self, *args, **kwargs)
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/io/_kaleido.py", line 267, in write_image
    img_data = to_image(
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/io/_kaleido.py", line 144, in to_image
    img_bytes = scope.transform(
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/plotly.py", line 153, in transform
    response = self._perform_transform(
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 293, in _perform_transform
    self._ensure_kaleido()
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 176, in _ensure_kaleido
    proc_args = self._build_proc_args()
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 123, in _build_proc_args
    proc_args = [self.executable_path(), self.scope_name]
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 99, in executable_path
    raise ValueError(
ValueError: 
The kaleido executable is required by the kaleido Python library, but it was not included
in the Python package and it could not be found on the system PATH.

Searched for included kaleido executable at:
    /tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/executable/kaleido 

Searched for executable 'kaleido' on the following system PATH:
    /usr/local/sbin
    /usr/local/bin
    /usr/sbin
    /usr/bin
    /sbin
    /bin
{code}
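
Until Spark does this itself, a possible user-side workaround is to prepend the interpreter's own folder to {{PATH}} at the top of the script, before any package looks for its executable. The following is only a minimal sketch: the helper name {{prepend_env_bin_to_path}} is hypothetical, and it assumes the conda-installed executables sit next to the interpreter named by {{PYSPARK_PYTHON}} (here {{./environment/bin}}) and that the package reads {{PATH}} when it is called rather than at import time.

```python
import os
import sys


def prepend_env_bin_to_path(python_executable=None, environ=None):
    """Put the folder holding the Python interpreter first on PATH.

    Hypothetical helper, not part of Spark or kaleido. Returns the
    folder that was placed on PATH.
    """
    python_executable = python_executable or sys.executable
    environ = os.environ if environ is None else environ

    # With PYSPARK_PYTHON=./environment/bin/python this is assumed to
    # resolve to the unpacked archive's bin folder.
    bin_dir = os.path.dirname(os.path.abspath(python_executable))

    entries = [e for e in environ.get("PATH", "").split(os.pathsep) if e]
    if bin_dir not in entries:
        entries.insert(0, bin_dir)
    environ["PATH"] = os.pathsep.join(entries)
    return bin_dir
```

Calling {{prepend_env_bin_to_path()}} as the first statement of {{test.py}} would cover the driver process only; Python workers on executors are separate processes and would need the same call inside the functions they run. Doing this in Spark, as proposed here, would avoid that duplication.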


