Posted to commits@spark.apache.org by gu...@apache.org on 2021/01/21 13:16:52 UTC

[spark] branch branch-3.1 updated: [SPARK-34190][DOCS] Supplement the description for Python Package Management

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 5eaa872  [SPARK-34190][DOCS] Supplement the description for Python Package Management
5eaa872 is described below

commit 5eaa872068cbe580d573d7ff25a57b15811a3ddc
Author: itholic <ha...@naver.com>
AuthorDate: Thu Jan 21 22:15:42 2021 +0900

    [SPARK-34190][DOCS] Supplement the description for Python Package Management
    
    ### What changes were proposed in this pull request?
    
    This PR supplements the content of the "Python Package Management" page.
    
    If Python is not installed locally on every node when using `venv-pack`, the job fails as shown below.
    
    ```python
    >>> import pandas as pd
    >>> from pyspark.sql.functions import pandas_udf
    >>> @pandas_udf('double')
    ... def pandas_plus_one(v: pd.Series) -> pd.Series:
    ...     return v + 1
    ...
    >>> spark.range(10).select(pandas_plus_one("id")).show()
    ...
    Cannot run program "./environment/bin/python": error=2, No such file or directory
    ...
    ```
    
    This is because the Python interpreter in an [environment packed via `venv-pack` is a symbolic link](https://github.com/jcrist/venv-pack/issues/5) that points to the interpreter on the machine where the environment was created, so that target must also exist on every node.
    
    To avoid this confusion, the documentation should explain this requirement.
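    
    As a rough illustration of the symbolic-link behavior described above, the sketch below checks whether the interpreter in an unpacked environment is a link whose target is missing. It is not part of this patch: the helper name `describe_interpreter` and the `/nonexistent/python3` path are made up for the example, and it assumes a POSIX system where creating symlinks is allowed.
    
    ```python
    import os
    import tempfile
    from pathlib import Path
    
    
    def describe_interpreter(env_dir):
        """Report whether <env_dir>/bin/python is a symlink and whether it resolves."""
        python = Path(env_dir) / "bin" / "python"
        is_link = python.is_symlink()
        target = os.readlink(python) if is_link else None
        # Path.exists() follows symlinks, so a dangling link reports False.
        return {"is_symlink": is_link, "target": target, "resolves": python.exists()}
    
    
    # Simulate an unpacked archive whose interpreter points at a path that
    # does not exist on this machine (hypothetical path, for illustration only).
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = Path(tmp) / "environment" / "bin"
        bin_dir.mkdir(parents=True)
        os.symlink("/nonexistent/python3", bin_dir / "python")
        report = describe_interpreter(Path(tmp) / "environment")
        print(report)
    ```
    
    When `resolves` is false for a link like this on a node, launching `./environment/bin/python` there fails with a "No such file or directory" error like the one shown above.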
    
    ### Why are the changes needed?
    
    To provide more detailed information to users so that they don’t get confused.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, this PR updates the "Python Package Management" section of the "User Guide" documentation.
    
    ### How was this patch tested?
    
    Manually built the doc.
    
    ![Screen Shot 2021-01-21 at 7 10 38 PM](https://user-images.githubusercontent.com/44108233/105336258-5e8bec00-5c1c-11eb-870c-86acfc77c082.png)
    
    Closes #31280 from itholic/SPARK-34190.
    
    Authored-by: itholic <ha...@naver.com>
    Signed-off-by: HyukjinKwon <gu...@apache.org>
    (cherry picked from commit 28131a7794568944173e66de930c86d498ab55b5)
    Signed-off-by: HyukjinKwon <gu...@apache.org>
---
 python/docs/source/user_guide/python_packaging.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/python/docs/source/user_guide/python_packaging.rst b/python/docs/source/user_guide/python_packaging.rst
index f57acdd..0fff6fe 100644
--- a/python/docs/source/user_guide/python_packaging.rst
+++ b/python/docs/source/user_guide/python_packaging.rst
@@ -140,8 +140,9 @@ Python dependencies in their clusters by using `venv-pack <https://jcristharif.c
 in a similar way as conda-pack.
 
 A virtual environment to use on both driver and executor can be created as demonstrated below.
-It packs the current virtual environment to an archive file, and It self-contains both Python interpreter
-and the dependencies.
+It packs the current virtual environment to an archive file, and it contains both Python interpreter and the dependencies.
+However, it requires all nodes in a cluster to have the same Python interpreter installed because
+`venv-pack packs Python interpreter as a symbolic link <https://github.com/jcrist/venv-pack/issues/5>`_.
 
 
 .. code-block:: bash

