Posted to reviews@spark.apache.org by fhoering <gi...@git.apache.org> on 2018/09/14 17:23:19 UTC

[GitHub] spark pull request #22422: [SPARK-25433][PYSPARK] Add support for pex in PyS...

GitHub user fhoering opened a pull request:

    https://github.com/apache/spark/pull/22422

    [SPARK-25433][PYSPARK] Add support for pex in PySpark

    ## What changes were proposed in this pull request?
    
    This change provides basic support for provisioning the executors with pex files (instead of conda or virtualenv). It contains only the minimal required changes. Everything else can be set up with environment variables.
    As with conda today, the user needs to make sure that the environment used when submitting the Spark job matches the environment provided in the pex file.
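
As a rough illustration of the "environment variables" part, the sketch below builds the environment and command line for a spark-submit run that ships a pex file. It is not taken from the patch: `pex_submit_plan` and the file names are hypothetical, and it assumes the pex file is distributed with `--files` and selected through the standard `PYSPARK_PYTHON` variable.

```python
import os


def pex_submit_plan(pex_file, app_script):
    """Return (env, argv) for a spark-submit run that ships a pex file.

    Hypothetical helper: it only builds the plan and does not launch Spark.
    """
    env = dict(os.environ)
    # PYSPARK_PYTHON names the Python executable PySpark should use.
    # Assumption: the pex file lands in the working directory of each
    # executor, so it is referenced by a relative path.
    env["PYSPARK_PYTHON"] = "./" + os.path.basename(pex_file)
    # --files asks Spark to distribute the pex file with the job.
    argv = ["spark-submit", "--files", pex_file, app_script]
    return env, argv
```

The returned pair could be passed to `subprocess.run(argv, env=env)`. Whether the driver also needs `PYSPARK_DRIVER_PYTHON` depends on the deploy mode and is left out here.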
    
    
    ## How was this patch tested?
    
    Various runs with spark-submit (client and cluster mode) and by directly creating a SparkContext on a YARN cluster.
    
    Also tested with a unit test using a locally created pex file (inspired by python/pyspark/tests/AddFileTests). The issue is that a pex file contains information about the platform, so I can't provide a generic test because of the check that requires the same Python environment on the client and on the executors. A generic test would mean creating a pex file for every Python runtime and every existing platform. I can provide the unit test here, along with a script that creates the pex environment so the test can be run locally. It might also be possible to use SparkSubmitTests, as it calls a separate process (I haven't tried this yet).
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhoering/spark pex-support-fix2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22422.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22422
    
----
commit 3aeae95321f314b128d1ff86b4631145e558d43f
Author: Fabian Höring <f....@...>
Date:   2018-09-14T16:34:59Z

    Add support for pex in PySpark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22422: [SPARK-25433][PYSPARK] Add support for pex in PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22422
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22422: [SPARK-25433][PYSPARK] Add support for pex in PySpark

Posted by fhoering <gi...@git.apache.org>.
Github user fhoering commented on the issue:

    https://github.com/apache/spark/pull/22422
  
    It turns out all of this can be achieved by tuning the existing environment variables. It is enough to generate the pex file with a generic entry point that redirects to the requested module; then even the existing code works: `Arrays.asList(pythonExec, "-m", workerModule)`
    Will close this PR.
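
The entry point itself is not shown in this thread. A minimal sketch of what such a generic entry point might look like follows, assuming it only has to mimic `python -m <module> [args...]`; the function name `main` and the usage message are illustrative, not taken from the PR.

```python
import runpy
import sys


def main(argv=None):
    """Hypothetical generic entry point: behave like `python -m <module>`."""
    args = list(sys.argv[1:] if argv is None else argv)
    if len(args) >= 2 and args[0] == "-m":
        module, rest = args[1], args[2:]
        # Expose the remaining arguments to the target module.
        sys.argv = [module] + rest
        # Run the requested module as if it were the main program and
        # return its resulting globals.
        return runpy.run_module(module, run_name="__main__", alter_sys=True)
    raise SystemExit("usage: <pex file> -m <module> [args...]")
```

With a function like this registered as the entry point of the pex file, a call of the form `pythonExec -m workerModule` would be redirected to the worker module bundled inside the pex.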
    



---



[GitHub] spark pull request #22422: [SPARK-25433][PYSPARK] Add support for pex in PyS...

Posted by fhoering <gi...@git.apache.org>.
Github user fhoering closed the pull request at:

    https://github.com/apache/spark/pull/22422


---
