Posted to issues@spark.apache.org by "Fabian Höring (JIRA)" <ji...@apache.org> on 2018/09/14 17:17:00 UTC

[jira] [Created] (SPARK-25433) Add support for PEX in PySpark

Fabian Höring created SPARK-25433:
-------------------------------------

             Summary: Add support for PEX in PySpark
                 Key: SPARK-25433
                 URL: https://issues.apache.org/jira/browse/SPARK-25433
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 2.2.2
            Reporter: Fabian Höring


This has been partly discussed in SPARK-13587.

I would like to provision the executors with a PEX package. I created a PR with the minimal changes necessary in PythonWorkerFactory.

To run it, one needs to set the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables to the pex file and upload the pex file to the executors, either via sparkContext.addFile or by setting the spark.yarn.dist.files / spark.files config properties.
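A minimal sketch of the manual setup described above, assuming client mode and assuming the shipped PEX lands in each executor's working directory under its base name. The helper name pex_submit_command is hypothetical; it only assembles the environment and spark-submit arguments and does not call Spark itself.

```python
import os


def pex_submit_command(pex_path, app_script, on_yarn=True):
    """Hypothetical helper: build (env, argv) for running a PySpark app with a PEX.

    Assumes client mode and that the PEX is shipped into each executor's
    working directory under its base name.
    """
    pex_name = os.path.basename(pex_path)
    env = dict(os.environ)
    # Interpreter used for Python workers; executors resolve this relative path
    # against their own working directory, where the shipped PEX is assumed to be.
    env["PYSPARK_PYTHON"] = "./" + pex_name
    # Interpreter for the driver process on the submitting machine (local path).
    env["PYSPARK_DRIVER_PYTHON"] = pex_path
    # Distribute the PEX: spark.yarn.dist.files on YARN, spark.files otherwise.
    files_conf = "spark.yarn.dist.files" if on_yarn else "spark.files"
    argv = ["spark-submit", "--conf", f"{files_conf}={pex_path}", app_script]
    return env, argv
```

For example, pex_submit_command("/tmp/myapp.pex", "job.py") yields PYSPARK_PYTHON=./myapp.pex and the argument list spark-submit --conf spark.yarn.dist.files=/tmp/myapp.pex job.py, which could then be passed to subprocess.run(argv, env=env). Splitting the two variables (local path for the driver, relative path for the workers) is a choice of this sketch: as far as I understand, the driver-side PYSPARK_PYTHON value is forwarded to the executors, so it has to be valid on their side.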

It is also necessary to set the PEX_ROOT environment variable. By default, PEX inside the executors tries to access /home/.pex, and this fails.
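One way to pass PEX_ROOT through is Spark's spark.executorEnv.[Name] setting (and spark.yarn.appMasterEnv.[Name] for a driver running in YARN cluster mode). The sketch below assumes that a directory relative to the container's working directory, here ./.pex, is writable; that default and the helper name pex_root_conf are illustrative assumptions, not part of the PR.

```python
def pex_root_conf(pex_root="./.pex", yarn_cluster_mode=False):
    """Hypothetical helper: --conf arguments pointing PEX at a writable cache dir.

    Assumes pex_root is writable inside the containers, unlike the default
    location under the home directory.
    """
    settings = {"spark.executorEnv.PEX_ROOT": pex_root}
    if yarn_cluster_mode:
        # In YARN cluster mode the driver runs inside the application master.
        settings["spark.yarn.appMasterEnv.PEX_ROOT"] = pex_root
    argv = []
    for key, value in settings.items():
        argv += ["--conf", f"{key}={value}"]
    return argv
```

With the defaults this returns ["--conf", "spark.executorEnv.PEX_ROOT=./.pex"], which can be spliced into a spark-submit command line before the application script.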

Since this configuration is quite cumbersome, it would ideally be useful to also add a --pexFile parameter to SparkContext and spark-submit to provide a pex file directly. Please tell me what you think of this.
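To illustrate how the proposed option could reduce to the existing settings, here is a sketch of a wrapper that rewrites a --pexFile argument. The --pexFile option does not exist in spark-submit; the function expand_pex_file and the ./.pex default for PEX_ROOT are hypothetical, and --files is used because spark-submit maps it to the file-distribution properties mentioned above.

```python
import os


def expand_pex_file(argv):
    """Hypothetical: rewrite a proposed `--pexFile PATH` into existing options.

    `--pexFile` is NOT a real spark-submit flag; this only shows how it could
    expand. Returns (new_argv, env_updates).
    """
    out, env = [], {}
    i = 0
    while i < len(argv):
        if argv[i] == "--pexFile":
            if i + 1 >= len(argv):
                raise ValueError("--pexFile requires a path")
            pex_path = argv[i + 1]
            # Ship the PEX and give PEX a writable cache dir (assumed ./.pex).
            out += ["--files", pex_path,
                    "--conf", "spark.executorEnv.PEX_ROOT=./.pex"]
            env["PYSPARK_PYTHON"] = "./" + os.path.basename(pex_path)
            env["PYSPARK_DRIVER_PYTHON"] = pex_path
            i += 2
        else:
            out.append(argv[i])
            i += 1
    return out, env
```

A real implementation would live inside spark-submit and SparkContext rather than in a wrapper; the point is only that one option could replace the three manual steps.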



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
