Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/05/08 08:11:35 UTC

[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/21267

    [SPARK-21945][YARN][PYTHON] Make --py-files work in PySpark shell in Yarn client mode

    ## What changes were proposed in this pull request?
    
    ### Problem
    
    When we run the _PySpark shell in Yarn client mode_, files specified via `--py-files` are not recognised on the _driver side_.
    
    Here are the steps I took to check:
    
    ```bash
    $ cat /home/spark/tmp.py
    def testtest():
        return 1
    ```
    
    ```bash
    $ ./bin/pyspark --master yarn --deploy-mode client --py-files /home/spark/tmp.py
    ```
    
    ```python
    >>> def test():
    ...     import tmp
    ...     return tmp.testtest()
    ...
    >>> spark.range(1).rdd.map(lambda _: test()).collect()  # executor side
    [1]
    >>> test()  # driver side
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in test
    ImportError: No module named tmp
    ```
    
    ### How did it happen?
    
    Unlike Yarn cluster mode, and Yarn client mode with spark-submit, the PySpark shell in Yarn client mode specifically works as follows:

    1. It first runs the Python shell via:
    
    https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L158 as pointed out by @tgravescs in the JIRA.
    
    2. This triggers shell.py, which submits another application to launch a Py4J gateway:
    
    https://github.com/apache/spark/blob/209b9361ac8a4410ff797cff1115e1888e2f7e66/python/pyspark/java_gateway.py#L45-L60
    
    3. That runs a Py4J gateway:
    
    https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L425
    
    4. It copies the `--py-files` into a local temp directory:
    
    https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L365-L376
    
    and then these paths are set in `spark.submit.pyFiles`.
    
    5. The Py4J JVM is launched, and then the Python paths are set via:
    
    https://github.com/apache/spark/blob/7013eea11cb32b1e0038dc751c485da5c94a484b/python/pyspark/context.py#L209-L216
    
    However, these paths are not actually added, because the files were copied into a temp directory in step 4, whereas this code path looks under `SparkFiles.getRootDirectory()`, where files are placed only when `SparkContext.addFile()` is called.
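
The mismatch described above can be sketched with plain `os.path` calls. This is an illustration only; the two temp directories are hypothetical stand-ins for the submit-side copy target and `SparkFiles.getRootDirectory()`:

```python
import os
import tempfile

# Hypothetical stand-in directories, for illustration only:
submit_tmp = tempfile.mkdtemp()        # where step 4 copies --py-files
spark_files_root = tempfile.mkdtemp()  # stand-in for SparkFiles.getRootDirectory()

# Step 4 puts the user's file into the submit-side temp directory only.
src = os.path.join(submit_tmp, "tmp.py")
with open(src, "w") as f:
    f.write("def testtest():\n    return 1\n")

# context.py then builds its lookup path from the *other* directory,
# so the file is never found there and is never added to sys.path.
filename = os.path.basename(src)
looked_up = os.path.join(spark_files_root, filename)
print(os.path.exists(looked_up))  # False
```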
    
    In the other cluster modes, `spark.files` is set via:
    
    https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L554-L555
    
    and those files are explicitly added via:
    
    https://github.com/apache/spark/blob/ecb8b383af1cf1b67f3111c148229e00c9c17c40/core/src/main/scala/org/apache/spark/SparkContext.scala#L395
    
    So we are fine in other modes.
    
    In the case of Yarn client and cluster mode with _spark-submit_, these are handled manually. In particular, https://github.com/apache/spark/pull/6360 added most of the logic. In that case, the Python path is set manually via, for example, `deploy.PythonRunner`. We don't use `spark.files` here.
    
    ### How does the PR fix the problem?
    
    I tried to make the approach as isolated as possible: simply copy the .py or .zip files into `SparkFiles.getRootDirectory()` on the driver side if they don't already exist there. Another possible way is to set `spark.files`, but that does extra, unnecessary work and sounds a bit invasive.
    
    ### Before
    
    ```python
    >>> def test():
    ...     import tmp
    ...     return tmp.testtest()
    ...
    >>> spark.range(1).rdd.map(lambda _: test()).collect()
    [1]
    >>> test()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in test
    ImportError: No module named tmp
    ```
    
    ### After
    
    ```python
    >>> def test():
    ...     import tmp
    ...     return tmp.testtest()
    ...
    >>> spark.range(1).rdd.map(lambda _: test()).collect()
    [1]
    >>> test()
    1
    ```
    
    ## How was this patch tested?
    
    I manually tested in standalone and yarn cluster modes with the PySpark shell. Both .zip and .py files were tested with steps similar to the above. It's difficult to add an automated test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-21945

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21267
    
----
commit 68be3baef22d8b7aa58a432cb5bd12437c07feb7
Author: hyukjinkwon <gu...@...>
Date:   2018-05-08T07:36:31Z

    Make --py-files work in PySpark shell in Yarn client mode

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90616/
    Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Does it only happen in the yarn client PySpark shell? I would suggest fixing this on the SparkSubmit side: treat it as a special case and set the proper config.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    (I have tried to explain why it's specific to the PySpark shell in Yarn client mode in the PR description.)


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3213/
    Test FAILed.


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r187274278
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    --- End diff --
    
    Are the 'spark.submit.pyFiles' files only missing on the driver side? I mean, if they are not added by `SparkContext.addFile`, shouldn't they also be missing on the executors?


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r186650331
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    +                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    +                        self._python_includes.append(filename)
    +                        sys.path.insert(1, filepath)
    +                except Exception as e:
    +                    from pyspark import util
    +                    warnings.warn(
    --- End diff --
    
    Log was also tested manually:
    
    ```
    .../python/pyspark/context.py:230: RuntimeWarning: Python file [/home/spark/tmp.py] specified in 'spark.submit.pyFiles' failed to be added in the Python path, excluding this in the Python path.
      : ...
    ```


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3214/
    Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90614/testReport)** for PR 21267 at commit [`b9e312e`](https://github.com/apache/spark/commit/b9e312ecfd0215c669e1826e891ccbaa5937ea49).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90364/testReport)** for PR 21267 at commit [`68be3ba`](https://github.com/apache/spark/commit/68be3baef22d8b7aa58a432cb5bd12437c07feb7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r187216038
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    --- End diff --
    
    That's the initial approach I tried. The thing is, for a .py file in the configuration, you need its parent directory (not the .py file itself) on the path, and that would also add any other .py files that happen to be in that directory.
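
That concern can be demonstrated directly (hypothetical file names): importing a bare `.py` file requires putting its parent directory on `sys.path`, which also exposes every other `.py` file in that directory:

```python
import os
import sys
import tempfile

# Hypothetical directory with the user's file plus an unrelated sibling.
d = tempfile.mkdtemp()
for name in ("tmp.py", "sibling.py"):
    with open(os.path.join(d, name), "w") as f:
        f.write("def testtest():\n    return 1\n")

# To import a plain .py file, its *parent directory* must go on sys.path
# (a .py file itself is not a valid path entry) ...
sys.path.insert(1, d)
import tmp
print(tmp.testtest())  # 1

# ... but the sibling module became importable as well.
import sibling
print(sibling.testtest())  # 1
```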


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90704/
    Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90565/
    Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Hm .. @jerryshao, it seems a bit difficult to do so. The simplest way would be just to copy the files into `SparkFiles.getRootDirectory()`; however, `SparkEnv` is inaccessible at this stage in `SparkSubmit`.

    Another way might be to set `spark.files` so that the files are added via `addFile()` later, which would put them in `SparkFiles.getRootDirectory()` on the driver side too, but I wonder if it makes sense to set a configuration that Yarn doesn't use.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90616/testReport)** for PR 21267 at commit [`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r187133079
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    --- End diff --
    
    Is this copy necessary? Couldn't you just add `path` to `sys.path` (instead of adding `filepath`) and that would solve the problem?


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged to master.


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21267


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Yea, this is specific to the yarn client PySpark shell. In the case of yarn client and cluster mode with spark-submit, these are specially handled via #6360, but I think the PySpark shell in yarn client mode was missed. The way it is launched diverges, if I understood correctly.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90565/testReport)** for PR 21267 at commit [`68be3ba`](https://github.com/apache/spark/commit/68be3baef22d8b7aa58a432cb5bd12437c07feb7).


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3184/
    Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    retest this please


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90565/testReport)** for PR 21267 at commit [`68be3ba`](https://github.com/apache/spark/commit/68be3baef22d8b7aa58a432cb5bd12437c07feb7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90704/testReport)** for PR 21267 at commit [`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r187274825
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    --- End diff --
    
    Yup, they're only missing on the driver side in this mode specifically. Yarn doesn't add them since `spark.files` is not set, if I understood correctly. They are specially handled in the spark-submit case, but the shell case seems to have been missed.
    
    I described a bit in the PR description too.
    
    > In case of Yarn client and cluster with submit, these are manually being handled. In particular #6360 added most of the logics. In this case, the Python path looks manually set via, for example, deploy.PythonRunner. We don't use spark.files here.
    
    



---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90614/testReport)** for PR 21267 at commit [`b9e312e`](https://github.com/apache/spark/commit/b9e312ecfd0215c669e1826e891ccbaa5937ea49).


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90614/
    Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90616/testReport)** for PR 21267 at commit [`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r186670486
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    +                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    --- End diff --
    
    Am I missing anything? Looks like `PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')`. So `.py` seems not in that?
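
    For reference, the extension check in question can be sketched in plain Python (a standalone illustration, not PySpark code; `is_package` is a made-up helper name):

    ```python
    PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')

    def is_package(filename):
        # Mirrors the check under discussion: compare the last four
        # characters of the file name, lower-cased, against the archive
        # extensions.
        return filename[-4:].lower() in PACKAGE_EXTENSIONS

    # '.py' is deliberately absent from PACKAGE_EXTENSIONS: plain .py
    # files are found because the SparkFiles root directory is already
    # on sys.path, so only archives need a sys.path entry of their own.
    ```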


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work in PySp...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    cc @vanzin, @jerryshao and @tgravescs, could you take a look and see if it makes sense please?


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r187259682
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    +                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    --- End diff --
    
    Oh, I see.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    retest this please


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work in PySp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90364/testReport)** for PR 21267 at commit [`68be3ba`](https://github.com/apache/spark/commit/68be3baef22d8b7aa58a432cb5bd12437c07feb7).


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90364/
    Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3277/
    Test PASSed.


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r187264822
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    --- End diff --
    
    I don't think so, but that's already being done in the other cluster/client modes. The copies are made via addFile in those modes, while in this specific case the files are not copied at all. I think it's better to copy consistently.
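
    The copy-then-add logic being discussed can be sketched as a simplified, standalone helper (`add_py_file` is a hypothetical name for illustration, not an actual PySpark API; it assumes `root_dir` is already on sys.path, as the SparkFiles root is):

    ```python
    import os
    import shutil
    import sys

    def add_py_file(path, root_dir, package_extensions=(".zip", ".egg", ".jar")):
        """Simplified sketch of the patched logic in pyspark/context.py."""
        filename = os.path.basename(path)
        filepath = os.path.join(root_dir, filename)
        if not os.path.exists(filepath):
            # In YARN client shell mode the file was never shipped via
            # SparkContext.addFile, so copy it into the root directory
            # ourselves (SPARK-21945), mirroring what addFile does in
            # the other modes.
            shutil.copyfile(path, filepath)
        if filename[-4:].lower() in package_extensions:
            # Archives must themselves be sys.path entries to be importable.
            sys.path.insert(1, filepath)
        # Plain .py files need no entry of their own: root_dir is assumed
        # to already be on sys.path, so they are found via the directory.
    ```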


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    **[Test build #90704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90704/testReport)** for PR 21267 at commit [`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r188144573
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,22 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    +                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    +                        self._python_includes.append(filename)
    +                        sys.path.insert(1, filepath)
    +                except Exception:
    +                    from pyspark import util
    +                    warnings.warn(
    --- End diff --
    
    Likewise, I checked the warning manually:
    
    ```
    .../pyspark/context.py:229: RuntimeWarning: Failed to add file [/home/spark/tmp.py] speficied in 'spark.submit.pyFiles' to Python path:
    
    ...
      /usr/lib64/python27.zip
      /usr/lib64/python2.7
    ... 
    ```


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Looks good aside from the log message.


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r187259493
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    --- End diff --
    
    For file types in `PACKAGE_EXTENSIONS`, do we need to copy?


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3036/
    Test PASSed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Will try to put this into SparkSubmit.


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r186673789
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    +                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    --- End diff --
    
    The root directory is already added to the path above, and a .py file only needs its parent directory on sys.path, so it is covered.
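
    The distinction can be demonstrated with plain Python (a standalone illustration of sys.path semantics, unrelated to Spark internals; the module names are made up):

    ```python
    import os
    import sys
    import tempfile
    import zipfile

    # A plain .py file becomes importable once its *parent directory* is on
    # sys.path; the file itself is never a path entry.
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "sp21945_plain.py"), "w") as f:
        f.write("VALUE = 1\n")
    sys.path.insert(1, workdir)
    import sp21945_plain

    # An archive, by contrast, must itself be the sys.path entry
    # (via zipimport), which is why only PACKAGE_EXTENSIONS files get
    # their own sys.path.insert(...) in the diff above.
    zip_path = os.path.join(workdir, "sp21945_deps.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("sp21945_zipped.py", "VALUE = 2\n")
    sys.path.insert(1, zip_path)
    import sp21945_zipped
    ```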


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r186920316
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    +                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    +                        self._python_includes.append(filename)
    +                        sys.path.insert(1, filepath)
    +                except Exception as e:
    +                    from pyspark import util
    +                    warnings.warn(
    --- End diff --
    
    BTW, this should now be safer in any case, since we no longer put non-existent files on the path and we print warnings instead.


---



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21267
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21267#discussion_r188037259
  
    --- Diff: python/pyspark/context.py ---
    @@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    -                    self._python_includes.append(filename)
    -                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
    +                try:
    +                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    +                    if not os.path.exists(filepath):
    +                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
    +                        # not added via SparkContext.addFile. Here we check if the file exists,
    +                        # try to copy and then add it to the path. See SPARK-21945.
    +                        shutil.copyfile(path, filepath)
    +                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
    +                        self._python_includes.append(filename)
    +                        sys.path.insert(1, filepath)
    +                except Exception as e:
    +                    from pyspark import util
    +                    warnings.warn(
    +                        "Python file [%s] specified in 'spark.submit.pyFiles' failed "
    --- End diff --
    
    Simplify this message?
    
    "Failed to add file [%s] speficied in 'spark.submit.pyFiles' to Python path:\n  %s"


---
