Posted to commits@spark.apache.org by do...@apache.org on 2020/04/01 01:09:50 UTC

[spark] branch master updated: [SPARK-31308][PYSPARK] Merging pyFiles to files argument for Non-PySpark applications

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 20fc6fa  [SPARK-31308][PYSPARK] Merging pyFiles to files argument for Non-PySpark applications
20fc6fa is described below

commit 20fc6fa8398b9dc47b9ae7df52133a306f89b25f
Author: Liang-Chi Hsieh <li...@uber.com>
AuthorDate: Tue Mar 31 18:08:55 2020 -0700

    [SPARK-31308][PYSPARK] Merging pyFiles to files argument for Non-PySpark applications
    
    ### What changes were proposed in this pull request?
    
    This PR (SPARK-31308) proposes to merge the `pyFiles` argument into the `files` argument even when the application is not a Python application.
    
    ### Why are the changes needed?
    
    Currently, SparkSubmit merges the `pyFiles` argument into the `files` argument only for Python applications. As noted in #21420, "for some Spark applications, though they're a java program, they require not only jar dependencies, but also python dependencies." For the same reason, we should merge `pyFiles` into `files` even when the application is not a Python application.
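    
    As an illustrative sketch of such a use case (the object and file names here are hypothetical, not from the PR): a Scala application ships a Python helper script with `--py-files` and, once the script is distributed through `files`, resolves it via `SparkFiles.get`.
    
    ```scala
    // Hypothetical example: a JVM application that invokes a Python helper
    // shipped with --py-files. Once pyFiles is merged into files, the script
    // is distributed like any other file and can be resolved locally.
    import org.apache.spark.SparkFiles
    import org.apache.spark.sql.SparkSession
    
    object PyHelperExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("py-helper-example").getOrCreate()
        // "helper.py" is assumed to be submitted via: --py-files helper.py
        val helperPath = SparkFiles.get("helper.py")
        // Run the Python helper as an external process from the JVM side.
        val exitCode = sys.process.Process(Seq("python3", helperPath)).!
        println(s"helper.py exited with code $exitCode")
        spark.stop()
      }
    }
    ```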
    
    ### Does this PR introduce any user-facing change?
    
    Yes. After this change, for non-PySpark applications, the Python files specified by `pyFiles` are also added to `files`, just as they are for PySpark applications.
    
    ### How was this patch tested?
    
    Manually tested in a Jupyter notebook and by running `spark-submit` with `--verbose`:
    
    ```
    Spark config:
    ...
    (spark.files,file:/Users/dongjoon/PRS/SPARK-PR-28077/a.py)
    (spark.submit.deployMode,client)
    (spark.master,local[*])
    ```
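    
    As an additional illustrative check (not from the PR), one can verify from a non-PySpark context such as `spark-shell` that the `--py-files` entry now shows up in `spark.files`; the file name below matches the output above:
    
    ```scala
    // Illustrative check in spark-shell, where a `spark` session is in scope:
    // the file passed via --py-files should now be listed in spark.files.
    val files = spark.conf.get("spark.files", "")
    assert(files.contains("a.py"), s"expected a.py in spark.files, got: $files")
    ```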
    
    Closes #28077 from viirya/pyfile.
    
    Lead-authored-by: Liang-Chi Hsieh <li...@uber.com>
    Co-authored-by: Liang-Chi Hsieh <vi...@gmail.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 4d67dfa..1271a3d 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -474,10 +474,12 @@ private[spark] class SparkSubmit extends Logging {
         args.mainClass = "org.apache.spark.deploy.PythonRunner"
         args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
       }
-      if (clusterManager != YARN) {
-        // The YARN backend handles python files differently, so don't merge the lists.
-        args.files = mergeFileLists(args.files, args.pyFiles)
-      }
+    }
+
+    // Non-PySpark applications can need Python dependencies.
+    if (deployMode == CLIENT && clusterManager != YARN) {
+      // The YARN backend handles python files differently, so don't merge the lists.
+      args.files = mergeFileLists(args.files, args.pyFiles)
     }
 
     if (localPyFiles != null) {
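
For context, `mergeFileLists` is a private helper in SparkSubmit that, roughly, joins the non-blank, comma-separated file lists into a single comma-separated string, so the effect of the hunk above is to append `args.pyFiles` to `args.files` in client mode on non-YARN cluster managers. A minimal sketch of that behavior (illustrative only, not the actual SparkSubmit code):

```scala
// Rough sketch of mergeFileLists semantics (illustrative, not the real code):
// concatenate the non-blank, comma-separated lists into one list.
def mergeFileListsSketch(lists: String*): String = {
  val merged = lists
    .filter(s => s != null && s.trim.nonEmpty)
    .flatMap(_.split(",").map(_.trim).filter(_.nonEmpty))
  if (merged.nonEmpty) merged.mkString(",") else null
}

// e.g. mergeFileListsSketch("data.csv", "a.py,b.py") == "data.csv,a.py,b.py"
```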


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org