Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2018/05/24 07:07:46 UTC

[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/21420

    [SPARK-24377][Spark Submit] make --py-files work in non pyspark application

    ## What changes were proposed in this pull request?
    
    Some Spark applications, although they are Java programs, require Python dependencies in addition to jar dependencies. One example is the Livy remote SparkContext application: it is an embedded REPL for Scala, Python and R, so it loads not only jar dependencies but also Python and R dependencies. For such an application we should be able to specify not only "--jars" but also "--py-files".
    
    Currently --py-files only works for PySpark applications, so it does not work in the case above. This PR proposes to remove that restriction.
    
    We also found in testing that "spark.submit.pyFiles" only supports a quite limited scenario (client mode with local dependencies), so this PR also expands "spark.submit.pyFiles" to serve as an alternative to --py-files.
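    To make the proposal concrete, here is a minimal Scala sketch of the resolution rule described above. It is a hypothetical, simplified model: `SubmitArgs`, `PyFilesResolution` and the `.py`-suffix check for detecting a PySpark application are illustrative assumptions, not Spark's actual `SparkSubmit`/`SparkSubmitArguments` code. In this model the `--py-files` value takes precedence and `spark.submit.pyFiles` is the fallback.

```scala
// Hypothetical, simplified model; not Spark's SparkSubmit code.
final case class SubmitArgs(
    primaryResource: String,
    pyFilesFlag: Option[String],          // value of --py-files, if given
    sparkProperties: Map[String, String]  // properties passed via --conf
)

object PyFilesResolution {
  val PyFilesKey = "spark.submit.pyFiles"

  // Assumption for this sketch: a ".py" primary resource means a PySpark app.
  private def isPythonApp(resource: String): Boolean = resource.endsWith(".py")

  // Old rule in this model: Python deps are honored only for PySpark apps
  // (spark.submit.pyFiles is left out here for simplicity).
  def before(args: SubmitArgs): Option[String] =
    if (isPythonApp(args.primaryResource)) args.pyFilesFlag else None

  // Proposed rule in this model: any application may carry Python deps, and
  // spark.submit.pyFiles is used when --py-files is absent.
  def after(args: SubmitArgs): Option[String] =
    args.pyFilesFlag.orElse(args.sparkProperties.get(PyFilesKey))
}

// Usage: a JVM application (e.g. a REPL server) that also needs Python deps.
val viaFlag = SubmitArgs("repl-server.jar", Some("deps.zip"), Map.empty)
val viaConf = SubmitArgs("repl-server.jar", None, Map("spark.submit.pyFiles" -> "deps.egg"))

println(PyFilesResolution.before(viaFlag)) // prints None
println(PyFilesResolution.after(viaFlag))  // prints Some(deps.zip)
println(PyFilesResolution.after(viaConf))  // prints Some(deps.egg)
```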
    
    ## How was this patch tested?
    
    UT added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-24377

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21420
    
----
commit a41c99bf311aa8f4e0c2e07c1288f5a11e057ea4
Author: jerryshao <ss...@...>
Date:   2018-05-24T06:53:23Z

    make --py-files work in non pyspark application

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Doesn't `--files` work for adding such dependencies?


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    **[Test build #91207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91207/testReport)** for PR 21420 at commit [`e66ea49`](https://github.com/apache/spark/commit/e66ea49000860d593074296b2a86e8bbdf5f0261).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3624/


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Thanks @HyukjinKwon !


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3553/


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3541/


---



[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21420#discussion_r190783213
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
             // Usage: PythonAppRunner <main python file> <extra python files> [app arguments]
             args.mainClass = "org.apache.spark.deploy.PythonRunner"
             args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
    -        if (clusterManager != YARN) {
    -          // The YARN backend distributes the primary file differently, so don't merge it.
    -          args.files = mergeFileLists(args.files, args.primaryResource)
    --- End diff --
    
    Eh @jerryshao why did we remove this?


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    **[Test build #91106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91106/testReport)** for PR 21420 at commit [`c8521cc`](https://github.com/apache/spark/commit/c8521cc0de9de2e113a72e8379272b6fd009279a).


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21420


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    **[Test build #91106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91106/testReport)** for PR 21420 at commit [`c8521cc`](https://github.com/apache/spark/commit/c8521cc0de9de2e113a72e8379272b6fd009279a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91207/


---



[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21420#discussion_r190783462
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
             // Usage: PythonAppRunner <main python file> <extra python files> [app arguments]
             args.mainClass = "org.apache.spark.deploy.PythonRunner"
             args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
    -        if (clusterManager != YARN) {
    -          // The YARN backend distributes the primary file differently, so don't merge it.
    -          args.files = mergeFileLists(args.files, args.primaryResource)
    --- End diff --
    
    It duplicates the code below; you can check the original code.
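    For context on the merge being discussed, the sketch below approximates what a comma-separated list merge in the spirit of `mergeFileLists` does. It is an illustrative assumption rather than Spark's source: `FileLists.merge` is a hypothetical name, and it returns an `Option` instead of a nullable string. Because this sketch does not deduplicate, merging the primary resource at two sites would list it twice, which is one reason a single merge site is easier to reason about.

```scala
// Illustrative approximation of merging comma-separated file lists, in the
// spirit of SparkSubmit's mergeFileLists; not Spark's actual implementation.
object FileLists {
  // Skips null or blank inputs and blank entries; None when nothing is left.
  // Note: entries are NOT deduplicated in this sketch.
  def merge(lists: String*): Option[String] = {
    val entries = lists
      .filter(l => l != null && l.trim.nonEmpty)
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)
    if (entries.isEmpty) None else Some(entries.mkString(","))
  }
}

// Merging the same primary resource twice lists it twice.
println(FileLists.merge("a.txt", "main.py", "main.py")) // prints Some(a.txt,main.py,main.py)
```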


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    **[Test build #91090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91090/testReport)** for PR 21420 at commit [`a41c99b`](https://github.com/apache/spark/commit/a41c99bf311aa8f4e0c2e07c1288f5a11e057ea4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91090/


---



[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21420#discussion_r191011981
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    @@ -1093,6 +1097,44 @@ class SparkSubmitSuite
         assert(exception.getMessage() === "hello")
       }
     
    +  test("support --py-files/spark.submit.pyFiles in non pyspark application") {
    +    val hadoopConf = new Configuration()
    +    updateConfWithFakeS3Fs(hadoopConf)
    +
    +    val tmpDir = Utils.createTempDir()
    +    val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir)
    +
    +    val args = Seq(
    +      "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
    +      "--name", "testApp",
    +      "--master", "yarn",
    +      "--deploy-mode", "client",
    +      "--py-files", s"s3a://${pyFile.getAbsolutePath}",
    +      "spark-internal"
    +    )
    +
    +    val appArgs = new SparkSubmitArguments(args)
    +    val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf))
    +
    +    conf.get("spark.yarn.dist.pyFiles") should be (s"s3a://${pyFile.getAbsolutePath}")
    +    conf.get("spark.submit.pyFiles") should (startWith("/"))
    +
    +    // Verify "spark.submit.pyFiles"
    +    val args1 = Seq(
    +      "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
    +      "--name", "testApp",
    +      "--master", "yarn",
    +      "--deploy-mode", "client",
    +      "--conf", s"spark.submit.pyFiles=s3a://${pyFile.getAbsolutePath}",
    +      "spark-internal"
    +    )
    +
    +    val appArgs1 = new SparkSubmitArguments(args1)
    +    val (_, _, conf1, _) = submit.prepareSubmitEnvironment(appArgs1, conf = Some(hadoopConf))
    +
    +    conf1.get("spark.yarn.dist.pyFiles") should be (s"s3a://${pyFile.getAbsolutePath}")
    --- End diff --
    
    use `PY_FILES.key`, also in other places.
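    A minimal sketch of the pattern being suggested: reference a key through a shared constant instead of repeating the string literal. `ConfigEntry`, `ExampleConfig` and `ConfigKeyDemo` are hypothetical stand-ins (Spark's own config-entry classes are internal and not reproduced here), a plain `Map` plays the role of `SparkConf`, and the assumption that `PY_FILES` wraps `spark.yarn.dist.pyFiles` is inferred only from the line this comment is attached to.

```scala
// Hypothetical stand-ins; Spark's own config-entry classes are not reproduced.
final case class ConfigEntry(key: String)

object ExampleConfig {
  // Assumption: PY_FILES wraps the key the reviewed test spells out literally.
  val PY_FILES: ConfigEntry = ConfigEntry("spark.yarn.dist.pyFiles")
}

object ConfigKeyDemo {
  import ExampleConfig.PY_FILES

  // A plain Map stands in for SparkConf. Using PY_FILES.key keeps the key
  // string in one place instead of repeating the literal in every assertion.
  def pyFilesFrom(conf: Map[String, String]): Option[String] =
    conf.get(PY_FILES.key)
}

val conf = Map(ExampleConfig.PY_FILES.key -> "s3a://bucket/dep.egg")
println(ConfigKeyDemo.pyFilesFrom(conf)) // prints Some(s3a://bucket/dep.egg)
```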


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    `--files` can be used, but then the user (Livy in our case) has to differentiate whether the added files are Python dependencies or just plain text files.
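    To illustrate the bookkeeping this would push onto the caller, here is a hypothetical helper that splits a file list into Python dependencies and everything else by extension. The `.py`, `.zip` and `.egg` extensions follow the file types spark-submit documents for `--py-files`; `DependencySplitter` itself is an illustrative assumption, not part of Livy or Spark. Extension matching is only a heuristic: a `.zip` passed through `--files` might just be a data archive, which is exactly the ambiguity a dedicated `--py-files` option avoids.

```scala
import java.util.Locale

// Hypothetical helper; not part of Livy or Spark.
object DependencySplitter {
  // File types spark-submit documents for --py-files.
  private val pythonExtensions = Seq(".py", ".zip", ".egg")

  // Heuristic only: a .zip may just as well be a plain data archive.
  def isPythonDependency(path: String): Boolean = {
    val lower = path.toLowerCase(Locale.ROOT)
    pythonExtensions.exists(ext => lower.endsWith(ext))
  }

  // Returns (pythonDeps, otherFiles), preserving the input order.
  def split(paths: Seq[String]): (Seq[String], Seq[String]) =
    paths.partition(isPythonDependency)
}

val (pyDeps, otherFiles) =
  DependencySplitter.split(Seq("util.py", "libs.ZIP", "model.egg", "data.csv", "notes.txt"))
println(pyDeps)     // prints List(util.py, libs.ZIP, model.egg)
println(otherFiles) // prints List(data.csv, notes.txt)
```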


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    **[Test build #91090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91090/testReport)** for PR 21420 at commit [`a41c99b`](https://github.com/apache/spark/commit/a41c99bf311aa8f4e0c2e07c1288f5a11e057ea4).


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91106/


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    CC @HyukjinKwon @vanzin please help review, thanks!


---



[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21420#discussion_r191011438
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
             // Usage: PythonAppRunner <main python file> <extra python files> [app arguments]
             args.mainClass = "org.apache.spark.deploy.PythonRunner"
             args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
    -        if (clusterManager != YARN) {
    -          // The YARN backend distributes the primary file differently, so don't merge it.
    -          args.files = mergeFileLists(args.files, args.primaryResource)
    -        }
           }
           if (clusterManager != YARN) {
             // The YARN backend handles python files differently, so don't merge the lists.
             args.files = mergeFileLists(args.files, args.pyFiles)
           }
    -      if (localPyFiles != null) {
    +    }
    +
    +    if (localPyFiles != null) {
             sparkConf.set("spark.submit.pyFiles", localPyFiles)
    --- End diff --
    
    Looks indented too far now.
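    Beyond the indentation, the hunk moves the `localPyFiles != null` check out of its enclosing Python-application block (the added `}` closes that block first), so `spark.submit.pyFiles` is set for non-Python applications too. A simplified, hypothetical model of that control-flow change, with a mutable `Map` standing in for `SparkConf` and the surrounding Python-specific setup omitted:

```scala
import scala.collection.mutable

// Simplified, hypothetical model of the hunk above; a mutable Map stands in
// for SparkConf and the Python-specific setup is omitted.
object PyFilesPlacement {
  val Key = "spark.submit.pyFiles"

  // Before, as modelled: the setting was recorded only inside the
  // Python-application block.
  def before(isPythonApp: Boolean, localPyFiles: String): Map[String, String] = {
    val conf = mutable.Map.empty[String, String]
    if (isPythonApp) {
      if (localPyFiles != null) conf(Key) = localPyFiles
    }
    conf.toMap
  }

  // After, as modelled: the check sits outside that block, so isPythonApp no
  // longer affects this step and any app with Python files gets the setting.
  def after(isPythonApp: Boolean, localPyFiles: String): Map[String, String] = {
    val conf = mutable.Map.empty[String, String]
    if (localPyFiles != null) conf(Key) = localPyFiles
    conf.toMap
  }
}

println(PyFilesPlacement.before(isPythonApp = false, localPyFiles = "/tmp/dep.egg")) // prints Map()
println(PyFilesPlacement.after(isPythonApp = false, localPyFiles = "/tmp/dep.egg"))  // prints Map(spark.submit.pyFiles -> /tmp/dep.egg)
```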


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    **[Test build #91207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91207/testReport)** for PR 21420 at commit [`e66ea49`](https://github.com/apache/spark/commit/e66ea49000860d593074296b2a86e8bbdf5f0261).


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21420
  
    Merged to master.


---
