You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by shanyu <gi...@git.apache.org> on 2018/11/12 03:11:53 UTC

[GitHub] spark pull request #23009: SPARK-26011: pyspark app with "spark.jars.package...

GitHub user shanyu opened a pull request:

    https://github.com/apache/spark/pull/23009

    SPARK-26011: pyspark app with "spark.jars.packages" config does not work

    SparkSubmit determines pyspark app by the suffix of primary resource but Livy
    uses "spark-internal" as the primary resource when calling spark-submit,
    therefore args.isPython is set to false in SparkSubmit.scala.
    
    The fix is to resolve maven coordinates not only when args.isPython is true,
    but also when primary resource is spark-internal.
    
    Tested the patch with Livy submitting pyspark app, spark-submit, pyspark with or without packages config.
    
    Signed-off-by: Shanyu Zhao <sh...@microsoft.com>

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shanyu/spark shanyu-26011

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23009.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23009
    
----
commit c8424aff80e33f9a3f5a7d19a04442c7dac701a4
Author: Shanyu Zhao <sh...@...>
Date:   2018-11-12T02:57:01Z

    SPARK-26011: pyspark app with "spark.jars.packages" config does not work
    
    SparkSubmit determines pyspark app by the suffix of primary resource but Livy
    uses "spark-internal" as the primary resource when calling spark-submit,
    therefore args.isPython is set to false in SparkSubmit.scala.
    
    The fix is to resolve maven coordinates not only when args.isPython is true,
    but also when primary resource is spark-internal.
    
    Signed-off-by: Shanyu Zhao <sh...@microsoft.com>

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: SPARK-26011: pyspark app with "spark.jars.packages" conf...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    @srowen , @drdarshan mentioned that it may be better to fix livy instead of spark.  The problem is that when using the %%configure magic spark.jars.packages does not add the python files in the jar to the python path.  Any ideas who might be more familiar with this code and might know whether this is a problem in livy or spark?  It looks like this fixes the issue, but I'm not sure if this is the correct fix.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: SPARK-26011: pyspark app with "spark.jars.packages" conf...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    LGTM, nice find


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app withou...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    Merged to master/2.4/2.3


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark/R app with...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app withou...

Posted by shanyu <gi...@git.apache.org>.
Github user shanyu commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    thanks @vanzin! Yes you are right this only affects Yarn mode, and duplicates should be taken cared by Yarn. Title and description updated as suggested by you.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: SPARK-26011: pyspark app with "spark.jars.packages" conf...

Posted by shanyu <gi...@git.apache.org>.
Github user shanyu commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    @imatiach-msft I think this is a Spark bug not Livy's. Livy just calls SparkSubmit with "spark-internal" type of resource, with "spark.jars.pacakges" setting. SparkSubmit module is the one who resolves the packages and add to "spark.submit.pyFiles" so that python interpreter can be correctly setup.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app withou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    **[Test build #4425 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4425/testReport)** for PR 23009 at commit [`c8424af`](https://github.com/apache/spark/commit/c8424aff80e33f9a3f5a7d19a04442c7dac701a4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark/R app with...

Posted by inchiosa <gi...@git.apache.org>.
Github user inchiosa commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    Might this fix impact the resource usage and performance of Scala notebooks run through Livy by unnecessarily setting spark.submit.pyFiles?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: SPARK-26011: pyspark app with "spark.jars.packages" conf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: SPARK-26011: pyspark app with "spark.jars.packages" conf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: SPARK-26011: pyspark app with "spark.jars.packages" conf...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    @shanyu can you update the name as [SPARK-26011][CORE][PYSPARK] according to the guidelines?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app withou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app withou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98801/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23009: SPARK-26011: pyspark app with "spark.jars.package...

Posted by shanyu <gi...@git.apache.org>.
Github user shanyu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23009#discussion_r232868311
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -318,7 +318,7 @@ private[spark] class SparkSubmit extends Logging {
     
           if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
             args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
    -        if (args.isPython) {
    +        if (args.isPython || isInternal(args.primaryResource)) {
    --- End diff --
    
    The code snippet in the if() statement is to properly set "spark.submit.pyFiles" config, but it only get executed when args.isPython. SparkSubmit uses the primary resource's suffix to determine args.isPython. For NO_RESOURCE type ("spark-internal" as primary resource), we should also do this because it might use python in the NO_RESOURCE scenario.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23009: SPARK-26011: pyspark app with "spark.jars.package...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23009#discussion_r232706315
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -318,7 +318,7 @@ private[spark] class SparkSubmit extends Logging {
     
           if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
             args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
    -        if (args.isPython) {
    +        if (args.isPython || isInternal(args.primaryResource)) {
    --- End diff --
    
    These will be dumb questions since I don't know this code, but is this specific just to Python? This seems to set pyFiles even when not running a Python job or am I missing something?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: SPARK-26011: pyspark app with "spark.jars.packages" conf...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    Jenkins test this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23009


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app withou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    **[Test build #98801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98801/testReport)** for PR 23009 at commit [`c8424af`](https://github.com/apache/spark/commit/c8424aff80e33f9a3f5a7d19a04442c7dac701a4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark/R app with...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    Setting that property should not affect Scala code in any way.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app withou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    **[Test build #4425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4425/testReport)** for PR 23009 at commit [`c8424af`](https://github.com/apache/spark/commit/c8424aff80e33f9a3f5a7d19a04442c7dac701a4).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: [SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark/R app with...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    **[Test build #98801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98801/testReport)** for PR 23009 at commit [`c8424af`](https://github.com/apache/spark/commit/c8424aff80e33f9a3f5a7d19a04442c7dac701a4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23009: SPARK-26011: pyspark app with "spark.jars.packages" conf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23009
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23009: SPARK-26011: pyspark app with "spark.jars.package...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23009#discussion_r232921575
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -318,7 +318,7 @@ private[spark] class SparkSubmit extends Logging {
     
           if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
             args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
    -        if (args.isPython) {
    +        if (args.isPython || isInternal(args.primaryResource)) {
    --- End diff --
    
    Yeah I get what the code does, was just wondering why it always sets a pyfiles now even when it's not a pyspark app. But the answer is that pyspark apps also need resolved Maven dependencies, I believe. @vanzin does this look right?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org