Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2014/05/21 20:58:37 UTC

[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/848

    [SPARK-1870] Make spark-submit --jars work in yarn-cluster mode.

    Sends secondary jars to the distributed cache of all containers and adds the cached jars to the classpath before the executors start.
    
    `spark-submit --jars` also works in standalone mode and `yarn-client`. Thanks to @andrewor14 for testing!
    
    I removed the note "Doesn't work for drivers in standalone mode with 'cluster' deploy mode" from `spark-submit`'s help message, though we haven't tested Mesos yet.
    
    CC: @dbtsai @sryza
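
    The mechanism described above can be sketched roughly as follows. This is an illustrative stand-in only, not the actual `ClientBase` code: the object name, method signature, and the config key string are assumptions made for the example.

    ```scala
    // Sketch only: a hypothetical stand-in for the ClientBase logic, showing how a
    // comma-separated list of cached secondary-jar links could become classpath
    // entries. The config key name here is an assumption, not the real constant.
    object SecondaryJarsSketch {
      def cachedSecondaryJarLinks(conf: Map[String, String]): Seq[String] =
        conf.getOrElse("spark.yarn.secondary.jars", "")
          .split(",")
          .filter(_.nonEmpty)
          .toSeq

      def main(args: Array[String]): Unit = {
        val conf = Map("spark.yarn.secondary.jars" -> "dep1.jar,dep2.jar")
        // Each link would be added to the container classpath before executors start.
        cachedSecondaryJarLinks(conf).foreach(println)
      }
    }
    ```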

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark yarn-classpath

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #848
    
----
commit dc3c825934cbd62566d09d3f2b4334dcc444879a
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-05-21T17:51:43Z

    add secondary jars to classpath in yarn

commit 3e7e1c4a2fe1a9d8512c19e56df91b34bea58108
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-05-21T18:21:09Z

    use sparkConf instead of hadoop conf

commit 11e535434940d0809bd8c1380b2d4a92d87ebb6a
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-05-21T18:45:25Z

    minor changes

commit 65e04ad8296969445e4ecfaa8921d55fe1e39c74
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-05-21T18:52:02Z

    update spark-submit help message and add a comment for yarn-client

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---


Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43804541
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15125/



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43822767
  
    Merged build finished. All automated tests passed.



Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43812877
  
    Thanks. It looks great to me, and better than my patch.
    
    `cachedSecondaryJarLinks.foreach(addPwdClasspathEntry)` is not strictly needed, since we already have `addPwdClasspathEntry("*")`. But because we add the jars explicitly, we can change their priority later.
    
    This patch also works for me.



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43825518
  
     Merged build triggered. 



Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43827704
  
    On standalone mode and Mesos, does this fix require the JARs to be accessible from the same URL on all nodes?



Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/848#discussion_r12921709
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -479,37 +485,24 @@ object ClientBase {
     
         extraClassPath.foreach(addClasspathEntry)
     
    -    addClasspathEntry(Environment.PWD.$())
    +    val cachedSecondaryJarLinks =
    +      sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
         // Normally the users app.jar is last in case conflicts with spark jars
         if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
    --- End diff --
    
    PS: in line 47, `* 1. In standalone mode, it will launch an [[org.apache.spark.deploy.yarn.ApplicationMaster]]` — should it say cluster mode now?



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43832565
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15133/



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43825524
  
    Merged build started. 



Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43814337
  
    The symbolic links may not be under the PWD. That is why it didn't work before.



Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/848#discussion_r12921552
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -479,37 +485,24 @@ object ClientBase {
     
         extraClassPath.foreach(addClasspathEntry)
     
    -    addClasspathEntry(Environment.PWD.$())
    +    val cachedSecondaryJarLinks =
    +      sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
         // Normally the users app.jar is last in case conflicts with spark jars
         if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
    --- End diff --
    
    What's the difference between `spark.yarn.user.classpath.first` and `spark.files.userClassPathFirst`? To me, they seem to be the same thing under two different configuration keys.



Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43815204
  
    Yes, we can also control the ordering in this way.



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43816549
  
    Merged build started. 



Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/848#discussion_r12923791
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -479,37 +485,24 @@ object ClientBase {
     
         extraClassPath.foreach(addClasspathEntry)
     
    -    addClasspathEntry(Environment.PWD.$())
    +    val cachedSecondaryJarLinks =
    +      sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
         // Normally the users app.jar is last in case conflicts with spark jars
         if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
    --- End diff --
    
    `spark.files.userClassPathFirst` is a global configuration that controls the ordering of dynamically added jars, while `spark.yarn.user.classpath.first` applies only to YARN. I agree it is a little confusing, but that is independent of this PR. We can create a new JIRA for it.
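    
    The ordering decision under discussion can be sketched like this (a simplified, hypothetical illustration; the names and signatures are not the real `ClientBase` code):
    
    ```scala
    // Sketch of the classpath-ordering choice discussed above. In the real code this
    // is driven by spark.yarn.user.classpath.first; names here are illustrative.
    object ClasspathOrderSketch {
      def orderedEntries(userClasspathFirst: Boolean,
                         userEntries: Seq[String],
                         sparkEntries: Seq[String]): Seq[String] =
        if (userClasspathFirst) userEntries ++ sparkEntries
        // Default: the user's app jar goes last so it cannot shadow Spark's jars.
        else sparkEntries ++ userEntries

      def main(args: Array[String]): Unit = {
        println(orderedEntries(userClasspathFirst = false,
          Seq("app.jar"), Seq("spark-assembly.jar")).mkString(":"))
      }
    }
    ```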



Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/848#discussion_r12977252
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -326,8 +326,7 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
             |  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
             |  --name NAME                 A name of your application.
             |  --jars JARS                 Comma-separated list of local jars to include on the driver
    -        |                              and executor classpaths. Doesn't work for drivers in
    --- End diff --
    
    This should not have been taken out, actually; it can be put back in. But we found out just now that the "cluster" deploy mode of a Spark standalone cluster is somewhat broken with spark-submit.



Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/848#discussion_r12923805
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -479,37 +485,24 @@ object ClientBase {
     
         extraClassPath.foreach(addClasspathEntry)
     
    -    addClasspathEntry(Environment.PWD.$())
    +    val cachedSecondaryJarLinks =
    +      sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
         // Normally the users app.jar is last in case conflicts with spark jars
         if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
    --- End diff --
    
    I will update the doc. Thanks!



Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43862984
  
    I independently tested this on YARN 2.4 running in a VM where I could reproduce the problem. This change indeed allows jars passed with `--jars` to be accessible in executors. I am going to merge this. Thanks @mengxr for fixing this, and @dbtsai and @sryza for helping out along the way!



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43800111
  
    Merged build started. 



Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/848#discussion_r12975962
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -326,8 +326,7 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
             |  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
             |  --name NAME                 A name of your application.
             |  --jars JARS                 Comma-separated list of local jars to include on the driver
    -        |                              and executor classpaths. Doesn't work for drivers in
    --- End diff --
    
    Was there a reason for taking this out? My impression is that this still won't work on standalone with cluster deploy mode.



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43822769
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15128/



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43804540
  
    Merged build finished. All automated tests passed.



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43800093
  
     Merged build triggered. 



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43832563
  
    Merged build finished. All automated tests passed.



Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43816325
  
    @dbtsai Could you backport the patch to branch-0.9 and test it on your cluster?



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43816530
  
     Merged build triggered. 



Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/848



Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43814642
  
    It worked on the driver before, so the main issue is that those files were not in the executors' distributed cache. But I like the idea of adding them explicitly so we won't miss anything.



Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/848#issuecomment-43836661
  
    This doesn't apply to standalone or Mesos. For those two modes, spark-submit translates `--jars` to `spark.jars`, then SparkContext uploads those jars to its HTTP file server, and the executors pull them from the server.
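    
    The standalone/Mesos path described above could be sketched like this (a hypothetical helper written for illustration; not the real SparkSubmit code, and the `--jars` map key is just a stand-in for the parsed CLI argument):
    
    ```scala
    // Sketch only: how --jars could be folded into the spark.jars property for
    // standalone/Mesos, per the comment above. The executors would later fetch
    // these jars from the driver's HTTP file server.
    object JarsTranslationSketch {
      def translate(cliArgs: Map[String, String]): Map[String, String] =
        cliArgs.get("--jars") match {
          case Some(jars) => Map("spark.jars" -> jars)
          case None       => Map.empty
        }

      def main(args: Array[String]): Unit = {
        println(translate(Map("--jars" -> "dep1.jar,dep2.jar")))
      }
    }
    ```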

