Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2016/07/14 07:49:08 UTC

[GitHub] spark pull request #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/14196

    [SPARK-16540][YARN][CORE] Avoid adding jars twice for Spark running on yarn

    ## What changes were proposed in this pull request?
    
    Currently, when running Spark on YARN, jars specified with `--jars` or `--packages` are added twice: once to Spark's own file server and once to YARN's distributed cache. This can be seen in the log. For example:
    
    ```
    ./bin/spark-shell --master yarn-client --jars examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
    ```
    
    If the jar specified to be added is the scopt jar, it is added twice:
    
    ```
    ...
    16/07/14 15:06:48 INFO Server: Started @5603ms
    16/07/14 15:06:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    16/07/14 15:06:48 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.102:4040
    16/07/14 15:06:48 INFO SparkContext: Added JAR file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar at spark://192.168.0.102:63996/jars/scopt_2.11-3.3.0.jar with timestamp 1468480008637
    16/07/14 15:06:49 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    16/07/14 15:06:49 INFO Client: Requesting a new application from cluster with 1 NodeManagers
    16/07/14 15:06:49 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
    16/07/14 15:06:49 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
    16/07/14 15:06:49 INFO Client: Setting up container launch context for our AM
    16/07/14 15:06:49 INFO Client: Setting up the launch environment for our AM container
    16/07/14 15:06:49 INFO Client: Preparing resources for our AM container
    16/07/14 15:06:49 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
    16/07/14 15:06:50 INFO Client: Uploading resource file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_libs__6486179704064718817.zip -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_libs__6486179704064718817.zip
    16/07/14 15:06:51 INFO Client: Uploading resource file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/scopt_2.11-3.3.0.jar
    16/07/14 15:06:51 INFO Client: Uploading resource file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_conf__326416236462420861.zip -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_conf__.zip
    ...
    ```
    
    So this patch avoids adding these jars to Spark's file server unnecessarily; in YARN mode they are already shipped through YARN's distributed cache.
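    
    For reference, a rough sketch of the intended behavior (reconstructed from the `getUserJars` diff quoted in the review below; the exact branching and the `unionFileLists` helper are assumptions, not the committed code):
    
    ```scala
    import org.apache.spark.SparkConf
    
    // Sketch only: the shell (which needs the jars on its REPL classpath)
    // folds in "spark.yarn.dist.jars"; SparkContext keeps the default
    // isShell = false, so jars that YARN's distributed cache already ships
    // are not re-added to Spark's own file server.
    def getUserJars(conf: SparkConf, isShell: Boolean = false): Seq[String] = {
      val sparkJars = conf.getOption("spark.jars")
      if (conf.get("spark.master") == "yarn" && isShell) {
        val yarnJars = conf.getOption("spark.yarn.dist.jars")
        // assumed Utils helper that merges two comma-separated lists into a Set
        unionFileLists(sparkJars, yarnJars).toSeq
      } else {
        sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
      }
    }
    ```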
    
    ## How was this patch tested?
    
    Manually verified in both YARN client and cluster modes, as well as in standalone mode.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-16540

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14196
    
----
commit 86205fcef29515ba72809fc2541e5d6aacfa76a7
Author: jerryshao <ss...@hortonworks.com>
Date:   2016-07-14T06:56:22Z

    Avoid adding jars twice for Spark running on yarn

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice for Sp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14196
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14196#discussion_r70871820
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2409,9 +2409,9 @@ private[spark] object Utils extends Logging {
        * "spark.yarn.dist.jars" properties, while in other modes it returns the jar files pointed by
        * only the "spark.jars" property.
        */
    -  def getUserJars(conf: SparkConf): Seq[String] = {
    +  def getUserJars(conf: SparkConf, isShell: Boolean = false): Seq[String] = {
    --- End diff --
    
    hm can you document what this parameter does?
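    
    A hedged sketch of what such documentation could look like (wording is illustrative only, not the committed doc):
    
        /**
         * @param isShell whether the caller is the shell/REPL. In YARN mode the shell also needs
         *                the "spark.yarn.dist.jars" entries on its classpath, even though YARN's
         *                distributed cache, rather than Spark's file server, ships those jars.
         */
        def getUserJars(conf: SparkConf, isShell: Boolean = false): Seq[String] = ...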




[GitHub] spark pull request #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14196#discussion_r70909029
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2409,9 +2409,9 @@ private[spark] object Utils extends Logging {
        * "spark.yarn.dist.jars" properties, while in other modes it returns the jar files pointed by
        * only the "spark.jars" property.
        */
    -  def getUserJars(conf: SparkConf): Seq[String] = {
    +  def getUserJars(conf: SparkConf, isShell: Boolean = false): Seq[String] = {
    --- End diff --
    
    Do I still need to update the docs, or maybe this can be done later?




[GitHub] spark issue #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice for Sp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14196
  
    **[Test build #62302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62302/consoleFull)** for PR 14196 at commit [`86205fc`](https://github.com/apache/spark/commit/86205fcef29515ba72809fc2541e5d6aacfa76a7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14196




[GitHub] spark issue #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice for Sp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14196
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62302/
    Test PASSed.




[GitHub] spark issue #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice for Sp...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/14196
  
    I was gonna say the `Utils` method now doesn't need to be called from `SparkContext`, which would avoid the new argument, but it does the parsing from a String into a Set, so that avoids some duplication. LGTM, merging into master / 2.0.
    
    (Mental note: eventually all these should be config constants so they can use the common parsing code in `ConfigBuilder.scala`...)
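    
    For context, a config constant in the `ConfigBuilder.scala` style would look roughly like this (a sketch based on the `org.apache.spark.internal.config` package; wiring "spark.yarn.dist.jars" through it here is hypothetical):
    
    ```scala
    import org.apache.spark.internal.config.ConfigBuilder
    
    // Hypothetical entry: callers would then read a parsed Seq[String] via
    // conf.get(JARS_TO_DISTRIBUTE) instead of splitting comma lists by hand.
    private[spark] val JARS_TO_DISTRIBUTE = ConfigBuilder("spark.yarn.dist.jars")
      .stringConf
      .toSequence
      .createWithDefault(Nil)
    ```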




[GitHub] spark issue #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice for Sp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14196
  
    **[Test build #62302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62302/consoleFull)** for PR 14196 at commit [`86205fc`](https://github.com/apache/spark/commit/86205fcef29515ba72809fc2541e5d6aacfa76a7).

