You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/10 20:05:45 UTC

[GitHub] [spark] dbtsai opened a new pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath when hadoop is provided

dbtsai opened a new pull request #28788:
URL: https://github.com/apache/spark/pull/28788


   ### What changes were proposed in this pull request?
   If a Spark distribution has built-in hadoop runtime, Spark will not populate the hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath` when a job is submitted to Yarn. Users can override this behavior by setting `spark.yarn.populateHadoopClasspath` to `true`.
   
   ### Why are the changes needed?
   Without this, Spark will populate the hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath` even Spark distribution has built-in hadoop. This results jar conflict and many unexpected behaviors in runtime.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Manually test with two builds, with-hadoop and no-hadoop builds.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642317614






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643466438


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642457975


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-645799277


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124200/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643387909






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439080442



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it <code>with-hadoop</code> Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain built-in Hadoop runtime, so users have to provide a Hadoop installation separately.

Review comment:
       `built-in` -> `a built-in`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642459471


   **[Test build #123830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123830/testReport)** for PR 28788 at commit [`7c9e1ad`](https://github.com/apache/spark/commit/7c9e1ad2bb12246d689e46794bd7851d8188a545).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643009797


   **[Test build #123868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123868/testReport)** for PR 28788 at commit [`13c3b40`](https://github.com/apache/spark/commit/13c3b40c8cbb7eb3a040950df92ba1eedbb4eaf5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438455981



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -393,5 +395,19 @@ package object config {
   private[yarn] val YARN_EXECUTOR_RESOURCE_TYPES_PREFIX = "spark.yarn.executor.resource."
   private[yarn] val YARN_DRIVER_RESOURCE_TYPES_PREFIX = "spark.yarn.driver.resource."
   private[yarn] val YARN_AM_RESOURCE_TYPES_PREFIX = "spark.yarn.am.resource."
-
+  lazy val IS_HADOOP_PROVIDED: Boolean = {
+    val configPath = "org/apache/spark/deploy/yarn/config.properties"
+    val propertyKey = "spark.yarn.isHadoopProvided"
+    try {
+      val prop = new Properties()
+      prop.load(ClassLoader.getSystemClassLoader.
+        getResourceAsStream(configPath))
+      prop.getProperty(propertyKey).toBoolean
+    } catch {
+      case e: Exception =>
+        log.warn(s"Can not load the default value of `$propertyKey` from " +
+          s"$configPath with error, ${e.toString}. Using false as a default value."
+        false

Review comment:
       This variable naming is a little confusing because
   `IS_HADOOP_PROVIDED` is used for `spark.yarn.populateHadoopClasspath`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642460115






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642935065


   @tgravescs I add some doc. Let me know if it's clear enough. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439136561



##########
File path: resource-managers/yarn/pom.xml
##########
@@ -30,8 +30,18 @@
   <properties>
     <sbt.project.name>yarn</sbt.project.name>
     <jersey-1.version>1.19</jersey-1.version>
+    <spark.yarn.isHadoopProvided>false</spark.yarn.isHadoopProvided>
   </properties>
 
+  <profiles>
+    <profile>
+      <id>hadoop-provided</id>
+      <properties>
+        <spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
+      </properties>
+    </profile>
+  </profiles>
+

Review comment:
       The next line of added doc `To build Spark yourself, refer to [Building Spark](building-spark.html).` has the link to the doc you mentioned. I think they should be able to figure it out when they read building spark session.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642530095


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642993425


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642317556






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r441985622



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       Okay, I'm getting old now :) Just somehow remember you told me this, and that was why our internal `no-hadoop` build can not support hive. I removed those lines.

##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       Okay, I'm getting old now :) Just somehow remember you told me this, and that was why our internal `no-hadoop` build can not support hive. I removed those lines. Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642940799


   **[Test build #123868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123868/testReport)** for PR 28788 at commit [`13c3b40`](https://github.com/apache/spark/commit/13c3b40c8cbb7eb3a040950df92ba1eedbb4eaf5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642940799


   **[Test build #123868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123868/testReport)** for PR 28788 at commit [`13c3b40`](https://github.com/apache/spark/commit/13c3b40c8cbb7eb3a040950df92ba1eedbb4eaf5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-645798134


   **[Test build #124200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124200/testReport)** for PR 28788 at commit [`c224152`](https://github.com/apache/spark/commit/c2241528374cb64638204b21fd20a8c15cf931da).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642946152


   **[Test build #123869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123869/testReport)** for PR 28788 at commit [`daffe6c`](https://github.com/apache/spark/commit/daffe6cc72be9b491def186c40baf58c8702eb1e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642994909






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642381940


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123793/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r440897397



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       ? This is added at the last commit. I didn't say like that, @dbtsai . :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438454952



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -393,5 +395,19 @@ package object config {
   private[yarn] val YARN_EXECUTOR_RESOURCE_TYPES_PREFIX = "spark.yarn.executor.resource."
   private[yarn] val YARN_DRIVER_RESOURCE_TYPES_PREFIX = "spark.yarn.driver.resource."
   private[yarn] val YARN_AM_RESOURCE_TYPES_PREFIX = "spark.yarn.am.resource."
-
+  lazy val IS_HADOOP_PROVIDED: Boolean = {
+    val configPath = "org/apache/spark/deploy/yarn/config.properties"
+    val propertyKey = "spark.yarn.isHadoopProvided"
+    try {
+      val prop = new Properties()
+      prop.load(ClassLoader.getSystemClassLoader.
+        getResourceAsStream(configPath))
+      prop.getProperty(propertyKey).toBoolean
+    } catch {
+      case e: Exception =>
+        log.warn(s"Can not load the default value of `$propertyKey` from " +
+          s"$configPath with error, ${e.toString}. Using false as a default value."
+        false

Review comment:
       Can we have `true` since it's the previous behavior of `spark.yarn.populateHadoopClasspath`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r446784525



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -74,10 +76,11 @@ package object config {
     .doc("Whether to populate Hadoop classpath from `yarn.application.classpath` and " +
       "`mapreduce.application.classpath` Note that if this is set to `false`, it requires " +
       "a `with-Hadoop` Spark distribution that bundles Hadoop runtime or user has to provide " +
-      "a Hadoop installation separately.")
+      "a Hadoop installation separately. By default, for `with-hadoop` Spark distribution, " +
+      "this is set to `false`; for `no-hadoop` distribution, this is set to `true`.")
     .version("2.4.6")
     .booleanConf
-    .createWithDefault(true)
+    .createWithDefault(isHadoopProvided())

Review comment:
       @dbtsai, seems like it has not been added yet. Could we add it soon before we forget :-)?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643528380


   **[Test build #123943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123943/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643487690






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439176959



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it <code>with-hadoop</code> Spark

Review comment:
       I guess .. ``` `...` ``` can be used instead of `<code>...</code>`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642288754






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643487690






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642381933


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642460115






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643044370






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642455741


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123822/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438395914



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -77,7 +79,7 @@ package object config {
       "a Hadoop installation separately.")
     .version("2.4.6")
     .booleanConf
-    .createWithDefault(true)
+    .createWithDefault(IS_HADOOP_PROVIDED)

Review comment:
       Curious about why a fixed default config value is not enough?

##########
File path: resource-managers/yarn/src/main/resources/org/apache/spark/deploy/yarn/config.properties
##########
@@ -0,0 +1 @@
+spark.yarn.populateHadoopClasspath = ${spark.yarn.populateHadoopClasspath}

Review comment:
       Do you actually mean `spark.yarn.isHadoopProvided` here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642935850






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439122984



##########
File path: resource-managers/yarn/pom.xml
##########
@@ -30,8 +30,18 @@
   <properties>
     <sbt.project.name>yarn</sbt.project.name>
     <jersey-1.version>1.19</jersey-1.version>
+    <spark.yarn.isHadoopProvided>false</spark.yarn.isHadoopProvided>
   </properties>
 
+  <profiles>
+    <profile>
+      <id>hadoop-provided</id>
+      <properties>
+        <spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
+      </properties>
+    </profile>
+  </profiles>
+

Review comment:
       Seems we already have https://github.com/apache/spark/blob/b11e42663be680a0357f2bf7bd8b16afe313eb5e/docs/building-spark.md#packaging-without-hadoop-dependencies-for-yarn




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642288754


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439188335



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -74,10 +76,11 @@ package object config {
     .doc("Whether to populate Hadoop classpath from `yarn.application.classpath` and " +
       "`mapreduce.application.classpath` Note that if this is set to `false`, it requires " +
       "a `with-Hadoop` Spark distribution that bundles Hadoop runtime or user has to provide " +
-      "a Hadoop installation separately.")
+      "a Hadoop installation separately. By default, for `with-hadoop` Spark distribution, " +
+      "this is set to `false`; for `no-hadoop` distribution, this is set to `true`.")
     .version("2.4.6")
     .booleanConf
-    .createWithDefault(true)
+    .createWithDefault(isHadoopProvided())

Review comment:
       I also have thought about adding to migration guide. No yarn related migration guide, so maybe can put in core.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643063537






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-730213388


   @dbtsai about https://github.com/apache/spark/pull/28788#discussion_r526668885,  it would be great if we can add it into migration guide so that it's automatically pointed out in the 3.1.0 release notes :-).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438429459



##########
File path: resource-managers/yarn/src/main/resources/org/apache/spark/deploy/yarn/config.properties
##########
@@ -0,0 +1 @@
+spark.yarn.populateHadoopClasspath = ${spark.yarn.populateHadoopClasspath}

Review comment:
       I messed up while I cherry-picked from internal branch to this PR. Fixed. Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642831379


   The irrelevant failure is `decomissioned executor`.  cc @holdenk 
   ```
   org.apache.spark.storage.BlockManagerDecommissionSuite
   .verify that an already running task which is going to cache data
   succeeds on a decommissioned executor
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r440899419



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       You can make `no-hadoop` distribution into `hadoop` distribution if you copy the exclude jars like `with-hadoop` distribution. If you make have `with-hadoop` distribution back, there is no difference, isn't it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643528811






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-745002844


   gentle ping @dbtsai.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643105954


   **[Test build #123887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123887/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643489568


   **[Test build #123943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123943/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642993067


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439203081



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain

Review comment:
       In the download link, we use terminology of `pre-built` and `pre-built with user provided hadoop`, and in some other places, we use `with-hadoop`, and `no-hadoop` distributions. That was why I use both of them here to make it clear.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642296071


   @tgravescs I totally understand this could be a breaking change in Spark 3.1. Being said that, it's a postive breaking change as long as we doucment the behavior well.
   
   Many of enginners run into issues that their applications pick up two hadoop versions, and the problems happen when the version of the hadoop in cluster is very different from the standard hadoop jars in Spark. In this case, the transitive deps such as guava, jackson can be very different resulting runtime errors.
   
   As you mentioned, if you are using some jars that are in cluster's classpath but not in Spark's `with-hadoop` distribution, they should ship it together with application. Or you should use `no-hadoop` distribution, and the entire classpath of hadoop is provided by cluster.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642455728






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642317257


   **[Test build #123790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123790/testReport)** for PR 28788 at commit [`d3f8088`](https://github.com/apache/spark/commit/d3f80880f159fe3803b873eaf4b965badf6e4468).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438456814



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -393,5 +395,19 @@ package object config {
   private[yarn] val YARN_EXECUTOR_RESOURCE_TYPES_PREFIX = "spark.yarn.executor.resource."
   private[yarn] val YARN_DRIVER_RESOURCE_TYPES_PREFIX = "spark.yarn.driver.resource."
   private[yarn] val YARN_AM_RESOURCE_TYPES_PREFIX = "spark.yarn.am.resource."
-
+  lazy val IS_HADOOP_PROVIDED: Boolean = {
+    val configPath = "org/apache/spark/deploy/yarn/config.properties"
+    val propertyKey = "spark.yarn.isHadoopProvided"
+    try {
+      val prop = new Properties()
+      prop.load(ClassLoader.getSystemClassLoader.
+        getResourceAsStream(configPath))
+      prop.getProperty(propertyKey).toBoolean
+    } catch {
+      case e: Exception =>
+        log.warn(s"Can not load the default value of `$propertyKey` from " +
+          s"$configPath with error, ${e.toString}. Using false as a default value."
+        false

Review comment:
       If `spark.yarn.isHadoopProvided` doesn't exist, `IS_HADOOP_PROVIDED` becomes `false`. So, Hadoop is not provided. As a result, Spark needs to use the built-in Hadoop. But, this line seems to make `spark.yarn.populateHadoopClasspath=false`. IIRC, we need to populate hadoop class pathes if we are using built-in Hadoop, aren't we?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643387401


   **[Test build #123940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123940/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642323598






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643063537






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643109165






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438474054



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -393,5 +395,19 @@ package object config {
   private[yarn] val YARN_EXECUTOR_RESOURCE_TYPES_PREFIX = "spark.yarn.executor.resource."
   private[yarn] val YARN_DRIVER_RESOURCE_TYPES_PREFIX = "spark.yarn.driver.resource."
   private[yarn] val YARN_AM_RESOURCE_TYPES_PREFIX = "spark.yarn.am.resource."
-
+  lazy val IS_HADOOP_PROVIDED: Boolean = {
+    val configPath = "org/apache/spark/deploy/yarn/config.properties"
+    val propertyKey = "spark.yarn.isHadoopProvided"
+    try {
+      val prop = new Properties()
+      prop.load(ClassLoader.getSystemClassLoader.
+        getResourceAsStream(configPath))
+      prop.getProperty(propertyKey).toBoolean
+    } catch {
+      case e: Exception =>
+        log.warn(s"Can not load the default value of `$propertyKey` from " +
+          s"$configPath with error, ${e.toString}. Using false as a default value."
+        false

Review comment:
       But, what you mean is inconsistent with the PR title, `Only populate Hadoop classpath for no-hadoop build`. This PR is trying to set `IS_HADOOP_PROVIDED (false)` to `spark.yarn.populateHadoopClasspath` always for Hadoop distribution too, isn't it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642288434


   **[Test build #123786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123786/testReport)** for PR 28788 at commit [`86318d6`](https://github.com/apache/spark/commit/86318d6789706941b609dbe58d901c03e331b05e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643167490






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r441986877



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -74,10 +76,11 @@ package object config {
     .doc("Whether to populate Hadoop classpath from `yarn.application.classpath` and " +
       "`mapreduce.application.classpath` Note that if this is set to `false`, it requires " +
       "a `with-Hadoop` Spark distribution that bundles Hadoop runtime or user has to provide " +
-      "a Hadoop installation separately.")
+      "a Hadoop installation separately. By default, for `with-hadoop` Spark distribution, " +
+      "this is set to `false`; for `no-hadoop` distribution, this is set to `true`.")
     .version("2.4.6")
     .booleanConf
-    .createWithDefault(true)
+    .createWithDefault(isHadoopProvided())

Review comment:
       I'll add it as a followup PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642992827


   **[Test build #123869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123869/testReport)** for PR 28788 at commit [`daffe6c`](https://github.com/apache/spark/commit/daffe6cc72be9b491def186c40baf58c8702eb1e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642993067






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-645808793


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r440899419



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       You can make `no-hadoop` distribution into `hadoop` distribution if you copy the exclude jars like `with-hadoop` distribution. If you have `with-hadoop` distribution back, there is no difference, isn't it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r440618141



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       Maybe I'm wrong, but I got the impression from @dongjoon-hyun that no-hadoop Spark distribution doesn't support Hive.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642413193


   **[Test build #123822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123822/testReport)** for PR 28788 at commit [`7c9e1ad`](https://github.com/apache/spark/commit/7c9e1ad2bb12246d689e46794bd7851d8188a545).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643167509


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123898/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642529638


   **[Test build #123830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123830/testReport)** for PR 28788 at commit [`7c9e1ad`](https://github.com/apache/spark/commit/7c9e1ad2bb12246d689e46794bd7851d8188a545).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438454952



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -393,5 +395,19 @@ package object config {
   private[yarn] val YARN_EXECUTOR_RESOURCE_TYPES_PREFIX = "spark.yarn.executor.resource."
   private[yarn] val YARN_DRIVER_RESOURCE_TYPES_PREFIX = "spark.yarn.driver.resource."
   private[yarn] val YARN_AM_RESOURCE_TYPES_PREFIX = "spark.yarn.am.resource."
-
+  lazy val IS_HADOOP_PROVIDED: Boolean = {
+    val configPath = "org/apache/spark/deploy/yarn/config.properties"
+    val propertyKey = "spark.yarn.isHadoopProvided"
+    try {
+      val prop = new Properties()
+      prop.load(ClassLoader.getSystemClassLoader.
+        getResourceAsStream(configPath))
+      prop.getProperty(propertyKey).toBoolean
+    } catch {
+      case e: Exception =>
+        log.warn(s"Can not load the default value of `$propertyKey` from " +
+          s"$configPath with error, ${e.toString}. Using false as a default value."
+        false

Review comment:
       Can we have `true` since it's the previous behavior?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643528811






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643108113


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439176423



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain

Review comment:
       no really biggie but I personally avoid abbreviation in the public documentation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642323153


   **[Test build #123793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123793/testReport)** for PR 28788 at commit [`92b061d`](https://github.com/apache/spark/commit/92b061d53a7125d8494d6f44715ba5b01fa2fc13).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642229638






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643006296






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642803051


   retest this please
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643005724


   **[Test build #123867 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123867/testReport)** for PR 28788 at commit [`b5a5d5f`](https://github.com/apache/spark/commit/b5a5d5f6a657cdfda162993f009944a83d80849e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643044370






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

tgravescs commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r440242189



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       is this true, hive will not work?  there is a separate hive-provided flag.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438474054



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -393,5 +395,19 @@ package object config {
   private[yarn] val YARN_EXECUTOR_RESOURCE_TYPES_PREFIX = "spark.yarn.executor.resource."
   private[yarn] val YARN_DRIVER_RESOURCE_TYPES_PREFIX = "spark.yarn.driver.resource."
   private[yarn] val YARN_AM_RESOURCE_TYPES_PREFIX = "spark.yarn.am.resource."
-
+  lazy val IS_HADOOP_PROVIDED: Boolean = {
+    val configPath = "org/apache/spark/deploy/yarn/config.properties"
+    val propertyKey = "spark.yarn.isHadoopProvided"
+    try {
+      val prop = new Properties()
+      prop.load(ClassLoader.getSystemClassLoader.
+        getResourceAsStream(configPath))
+      prop.getProperty(propertyKey).toBoolean
+    } catch {
+      case e: Exception =>
+        log.warn(s"Can not load the default value of `$propertyKey` from " +
+          s"$configPath with error, ${e.toString}. Using false as a default value."
+        false

Review comment:
       ~But, what you mean is inconsistent with the PR title, `Only populate Hadoop classpath for no-hadoop build`. This PR is trying to set `IS_HADOOP_PROVIDED (false)` to `spark.yarn.populateHadoopClasspath` always for Hadoop distribution too, isn't it?~
   
   Sorry, now I got the correct idea. Thanks, @dbtsai .




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643063224


   **[Test build #123887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123887/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642229636


   **[Test build #123779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123779/testReport)** for PR 28788 at commit [`40fb394`](https://github.com/apache/spark/commit/40fb394ccea167d7fba28cf120af5326a3d51758).
    * This patch **fails RAT tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642530105


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123830/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642455728


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] tgravescs commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

tgravescs commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642237229


   This could definitely be a breaking change. Your Hadoop deployment could include other jars that are being picked up via the class path that aren't distributed with your Spark distribution that include Hadoop. I know I saw this previous where the Hadoop configuration required certain classes and those were picked up from the cluster Hadoop class path and not distributed with spark. 
   
   @dhruve @redsanket 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642946152


   **[Test build #123869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123869/testReport)** for PR 28788 at commit [`daffe6c`](https://github.com/apache/spark/commit/daffe6cc72be9b491def186c40baf58c8702eb1e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439176339



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -74,10 +76,11 @@ package object config {
     .doc("Whether to populate Hadoop classpath from `yarn.application.classpath` and " +
       "`mapreduce.application.classpath` Note that if this is set to `false`, it requires " +
       "a `with-Hadoop` Spark distribution that bundles Hadoop runtime or user has to provide " +
-      "a Hadoop installation separately.")
+      "a Hadoop installation separately. By default, for `with-hadoop` Spark distribution, " +
+      "this is set to `false`; for `no-hadoop` distribution, this is set to `true`.")
     .version("2.4.6")
     .booleanConf
-    .createWithDefault(true)
+    .createWithDefault(isHadoopProvided())

Review comment:
       I think it's better to update the migration guide at `core-migration-guide.md`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642229673


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123779/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438463484



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -393,5 +395,19 @@ package object config {
   private[yarn] val YARN_EXECUTOR_RESOURCE_TYPES_PREFIX = "spark.yarn.executor.resource."
   private[yarn] val YARN_DRIVER_RESOURCE_TYPES_PREFIX = "spark.yarn.driver.resource."
   private[yarn] val YARN_AM_RESOURCE_TYPES_PREFIX = "spark.yarn.am.resource."
-
+  lazy val IS_HADOOP_PROVIDED: Boolean = {
+    val configPath = "org/apache/spark/deploy/yarn/config.properties"
+    val propertyKey = "spark.yarn.isHadoopProvided"
+    try {
+      val prop = new Properties()
+      prop.load(ClassLoader.getSystemClassLoader.
+        getResourceAsStream(configPath))
+      prop.getProperty(propertyKey).toBoolean
+    } catch {
+      case e: Exception =>
+        log.warn(s"Can not load the default value of `$propertyKey` from " +
+          s"$configPath with error, ${e.toString}. Using false as a default value."
+        false

Review comment:
       When hadoop is provided, that means our Spark distribution doesn't have built-in hadoop; thus, we need to populate the hadoop classpath from Yarn. I can add comment there so people can understand it easier.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642229117


   **[Test build #123779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123779/testReport)** for PR 28788 at commit [`40fb394`](https://github.com/apache/spark/commit/40fb394ccea167d7fba28cf120af5326a3d51758).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439203785



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain

Review comment:
       Oops, I meant things like "don't"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-645798681






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643466438






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642946544






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642935850






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643465818


   **[Test build #123940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123940/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643167490


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643032549


   cc @squito and @HeartSaVioR FYI


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438454952



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -393,5 +395,19 @@ package object config {
   private[yarn] val YARN_EXECUTOR_RESOURCE_TYPES_PREFIX = "spark.yarn.executor.resource."
   private[yarn] val YARN_DRIVER_RESOURCE_TYPES_PREFIX = "spark.yarn.driver.resource."
   private[yarn] val YARN_AM_RESOURCE_TYPES_PREFIX = "spark.yarn.am.resource."
-
+  lazy val IS_HADOOP_PROVIDED: Boolean = {
+    val configPath = "org/apache/spark/deploy/yarn/config.properties"
+    val propertyKey = "spark.yarn.isHadoopProvided"
+    try {
+      val prop = new Properties()
+      prop.load(ClassLoader.getSystemClassLoader.
+        getResourceAsStream(configPath))
+      prop.getProperty(propertyKey).toBoolean
+    } catch {
+      case e: Exception =>
+        log.warn(s"Can not load the default value of `$propertyKey` from " +
+          s"$configPath with error, ${e.toString}. Using false as a default value."
+        false

Review comment:
       ~Can we have `true` since it's the previous behavior?~ Oh, never mind.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642317540


   **[Test build #123790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123790/testReport)** for PR 28788 at commit [`d3f8088`](https://github.com/apache/spark/commit/d3f80880f159fe3803b873eaf4b965badf6e4468).
    * This patch **fails RAT tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai closed pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai closed pull request #28788:
URL: https://github.com/apache/spark/pull/28788


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642941266






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642288761






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643106480






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r441985622



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       Okay, I'm getting old now, bad memory :) Just somehow remember you told me this, and that was why our internal `no-hadoop` build can not support hive. I removed those lines. Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439080001



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain

Review comment:
       ~`pre-built with` -> `pre-built one with`?~ nvm.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642935312


   **[Test build #123867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123867/testReport)** for PR 28788 at commit [`b5a5d5f`](https://github.com/apache/spark/commit/b5a5d5f6a657cdfda162993f009944a83d80849e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642229117


   **[Test build #123779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123779/testReport)** for PR 28788 at commit [`40fb394`](https://github.com/apache/spark/commit/40fb394ccea167d7fba28cf120af5326a3d51758).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643010535






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642454640


   **[Test build #123822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123822/testReport)** for PR 28788 at commit [`7c9e1ad`](https://github.com/apache/spark/commit/7c9e1ad2bb12246d689e46794bd7851d8188a545).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] tgravescs edited a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

tgravescs edited a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642237229


   This could definitely be a breaking change. Your Hadoop deployment could include other jars that are being picked up via the class path that aren't distributed with your Spark distribution that include Hadoop. I know I saw this previous where the Hadoop configuration required certain classes and those were picked up from the cluster Hadoop class path and not distributed with spark, even though spark also had the standard Hadoop jars with it.
   
   @dhruve @redsanket 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642272381


   Thank you for pinging me, @dbtsai .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Yarn Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642390812


   @tgravescs +1 with detailed documentation. Users can alwasys turn `spark.yarn.populateHadoopClasspath` to `true` if it's desired. I'll work on the documentation. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642993072


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123869/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642288927






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642381933






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642530095






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642994909






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642413193


   **[Test build #123822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123822/testReport)** for PR 28788 at commit [`7c9e1ad`](https://github.com/apache/spark/commit/7c9e1ad2bb12246d689e46794bd7851d8188a545).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643109165






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439088312



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it <code>with-hadoop</code> Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain built-in Hadoop runtime, so users have to provide a Hadoop installation separately.

Review comment:
       Fixed. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643063224


   **[Test build #123887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123887/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439126220



##########
File path: resource-managers/yarn/pom.xml
##########
@@ -30,8 +30,18 @@
   <properties>
     <sbt.project.name>yarn</sbt.project.name>
     <jersey-1.version>1.19</jersey-1.version>
+    <spark.yarn.isHadoopProvided>false</spark.yarn.isHadoopProvided>
   </properties>
 
+  <profiles>
+    <profile>
+      <id>hadoop-provided</id>
+      <properties>
+        <spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
+      </properties>
+    </profile>
+  </profiles>
+

Review comment:
       Yea, thanks. I see. I just wonder if we want to mention this profile in the docs you added (docs/running-on-yarn.md), or adding a link to the doc link I posted above, so users can know how to build a "no-hadoop" distribution, when reading this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439203266



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -74,10 +76,11 @@ package object config {
     .doc("Whether to populate Hadoop classpath from `yarn.application.classpath` and " +
       "`mapreduce.application.classpath` Note that if this is set to `false`, it requires " +
       "a `with-Hadoop` Spark distribution that bundles Hadoop runtime or user has to provide " +
-      "a Hadoop installation separately.")
+      "a Hadoop installation separately. By default, for `with-hadoop` Spark distribution, " +
+      "this is set to `false`; for `no-hadoop` distribution, this is set to `true`.")
     .version("2.4.6")
     .booleanConf
-    .createWithDefault(true)
+    .createWithDefault(isHadoopProvided())

Review comment:
       I'll add it to migration guide as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-745004890


   I tagged `release-notes` too in the JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439125137



##########
File path: resource-managers/yarn/pom.xml
##########
@@ -30,8 +30,18 @@
   <properties>
     <sbt.project.name>yarn</sbt.project.name>
     <jersey-1.version>1.19</jersey-1.version>
+    <spark.yarn.isHadoopProvided>false</spark.yarn.isHadoopProvided>
   </properties>
 
+  <profiles>
+    <profile>
+      <id>hadoop-provided</id>
+      <properties>
+        <spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
+      </properties>
+    </profile>
+  </profiles>
+

Review comment:
       This profile is already in parent POM, so when we build it with `hadoop-provided` from the root project, this will only add the property to yarn scope as we don't need this property globally for the entire Spark project.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439121861



##########
File path: resource-managers/yarn/pom.xml
##########
@@ -30,8 +30,18 @@
   <properties>
     <sbt.project.name>yarn</sbt.project.name>
     <jersey-1.version>1.19</jersey-1.version>
+    <spark.yarn.isHadoopProvided>false</spark.yarn.isHadoopProvided>
   </properties>
 
+  <profiles>
+    <profile>
+      <id>hadoop-provided</id>
+      <properties>
+        <spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
+      </properties>
+    </profile>
+  </profiles>
+

Review comment:
       Do we need to mention this profile option in docs?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439126220



##########
File path: resource-managers/yarn/pom.xml
##########
@@ -30,8 +30,18 @@
   <properties>
     <sbt.project.name>yarn</sbt.project.name>
     <jersey-1.version>1.19</jersey-1.version>
+    <spark.yarn.isHadoopProvided>false</spark.yarn.isHadoopProvided>
   </properties>
 
+  <profiles>
+    <profile>
+      <id>hadoop-provided</id>
+      <properties>
+        <spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
+      </properties>
+    </profile>
+  </profiles>
+

Review comment:
       Yea, thanks. I see. I just wonder if we want to mention this profile in the docs you added (docs/running-on-yarn.md), or adding a link to the doc link I posted above, so users can know how to build a no-hadoop distribution, when reading this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642413591






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-645799254


   **[Test build #124200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124200/testReport)** for PR 28788 at commit [`c224152`](https://github.com/apache/spark/commit/c2241528374cb64638204b21fd20a8c15cf931da).
    * This patch **fails build dependency tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642994067


   **[Test build #123872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123872/testReport)** for PR 28788 at commit [`daffe6c`](https://github.com/apache/spark/commit/daffe6cc72be9b491def186c40baf58c8702eb1e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643108563


   **[Test build #123898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123898/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-645800313


   Merged into master as previous build works, and no actual code change in the last commit. Thank you all for reviewing.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] tgravescs commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

tgravescs commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642335688


   right they could and I agree probably ideally should, but I can see environments where that isn't necessarily easy to do (find the jars they have to ship). Default spark builds with Hadoop and we distribute Spark with Hadoop jars. If I run this on a cluster now it may just fail with an error I have no idea what it is when the previous spark version ran just fine. 
   
   I bring it up mostly to make sure we thought about it and see if anyone had strong opinions. . 
    I'm ok with this but think we should document it. Put it in the release notes, we definitely need to update the running on yarn docs for default value of spark.yarn.populateHadoopClasspath. it might not hurt to put a couple sentences about it in the Launching Spark on YARN section.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642230270


   cc @tgravescs @vanzin @dongjoon-hyun @viirya 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r441986680



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it <code>with-hadoop</code> Spark

Review comment:
       addressed. thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643106484


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123887/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642317548


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642381092


   **[Test build #123793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123793/testReport)** for PR 28788 at commit [`92b061d`](https://github.com/apache/spark/commit/92b061d53a7125d8494d6f44715ba5b01fa2fc13).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643387401


   **[Test build #123940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123940/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643466447


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123940/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642317257


   **[Test build #123790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123790/testReport)** for PR 28788 at commit [`d3f8088`](https://github.com/apache/spark/commit/d3f80880f159fe3803b873eaf4b965badf6e4468).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642323153


   **[Test build #123793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123793/testReport)** for PR 28788 at commit [`92b061d`](https://github.com/apache/spark/commit/92b061d53a7125d8494d6f44715ba5b01fa2fc13).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643387909






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r438430412



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -77,7 +79,7 @@ package object config {
       "a Hadoop installation separately.")
     .version("2.4.6")
     .booleanConf
-    .createWithDefault(true)
+    .createWithDefault(IS_HADOOP_PROVIDED)

Review comment:
       Because we have two favors of builds, and for `with-hadoop` build, it's better not to populate the hadoop classpath while for no-hadoop build, we need to populate the hadoop class path from yarn. Thus, we can not have one default config value for both cases.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642288746


   **[Test build #123786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123786/testReport)** for PR 28788 at commit [`86318d6`](https://github.com/apache/spark/commit/86318d6789706941b609dbe58d901c03e331b05e).
    * This patch **fails RAT tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643487137


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] viirya commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

viirya commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643385682


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642323598






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642941266






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642229638






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643108563


   **[Test build #123898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123898/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r439080001



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain

Review comment:
       `pre-built with` -> `pre-built one with`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643006296






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642935312


   **[Test build #123867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123867/testReport)** for PR 28788 at commit [`b5a5d5f`](https://github.com/apache/spark/commit/b5a5d5f6a657cdfda162993f009944a83d80849e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642288434


   **[Test build #123786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123786/testReport)** for PR 28788 at commit [`86318d6`](https://github.com/apache/spark/commit/86318d6789706941b609dbe58d901c03e331b05e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642317548






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642459471


   **[Test build #123830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123830/testReport)** for PR 28788 at commit [`7c9e1ad`](https://github.com/apache/spark/commit/7c9e1ad2bb12246d689e46794bd7851d8188a545).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643489568


   **[Test build #123943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123943/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643167018


   **[Test build #123898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123898/testReport)** for PR 28788 at commit [`8fc906b`](https://github.com/apache/spark/commit/8fc906b52de3581caa45f2bd174521a6a610ff8a).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642413591






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-645799265






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-645798681






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643010535






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-645798134


   **[Test build #124200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124200/testReport)** for PR 28788 at commit [`c224152`](https://github.com/apache/spark/commit/c2241528374cb64638204b21fd20a8c15cf931da).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643043694


   **[Test build #123872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123872/testReport)** for PR 28788 at commit [`daffe6c`](https://github.com/apache/spark/commit/daffe6cc72be9b491def186c40baf58c8702eb1e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643106480


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] holdenk commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

holdenk commented on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-643021841


   I’ll take a look at the decom test flakiness tomorrow/this weekend.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r440897397



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       ? This sentence is added at the last commit. I didn't say like that, @dbtsai . :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642946544






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r526668885



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -74,10 +76,11 @@ package object config {
     .doc("Whether to populate Hadoop classpath from `yarn.application.classpath` and " +
       "`mapreduce.application.classpath` Note that if this is set to `false`, it requires " +
       "a `with-Hadoop` Spark distribution that bundles Hadoop runtime or user has to provide " +
-      "a Hadoop installation separately.")
+      "a Hadoop installation separately. By default, for `with-hadoop` Spark distribution, " +
+      "this is set to `false`; for `no-hadoop` distribution, this is set to `true`.")
     .version("2.4.6")
     .booleanConf
-    .createWithDefault(true)
+    .createWithDefault(isHadoopProvided())

Review comment:
       Gentle ping.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28788:
URL: https://github.com/apache/spark/pull/28788#issuecomment-642994067


   **[Test build #123872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123872/testReport)** for PR 28788 at commit [`daffe6c`](https://github.com/apache/spark/commit/daffe6cc72be9b491def186c40baf58c8702eb1e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org