Posted to commits@spark.apache.org by do...@apache.org on 2021/01/15 22:07:42 UTC

[spark] branch master updated: [SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b6f46ca  [SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
b6f46ca is described below

commit b6f46ca29742029efea2790af7fdefbc2fcf52de
Author: Chao Sun <su...@apple.com>
AuthorDate: Fri Jan 15 14:06:50 2021 -0800

    [SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
    
    ### What changes were proposed in this pull request?
    
    This PR:
    1. switches Spark to the shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x.
    2. upgrades the built-in Hadoop 3.x version to Hadoop 3.2.2.
    
    Note that for Hadoop 2.7 we still use the same modules as before, such as hadoop-client.
    
    To keep hadoop-3.2 as the default Hadoop profile, this PR defines the following Maven properties:
    
    ```
    hadoop-client-api.artifact
    hadoop-client-runtime.artifact
    hadoop-client-minicluster.artifact
    ```
    
    which default to:
    ```
    hadoop-client-api
    hadoop-client-runtime
    hadoop-client-minicluster
    ```
    all of which switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that the same dependency is imported multiple times, so the Maven enforcer rule `banDuplicatePomDependencyVersions` has to be disabled.
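
    The switching mechanism can be sketched as a condensed POM fragment (the property names are the ones defined by this PR; the surrounding profile body is abbreviated):

    ```xml
    <!-- Defaults in the parent POM: the shaded Hadoop 3.x client artifacts. -->
    <properties>
      <hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact>
      <hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact>
      <hadoop-client-minicluster.artifact>hadoop-client-minicluster</hadoop-client-minicluster.artifact>
    </properties>

    <!-- The hadoop-2.7 profile points all three properties back at hadoop-client,
         since the shaded artifacts only exist for Hadoop 3.x. -->
    <profile>
      <id>hadoop-2.7</id>
      <properties>
        <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
        <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
        <hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
      </properties>
    </profile>
    ```

    Modules then reference `${hadoop-client-api.artifact}` etc. instead of a hard-coded artifact id, so a single profile switch retargets every dependency declaration.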
    
    Besides the above, this PR also:
    - explicitly adds a few dependencies that used to come in transitively from the Hadoop jars but are removed from the shaded client jars.
    - removes the use of `ProxyUriUtils.getPath` from `ApplicationMaster`, since it is a server-side/private API.
    - modifies `IsolatedClientLoader` to exclude `hadoop-auth` jars when the Hadoop version is 3.x. This change should only matter when Hadoop classes are not shared with Spark (which mostly happens in tests).
    
    ### Why are the changes needed?
    
    Hadoop 3.2.2 ships new features and bug fixes, so it is worthwhile for the Spark community to adopt it. However, Hadoop versions starting from 3.2.1 have upgraded to Guava 27+. To resolve the resulting Guava conflicts, this PR switches to the shaded client jars provided by Hadoop. This also has the benefit of no longer pulling Hadoop's other third-party dependencies onto Spark's classpath, avoiding more potential conflicts in the future.
    
    ### Does this PR introduce _any_ user-facing change?
    
    When people use Spark with the `hadoop-provided` option, they should make sure the classpath contains the `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before any other Hadoop jars on the classpath. Otherwise, classes may be loaded from the non-shaded Hadoop jars and cause conflicts.
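
    As a minimal illustration of the ordering requirement (the install paths below are hypothetical; a real deployment would point at its own Hadoop 3.x install, where the shaded client jars live under `share/hadoop/client`), a `hadoop-provided` classpath could be assembled with the shaded client jars first:

    ```shell
    # Hypothetical Hadoop install location.
    HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop-3.2.2}

    # Put the shaded client jars first so their classes win over any
    # non-shaded Hadoop jars that appear later on the classpath.
    SPARK_DIST_CLASSPATH="$HADOOP_HOME/share/hadoop/client/hadoop-client-api-3.2.2.jar"
    SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:$HADOOP_HOME/share/hadoop/client/hadoop-client-runtime-3.2.2.jar"
    SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:$HADOOP_HOME/share/hadoop/common/*"
    export SPARK_DIST_CLASSPATH
    echo "$SPARK_DIST_CLASSPATH"
    ```

    With this ordering, the JVM resolves Hadoop classes from the shaded jars before it ever consults the non-shaded ones.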
    
    ### How was this patch tested?
    
    Relying on existing tests.
    
    Closes #30701 from sunchao/test-hadoop-3.2.2.
    
    Authored-by: Chao Sun <su...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 common/network-yarn/pom.xml                        |  8 ++-
 core/pom.xml                                       | 16 +++++-
 dev/deps/spark-deps-hadoop-2.7-hive-2.3            |  3 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3            | 53 +----------------
 external/kafka-0-10-assembly/pom.xml               |  8 ++-
 external/kafka-0-10-sql/pom.xml                    |  4 ++
 external/kafka-0-10-token-provider/pom.xml         |  5 ++
 external/kinesis-asl-assembly/pom.xml              |  8 ++-
 hadoop-cloud/pom.xml                               |  7 ++-
 launcher/pom.xml                                   |  9 ++-
 pom.xml                                            | 59 +++++++++++++++----
 resource-managers/kubernetes/core/pom.xml          |  9 ---
 resource-managers/yarn/pom.xml                     | 67 ++++++++++++++--------
 .../spark/deploy/yarn/ApplicationMaster.scala      |  6 +-
 .../spark/deploy/yarn/BaseYarnClusterSuite.scala   | 10 ++++
 sql/catalyst/pom.xml                               |  4 ++
 sql/hive/pom.xml                                   |  5 ++
 .../sql/hive/client/IsolatedClientLoader.scala     | 19 +++++-
 18 files changed, 191 insertions(+), 109 deletions(-)

diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 7aff79e..5036e05 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -65,7 +65,13 @@
     <!-- Provided dependencies -->
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>org.slf4j</groupId>
diff --git a/core/pom.xml b/core/pom.xml
index 09fa153..4b3e040 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -66,7 +66,13 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
@@ -178,6 +184,14 @@
       <artifactId>commons-text</artifactId>
     </dependency>
     <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>commons-collections</groupId>
+      <artifactId>commons-collections</artifactId>
+    </dependency>
+    <dependency>
       <groupId>com.google.code.findbugs</groupId>
       <artifactId>jsr305</artifactId>
     </dependency>
diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
index 8d8ef2e..caede04 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -128,7 +128,7 @@ javassist/3.25.0-GA//javassist-3.25.0-GA.jar
 javax.inject/1//javax.inject-1.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
 javolution/5.5.1//javolution-5.5.1.jar
-jaxb-api/2.2.2//jaxb-api-2.2.2.jar
+jaxb-api/2.2.11//jaxb-api-2.2.11.jar
 jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
 jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
 jdo-api/3.0.1//jdo-api-3.0.1.jar
@@ -227,7 +227,6 @@ spire-macros_2.12/0.17.0-M1//spire-macros_2.12-0.17.0-M1.jar
 spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
 spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
 spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
-stax-api/1.0-2//stax-api-1.0-2.jar
 stax-api/1.0.1//stax-api-1.0.1.jar
 stream/2.9.6//stream-2.9.6.jar
 super-csv/2.2.0//super-csv-2.2.0.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index bf56fc1..344d8e5 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -3,7 +3,6 @@ JLargeArrays/1.5//JLargeArrays-1.5.jar
 JTransforms/3.1//JTransforms-3.1.jar
 RoaringBitmap/0.9.0//RoaringBitmap-0.9.0.jar
 ST4/4.0.4//ST4-4.0.4.jar
-accessors-smart/1.2//accessors-smart-1.2.jar
 activation/1.1.1//activation-1.1.1.jar
 aircompressor/0.16//aircompressor-0.16.jar
 algebra_2.12/2.0.0-M2//algebra_2.12-2.0.0-M2.jar
@@ -11,7 +10,6 @@ annotations/17.0.0//annotations-17.0.0.jar
 antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
 antlr4-runtime/4.8-1//antlr4-runtime-4.8-1.jar
 aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
-aopalliance/1.0//aopalliance-1.0.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar
 arrow-format/2.0.0//arrow-format-2.0.0.jar
 arrow-memory-core/2.0.0//arrow-memory-core-2.0.0.jar
@@ -28,15 +26,12 @@ breeze_2.12/1.0//breeze_2.12-1.0.jar
 cats-kernel_2.12/2.0.0-M4//cats-kernel_2.12-2.0.0-M4.jar
 chill-java/0.9.5//chill-java-0.9.5.jar
 chill_2.12/0.9.5//chill_2.12-0.9.5.jar
-commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar
 commons-cli/1.2//commons-cli-1.2.jar
 commons-codec/1.15//commons-codec-1.15.jar
 commons-collections/3.2.2//commons-collections-3.2.2.jar
 commons-compiler/3.0.16//commons-compiler-3.0.16.jar
 commons-compress/1.20//commons-compress-1.20.jar
-commons-configuration2/2.1.1//commons-configuration2-2.1.1.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
-commons-daemon/1.0.13//commons-daemon-1.0.13.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
 commons-httpclient/3.1//commons-httpclient-3.1.jar
 commons-io/2.5//commons-io-2.5.jar
@@ -56,30 +51,13 @@ datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar
 datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar
 datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
 derby/10.14.2.0//derby-10.14.2.0.jar
-dnsjava/2.1.7//dnsjava-2.1.7.jar
 dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
-ehcache/3.3.1//ehcache-3.3.1.jar
 flatbuffers-java/1.9.0//flatbuffers-java-1.9.0.jar
 generex/1.0.2//generex-1.0.2.jar
-geronimo-jcache_1.0_spec/1.0-alpha-1//geronimo-jcache_1.0_spec-1.0-alpha-1.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
-guice-servlet/4.0//guice-servlet-4.0.jar
-guice/4.0//guice-4.0.jar
-hadoop-annotations/3.2.0//hadoop-annotations-3.2.0.jar
-hadoop-auth/3.2.0//hadoop-auth-3.2.0.jar
-hadoop-client/3.2.0//hadoop-client-3.2.0.jar
-hadoop-common/3.2.0//hadoop-common-3.2.0.jar
-hadoop-hdfs-client/3.2.0//hadoop-hdfs-client-3.2.0.jar
-hadoop-mapreduce-client-common/3.2.0//hadoop-mapreduce-client-common-3.2.0.jar
-hadoop-mapreduce-client-core/3.2.0//hadoop-mapreduce-client-core-3.2.0.jar
-hadoop-mapreduce-client-jobclient/3.2.0//hadoop-mapreduce-client-jobclient-3.2.0.jar
-hadoop-yarn-api/3.2.0//hadoop-yarn-api-3.2.0.jar
-hadoop-yarn-client/3.2.0//hadoop-yarn-client-3.2.0.jar
-hadoop-yarn-common/3.2.0//hadoop-yarn-common-3.2.0.jar
-hadoop-yarn-registry/3.2.0//hadoop-yarn-registry-3.2.0.jar
-hadoop-yarn-server-common/3.2.0//hadoop-yarn-server-common-3.2.0.jar
-hadoop-yarn-server-web-proxy/3.2.0//hadoop-yarn-server-web-proxy-3.2.0.jar
+hadoop-client-api/3.2.2//hadoop-client-api-3.2.2.jar
+hadoop-client-runtime/3.2.2//hadoop-client-runtime-3.2.2.jar
 hive-beeline/2.3.7//hive-beeline-2.3.7.jar
 hive-cli/2.3.7//hive-cli-2.3.7.jar
 hive-common/2.3.7//hive-common-2.3.7.jar
@@ -109,8 +87,6 @@ jackson-core/2.11.4//jackson-core-2.11.4.jar
 jackson-databind/2.11.4//jackson-databind-2.11.4.jar
 jackson-dataformat-yaml/2.11.4//jackson-dataformat-yaml-2.11.4.jar
 jackson-datatype-jsr310/2.11.2//jackson-datatype-jsr310-2.11.2.jar
-jackson-jaxrs-base/2.9.5//jackson-jaxrs-base-2.9.5.jar
-jackson-jaxrs-json-provider/2.9.5//jackson-jaxrs-json-provider-2.9.5.jar
 jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
 jackson-module-jaxb-annotations/2.11.4//jackson-module-jaxb-annotations-2.11.4.jar
 jackson-module-paranamer/2.11.4//jackson-module-paranamer-2.11.4.jar
@@ -124,13 +100,10 @@ jakarta.ws.rs-api/2.1.6//jakarta.ws.rs-api-2.1.6.jar
 jakarta.xml.bind-api/2.3.2//jakarta.xml.bind-api-2.3.2.jar
 janino/3.0.16//janino-3.0.16.jar
 javassist/3.25.0-GA//javassist-3.25.0-GA.jar
-javax.inject/1//javax.inject-1.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
-javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
 javolution/5.5.1//javolution-5.5.1.jar
 jaxb-api/2.2.11//jaxb-api-2.2.11.jar
 jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
-jcip-annotations/1.0-1//jcip-annotations-1.0-1.jar
 jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
 jdo-api/3.0.1//jdo-api-3.0.1.jar
 jersey-client/2.30//jersey-client-2.30.jar
@@ -144,30 +117,14 @@ jline/2.14.6//jline-2.14.6.jar
 joda-time/2.10.5//joda-time-2.10.5.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
-json-smart/2.3//json-smart-2.3.jar
 json/1.8//json-1.8.jar
 json4s-ast_2.12/3.7.0-M5//json4s-ast_2.12-3.7.0-M5.jar
 json4s-core_2.12/3.7.0-M5//json4s-core_2.12-3.7.0-M5.jar
 json4s-jackson_2.12/3.7.0-M5//json4s-jackson_2.12-3.7.0-M5.jar
 json4s-scalap_2.12/3.7.0-M5//json4s-scalap_2.12-3.7.0-M5.jar
-jsp-api/2.1//jsp-api-2.1.jar
 jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
 jul-to-slf4j/1.7.30//jul-to-slf4j-1.7.30.jar
-kerb-admin/1.0.1//kerb-admin-1.0.1.jar
-kerb-client/1.0.1//kerb-client-1.0.1.jar
-kerb-common/1.0.1//kerb-common-1.0.1.jar
-kerb-core/1.0.1//kerb-core-1.0.1.jar
-kerb-crypto/1.0.1//kerb-crypto-1.0.1.jar
-kerb-identity/1.0.1//kerb-identity-1.0.1.jar
-kerb-server/1.0.1//kerb-server-1.0.1.jar
-kerb-simplekdc/1.0.1//kerb-simplekdc-1.0.1.jar
-kerb-util/1.0.1//kerb-util-1.0.1.jar
-kerby-asn1/1.0.1//kerby-asn1-1.0.1.jar
-kerby-config/1.0.1//kerby-config-1.0.1.jar
-kerby-pkix/1.0.1//kerby-pkix-1.0.1.jar
-kerby-util/1.0.1//kerby-util-1.0.1.jar
-kerby-xdr/1.0.1//kerby-xdr-1.0.1.jar
 kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
 kubernetes-client/4.12.0//kubernetes-client-4.12.0.jar
 kubernetes-model-admissionregistration/4.12.0//kubernetes-model-admissionregistration-4.12.0.jar
@@ -205,9 +162,7 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar
 metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.51.Final//netty-all-4.1.51.Final.jar
-nimbus-jose-jwt/4.41.1//nimbus-jose-jwt-4.41.1.jar
 objenesis/2.6//objenesis-2.6.jar
-okhttp/2.7.5//okhttp-2.7.5.jar
 okhttp/3.12.12//okhttp-3.12.12.jar
 okio/1.14.0//okio-1.14.0.jar
 opencsv/2.3//opencsv-2.3.jar
@@ -226,7 +181,6 @@ parquet-jackson/1.10.1//parquet-jackson-1.10.1.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.1//py4j-0.10.9.1.jar
 pyrolite/4.30//pyrolite-4.30.jar
-re2j/1.1//re2j-1.1.jar
 scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar
 scala-compiler/2.12.10//scala-compiler-2.12.10.jar
 scala-library/2.12.10//scala-library-2.12.10.jar
@@ -244,15 +198,12 @@ spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
 spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
 spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
 stax-api/1.0.1//stax-api-1.0.1.jar
-stax2-api/3.1.4//stax2-api-3.1.4.jar
 stream/2.9.6//stream-2.9.6.jar
 super-csv/2.2.0//super-csv-2.2.0.jar
 threeten-extra/1.5.0//threeten-extra-1.5.0.jar
-token-provider/1.0.1//token-provider-1.0.1.jar
 transaction-api/1.1//transaction-api-1.1.jar
 univocity-parsers/2.9.0//univocity-parsers-2.9.0.jar
 velocity/1.5//velocity-1.5.jar
-woodstox-core/5.0.3//woodstox-core-5.0.3.jar
 xbean-asm7-shaded/4.15//xbean-asm7-shaded-4.15.jar
 xz/1.5//xz-1.5.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
diff --git a/external/kafka-0-10-assembly/pom.xml b/external/kafka-0-10-assembly/pom.xml
index 2359e99..121bc56 100644
--- a/external/kafka-0-10-assembly/pom.xml
+++ b/external/kafka-0-10-assembly/pom.xml
@@ -71,10 +71,16 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
     <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+    <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro-mapred</artifactId>
       <classifier>${avro.mapred.classifier}</classifier>
diff --git a/external/kafka-0-10-sql/pom.xml b/external/kafka-0-10-sql/pom.xml
index 843f160..1833b35 100644
--- a/external/kafka-0-10-sql/pom.xml
+++ b/external/kafka-0-10-sql/pom.xml
@@ -80,6 +80,10 @@
       <version>${kafka.version}</version>
     </dependency>
     <dependency>
+      <groupId>com.google.code.findbugs</groupId>
+      <artifactId>jsr305</artifactId>
+    </dependency>
+    <dependency>
       <groupId>org.apache.commons</groupId>
       <artifactId>commons-pool2</artifactId>
       <version>${commons-pool2.version}</version>
diff --git a/external/kafka-0-10-token-provider/pom.xml b/external/kafka-0-10-token-provider/pom.xml
index dbe2ab9..4ee09fa 100644
--- a/external/kafka-0-10-token-provider/pom.xml
+++ b/external/kafka-0-10-token-provider/pom.xml
@@ -59,6 +59,11 @@
       <scope>test</scope>
     </dependency>
     <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <scope>${hadoop.deps.scope}</scope>
+    </dependency>
+    <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-tags_${scala.binary.version}</artifactId>
     </dependency>
diff --git a/external/kinesis-asl-assembly/pom.xml b/external/kinesis-asl-assembly/pom.xml
index 22259b0..9a98d7c 100644
--- a/external/kinesis-asl-assembly/pom.xml
+++ b/external/kinesis-asl-assembly/pom.xml
@@ -91,10 +91,16 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
     <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+    <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro-ipc</artifactId>
       <scope>provided</scope>
diff --git a/hadoop-cloud/pom.xml b/hadoop-cloud/pom.xml
index 03910ba..c0997e5 100644
--- a/hadoop-cloud/pom.xml
+++ b/hadoop-cloud/pom.xml
@@ -58,10 +58,15 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
       <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
     <!--
       the AWS module pulls in jackson; its transitive dependencies can create
       intra-jackson-module version problems.
diff --git a/launcher/pom.xml b/launcher/pom.xml
index 5da2a49..dd872f4 100644
--- a/launcher/pom.xml
+++ b/launcher/pom.xml
@@ -81,7 +81,14 @@
     <!-- Not needed by the test code, but referenced by SparkSubmit which is used by the tests. -->
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
       <scope>test</scope>
     </dependency>
   </dependencies>
diff --git a/pom.xml b/pom.xml
index f921e35..26b5186 100644
--- a/pom.xml
+++ b/pom.xml
@@ -120,7 +120,7 @@
     <sbt.project.name>spark</sbt.project.name>
     <slf4j.version>1.7.30</slf4j.version>
     <log4j.version>1.2.17</log4j.version>
-    <hadoop.version>3.2.0</hadoop.version>
+    <hadoop.version>3.2.2</hadoop.version>
     <protobuf.version>2.5.0</protobuf.version>
     <yarn.version>${hadoop.version}</yarn.version>
     <zookeeper.version>3.4.14</zookeeper.version>
@@ -246,6 +246,15 @@
     <parquet.test.deps.scope>test</parquet.test.deps.scope>
 
     <!--
+      These default to Hadoop 3.x shaded client/minicluster jars, but are switched to hadoop-client
+      when the Hadoop profile is hadoop-2.7, because these are only available in 3.x. Note that,
+      as a result, we have to include the same hadoop-client dependency multiple times in hadoop-2.7.
+    -->
+    <hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact>
+    <hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact>
+    <hadoop-client-minicluster.artifact>hadoop-client-minicluster</hadoop-client-minicluster.artifact>
+
+    <!--
       Overridable test home. So that you can call individual pom files directly without
       things breaking.
     -->
@@ -860,6 +869,11 @@
         <version>2.0.1</version>
       </dependency>
       <dependency>
+        <groupId>javax.xml.bind</groupId>
+        <artifactId>jaxb-api</artifactId>
+        <version>2.2.11</version>
+      </dependency>
+      <dependency>
         <groupId>org.scalanlp</groupId>
         <artifactId>breeze_${scala.binary.version}</artifactId>
         <version>1.0</version>
@@ -1067,6 +1081,26 @@
         <version>${curator.version}</version>
         <scope>test</scope>
       </dependency>
+      <!-- Hadoop 3.x dependencies -->
+      <dependency>
+        <groupId>org.apache.hadoop</groupId>
+        <artifactId>hadoop-client-api</artifactId>
+        <version>${hadoop.version}</version>
+        <scope>${hadoop.deps.scope}</scope>
+      </dependency>
+      <dependency>
+        <groupId>org.apache.hadoop</groupId>
+        <artifactId>hadoop-client-runtime</artifactId>
+        <version>${hadoop.version}</version>
+        <scope>${hadoop.deps.scope}</scope>
+      </dependency>
+      <dependency>
+        <groupId>org.apache.hadoop</groupId>
+        <artifactId>hadoop-client-minicluster</artifactId>
+        <version>${yarn.version}</version>
+        <scope>test</scope>
+      </dependency>
+      <!-- End of Hadoop 3.x dependencies -->
       <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-client</artifactId>
@@ -1657,6 +1691,14 @@
             <artifactId>ant</artifactId>
           </exclusion>
           <exclusion>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-common</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-auth</artifactId>
+          </exclusion>
+          <exclusion>
             <groupId>org.apache.zookeeper</groupId>
             <artifactId>zookeeper</artifactId>
           </exclusion>
@@ -2420,17 +2462,6 @@
                 </rules>
               </configuration>
             </execution>
-            <execution>
-              <id>enforce-no-duplicate-dependencies</id>
-              <goals>
-                <goal>enforce</goal>
-              </goals>
-              <configuration>
-                <rules>
-                  <banDuplicatePomDependencyVersions/>
-                </rules>
-              </configuration>
-            </execution>
           </executions>
         </plugin>
 	<plugin>
@@ -2901,6 +2932,7 @@
         <artifactId>maven-shade-plugin</artifactId>
         <configuration>
           <shadedArtifactAttached>false</shadedArtifactAttached>
+          <createDependencyReducedPom>false</createDependencyReducedPom>
           <artifactSet>
             <includes>
               <include>org.spark-project.spark:unused</include>
@@ -3162,6 +3194,9 @@
         <hadoop.version>2.7.4</hadoop.version>
         <curator.version>2.7.1</curator.version>
         <commons-io.version>2.4</commons-io.version>
+        <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
+        <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
+        <hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
       </properties>
     </profile>
 
diff --git a/resource-managers/kubernetes/core/pom.xml b/resource-managers/kubernetes/core/pom.xml
index 44df4e1..3fff940 100644
--- a/resource-managers/kubernetes/core/pom.xml
+++ b/resource-managers/kubernetes/core/pom.xml
@@ -64,10 +64,6 @@
           <artifactId>*</artifactId>
         </exclusion>
         <exclusion>
-          <groupId>com.fasterxml.jackson.module</groupId>
-          <artifactId>jackson-module-jaxb-annotations</artifactId>
-        </exclusion>
-        <exclusion>
           <groupId>com.fasterxml.jackson.dataformat</groupId>
           <artifactId>jackson-dataformat-yaml</artifactId>
         </exclusion>
@@ -85,11 +81,6 @@
       <artifactId>jackson-dataformat-yaml</artifactId>
       <version>${fasterxml.jackson.version}</version>
     </dependency>
-    <dependency>
-      <groupId>com.fasterxml.jackson.module</groupId>
-      <artifactId>jackson-module-jaxb-annotations</artifactId>
-      <version>${fasterxml.jackson.version}</version>
-    </dependency>
 
     <!-- Explicitly depend on shaded dependencies from the parent, since shaded deps aren't transitive -->
     <dependency>
diff --git a/resource-managers/yarn/pom.xml b/resource-managers/yarn/pom.xml
index c0ce1c8..a662953 100644
--- a/resource-managers/yarn/pom.xml
+++ b/resource-managers/yarn/pom.xml
@@ -40,6 +40,42 @@
         <spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
       </properties>
     </profile>
+    <profile>
+      <id>hadoop-2.7</id>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-api</artifactId>
+        </dependency>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-common</artifactId>
+        </dependency>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-web-proxy</artifactId>
+        </dependency>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-client</artifactId>
+        </dependency>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-tests</artifactId>
+          <classifier>tests</classifier>
+          <scope>test</scope>
+        </dependency>
+        <!--
+          Hack to exclude org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:tests.
+          See the parent pom.xml for more details.
+        -->
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+          <scope>test</scope>
+        </dependency>
+      </dependencies>
+    </profile>
   </profiles>
 
   <dependencies>
@@ -69,23 +105,20 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-api</artifactId>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-common</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-web-proxy</artifactId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+      <scope>${hadoop.deps.scope}</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-client</artifactId>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-minicluster.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+      <scope>test</scope>
     </dependency>
 
     <!-- Explicit listing of transitive deps that are shaded. Otherwise, odd compiler crashes. -->
@@ -136,18 +169,6 @@
     </dependency>
 
     <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-tests</artifactId>
-      <classifier>tests</classifier>
-      <scope>test</scope>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <scope>test</scope>
-    </dependency>
-
-    <dependency>
       <groupId>org.mockito</groupId>
       <artifactId>mockito-core</artifactId>
       <scope>test</scope>
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
index ab69507..eb927a3 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
@@ -19,7 +19,7 @@ package org.apache.spark.deploy.yarn
 
 import java.io.{File, IOException}
 import java.lang.reflect.{InvocationTargetException, Modifier}
-import java.net.{URI, URL}
+import java.net.{URI, URL, URLEncoder}
 import java.security.PrivilegedExceptionAction
 import java.util.concurrent.{TimeoutException, TimeUnit}
 
@@ -36,7 +36,6 @@ import org.apache.hadoop.yarn.api._
 import org.apache.hadoop.yarn.api.records._
 import org.apache.hadoop.yarn.conf.YarnConfiguration
 import org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException
-import org.apache.hadoop.yarn.server.webproxy.ProxyUriUtils
 import org.apache.hadoop.yarn.util.{ConverterUtils, Records}
 
 import org.apache.spark._
@@ -308,7 +307,8 @@ private[spark] class ApplicationMaster(
       // The client-mode AM doesn't listen for incoming connections, so report an invalid port.
       registerAM(Utils.localHostName, -1, sparkConf,
         sparkConf.getOption("spark.driver.appUIAddress"), appAttemptId)
-      addAmIpFilter(Some(driverRef), ProxyUriUtils.getPath(appAttemptId.getApplicationId))
+      val encodedAppId = URLEncoder.encode(appAttemptId.getApplicationId.toString, "UTF-8")
+      addAmIpFilter(Some(driverRef), s"/proxy/$encodedAppId")
       createAllocator(driverRef, sparkConf, clientRpcEnv, appAttemptId, cachedResourcesConf)
       reporterThread.join()
     } catch {
diff --git a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala
index 20f5339..a813b99 100644
--- a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala
+++ b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala
@@ -80,6 +80,16 @@ abstract class BaseYarnClusterSuite
     yarnConf.set("yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage",
       "100.0")
 
+    // capacity-scheduler.xml is missing in hadoop-client-minicluster so this is a workaround
+    yarnConf.set("yarn.scheduler.capacity.root.queues", "default")
+    yarnConf.setInt("yarn.scheduler.capacity.root.default.capacity", 100)
+    yarnConf.setFloat("yarn.scheduler.capacity.root.default.user-limit-factor", 1)
+    yarnConf.setInt("yarn.scheduler.capacity.root.default.maximum-capacity", 100)
+    yarnConf.set("yarn.scheduler.capacity.root.default.state", "RUNNING")
+    yarnConf.set("yarn.scheduler.capacity.root.default.acl_submit_applications", "*")
+    yarnConf.set("yarn.scheduler.capacity.root.default.acl_administer_queue", "*")
+    yarnConf.setInt("yarn.scheduler.capacity.node-locality-delay", -1)
+
     yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
     yarnCluster.init(yarnConf)
     yarnCluster.start()
diff --git a/sql/catalyst/pom.xml b/sql/catalyst/pom.xml
index 0553438..583738b 100644
--- a/sql/catalyst/pom.xml
+++ b/sql/catalyst/pom.xml
@@ -105,6 +105,10 @@
       <artifactId>antlr4-runtime</artifactId>
     </dependency>
     <dependency>
+      <groupId>javax.xml.bind</groupId>
+      <artifactId>jaxb-api</artifactId>
+    </dependency>
+    <dependency>
       <groupId>commons-codec</groupId>
       <artifactId>commons-codec</artifactId>
     </dependency>
diff --git a/sql/hive/pom.xml b/sql/hive/pom.xml
index 27d2756..74b1f9d 100644
--- a/sql/hive/pom.xml
+++ b/sql/hive/pom.xml
@@ -163,6 +163,11 @@
       <artifactId>datanucleus-core</artifactId>
     </dependency>
     <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <scope>${hadoop.deps.scope}</scope>
+    </dependency>
+    <dependency>
       <groupId>org.apache.thrift</groupId>
       <artifactId>libthrift</artifactId>
     </dependency>
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
index 02bf865..4e5e58d 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
@@ -112,11 +112,24 @@ private[hive] object IsolatedClientLoader extends Logging {
       hadoopVersion: String,
       ivyPath: Option[String],
       remoteRepos: String): Seq[URL] = {
+    val hadoopJarNames = if (hadoopVersion.startsWith("3")) {
+      Seq(s"org.apache.hadoop:hadoop-client-api:$hadoopVersion",
+        s"org.apache.hadoop:hadoop-client-runtime:$hadoopVersion")
+    } else {
+      Seq(s"org.apache.hadoop:hadoop-client:$hadoopVersion")
+    }
     val hiveArtifacts = version.extraDeps ++
       Seq("hive-metastore", "hive-exec", "hive-common", "hive-serde")
         .map(a => s"org.apache.hive:$a:${version.fullVersion}") ++
-      Seq("com.google.guava:guava:14.0.1",
-        s"org.apache.hadoop:hadoop-client:$hadoopVersion")
+      Seq("com.google.guava:guava:14.0.1") ++ hadoopJarNames
+
+    val extraExclusions = if (hadoopVersion.startsWith("3")) {
+      // hadoop-auth, pulled in via the lower Hive version, could conflict with jars in
+      // Hadoop 3.2+, so exclude it here in favor of the ones in Hadoop 3.2+
+      Seq("org.apache.hadoop:hadoop-auth")
+    } else {
+      Seq.empty
+    }
 
     val classpaths = quietly {
       SparkSubmitUtils.resolveMavenCoordinates(
@@ -125,7 +138,7 @@ private[hive] object IsolatedClientLoader extends Logging {
           Some(remoteRepos),
           ivyPath),
         transitive = true,
-        exclusions = version.exclusions)
+        exclusions = version.exclusions ++ extraExclusions)
     }
     val allFiles = classpaths.map(new File(_)).toSet
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org