Posted to commits@spark.apache.org by do...@apache.org on 2021/01/15 22:07:42 UTC
[spark] branch master updated: [SPARK-33212][BUILD] Upgrade to
Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new b6f46ca [SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
b6f46ca is described below
commit b6f46ca29742029efea2790af7fdefbc2fcf52de
Author: Chao Sun <su...@apple.com>
AuthorDate: Fri Jan 15 14:06:50 2021 -0800
[SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
### What changes were proposed in this pull request?
This PR:
1. switches Spark to the shaded Hadoop clients, namely `hadoop-client-api` and `hadoop-client-runtime`, for Hadoop 3.x.
2. upgrades the built-in Hadoop 3.x version to Hadoop 3.2.2.
Note that for Hadoop 2.7 we still use the same modules, such as `hadoop-client`.
To keep hadoop-3.2 as the default Hadoop profile, this defines the following Maven properties:
```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```
which default to:
```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```
but all of them switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that we import the same dependency multiple times, so the Maven enforcer rule `banDuplicatePomDependencyVersions` has to be disabled.
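In sketch form (condensed from the diff below; module details elided), a module's POM references the property instead of a fixed artifact id, and the hadoop-2.7 profile overrides it:
```
<!-- Default (hadoop-3.2 profile): resolve to the shaded client jars. -->
<properties>
  <hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact>
  <hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact>
</properties>

<!-- A module then depends on the property, not on a fixed artifactId. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>${hadoop-client-api.artifact}</artifactId>
  <version>${hadoop.version}</version>
</dependency>

<!-- The hadoop-2.7 profile maps all three properties to hadoop-client,
     which is why the same dependency can end up declared multiple times. -->
<profile>
  <id>hadoop-2.7</id>
  <properties>
    <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
    <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
  </properties>
</profile>
```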
Besides the above, this makes the following changes:
- Explicitly adds a few dependencies that used to be imported transitively from the Hadoop jars but are removed from the shaded client jars.
- Removes the use of `ProxyUriUtils.getPath` from `ApplicationMaster`, since it is a server-side/private API.
- Modifies `IsolatedClientLoader` to exclude `hadoop-auth` jars when the Hadoop version is 3.x. This change should only matter when Hadoop classes are not shared with Spark (which is _mostly_ the case in tests).
### Why are the changes needed?
Hadoop 3.2.2 is released with new features and bug fixes, so it is good for the Spark community to adopt it. However, recent Hadoop versions, starting from Hadoop 3.2.1, have upgraded to Guava 27+. To resolve the Guava conflicts, this PR switches to the shaded client jars provided by Hadoop. This also has the benefit of not pulling in other third-party dependencies from the Hadoop side, which avoids more potential conflicts in the future.
### Does this PR introduce _any_ user-facing change?
When people use Spark with the `hadoop-provided` option, they should make sure the classpath contains the `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before any other, non-shaded Hadoop jars on the classpath; otherwise, classes may be loaded from the non-shaded Hadoop jars and cause conflicts.
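For example (a hypothetical layout; the actual jar locations depend on your Hadoop distribution), the shaded jars can be placed first when building the distribution classpath:
```
export SPARK_DIST_CLASSPATH="/opt/hadoop/share/hadoop/client/hadoop-client-api-3.2.2.jar:\
/opt/hadoop/share/hadoop/client/hadoop-client-runtime-3.2.2.jar:\
$(hadoop classpath)"
```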
### How was this patch tested?
Relying on existing tests.
Closes #30701 from sunchao/test-hadoop-3.2.2.
Authored-by: Chao Sun <su...@apple.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
common/network-yarn/pom.xml | 8 ++-
core/pom.xml | 16 +++++-
dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 3 +-
dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 53 +----------------
external/kafka-0-10-assembly/pom.xml | 8 ++-
external/kafka-0-10-sql/pom.xml | 4 ++
external/kafka-0-10-token-provider/pom.xml | 5 ++
external/kinesis-asl-assembly/pom.xml | 8 ++-
hadoop-cloud/pom.xml | 7 ++-
launcher/pom.xml | 9 ++-
pom.xml | 59 +++++++++++++++----
resource-managers/kubernetes/core/pom.xml | 9 ---
resource-managers/yarn/pom.xml | 67 ++++++++++++++--------
.../spark/deploy/yarn/ApplicationMaster.scala | 6 +-
.../spark/deploy/yarn/BaseYarnClusterSuite.scala | 10 ++++
sql/catalyst/pom.xml | 4 ++
sql/hive/pom.xml | 5 ++
.../sql/hive/client/IsolatedClientLoader.scala | 19 +++++-
18 files changed, 191 insertions(+), 109 deletions(-)
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 7aff79e..5036e05 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -65,7 +65,13 @@
<!-- Provided dependencies -->
<dependency>
<groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-client</artifactId>
+ <artifactId>${hadoop-client-api.artifact}</artifactId>
+ <version>${hadoop.version}</version>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+ <version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
diff --git a/core/pom.xml b/core/pom.xml
index 09fa153..4b3e040 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -66,7 +66,13 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-client</artifactId>
+ <artifactId>${hadoop-client-api.artifact}</artifactId>
+ <version>${hadoop.version}</version>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+ <version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
@@ -178,6 +184,14 @@
<artifactId>commons-text</artifactId>
</dependency>
<dependency>
+ <groupId>commons-io</groupId>
+ <artifactId>commons-io</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>commons-collections</groupId>
+ <artifactId>commons-collections</artifactId>
+ </dependency>
+ <dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
</dependency>
diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
index 8d8ef2e..caede04 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -128,7 +128,7 @@ javassist/3.25.0-GA//javassist-3.25.0-GA.jar
javax.inject/1//javax.inject-1.jar
javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
javolution/5.5.1//javolution-5.5.1.jar
-jaxb-api/2.2.2//jaxb-api-2.2.2.jar
+jaxb-api/2.2.11//jaxb-api-2.2.11.jar
jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
jdo-api/3.0.1//jdo-api-3.0.1.jar
@@ -227,7 +227,6 @@ spire-macros_2.12/0.17.0-M1//spire-macros_2.12-0.17.0-M1.jar
spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
-stax-api/1.0-2//stax-api-1.0-2.jar
stax-api/1.0.1//stax-api-1.0.1.jar
stream/2.9.6//stream-2.9.6.jar
super-csv/2.2.0//super-csv-2.2.0.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index bf56fc1..344d8e5 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -3,7 +3,6 @@ JLargeArrays/1.5//JLargeArrays-1.5.jar
JTransforms/3.1//JTransforms-3.1.jar
RoaringBitmap/0.9.0//RoaringBitmap-0.9.0.jar
ST4/4.0.4//ST4-4.0.4.jar
-accessors-smart/1.2//accessors-smart-1.2.jar
activation/1.1.1//activation-1.1.1.jar
aircompressor/0.16//aircompressor-0.16.jar
algebra_2.12/2.0.0-M2//algebra_2.12-2.0.0-M2.jar
@@ -11,7 +10,6 @@ annotations/17.0.0//annotations-17.0.0.jar
antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
antlr4-runtime/4.8-1//antlr4-runtime-4.8-1.jar
aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
-aopalliance/1.0//aopalliance-1.0.jar
arpack_combined_all/0.1//arpack_combined_all-0.1.jar
arrow-format/2.0.0//arrow-format-2.0.0.jar
arrow-memory-core/2.0.0//arrow-memory-core-2.0.0.jar
@@ -28,15 +26,12 @@ breeze_2.12/1.0//breeze_2.12-1.0.jar
cats-kernel_2.12/2.0.0-M4//cats-kernel_2.12-2.0.0-M4.jar
chill-java/0.9.5//chill-java-0.9.5.jar
chill_2.12/0.9.5//chill_2.12-0.9.5.jar
-commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar
commons-cli/1.2//commons-cli-1.2.jar
commons-codec/1.15//commons-codec-1.15.jar
commons-collections/3.2.2//commons-collections-3.2.2.jar
commons-compiler/3.0.16//commons-compiler-3.0.16.jar
commons-compress/1.20//commons-compress-1.20.jar
-commons-configuration2/2.1.1//commons-configuration2-2.1.1.jar
commons-crypto/1.1.0//commons-crypto-1.1.0.jar
-commons-daemon/1.0.13//commons-daemon-1.0.13.jar
commons-dbcp/1.4//commons-dbcp-1.4.jar
commons-httpclient/3.1//commons-httpclient-3.1.jar
commons-io/2.5//commons-io-2.5.jar
@@ -56,30 +51,13 @@ datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar
datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar
datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
derby/10.14.2.0//derby-10.14.2.0.jar
-dnsjava/2.1.7//dnsjava-2.1.7.jar
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
-ehcache/3.3.1//ehcache-3.3.1.jar
flatbuffers-java/1.9.0//flatbuffers-java-1.9.0.jar
generex/1.0.2//generex-1.0.2.jar
-geronimo-jcache_1.0_spec/1.0-alpha-1//geronimo-jcache_1.0_spec-1.0-alpha-1.jar
gson/2.2.4//gson-2.2.4.jar
guava/14.0.1//guava-14.0.1.jar
-guice-servlet/4.0//guice-servlet-4.0.jar
-guice/4.0//guice-4.0.jar
-hadoop-annotations/3.2.0//hadoop-annotations-3.2.0.jar
-hadoop-auth/3.2.0//hadoop-auth-3.2.0.jar
-hadoop-client/3.2.0//hadoop-client-3.2.0.jar
-hadoop-common/3.2.0//hadoop-common-3.2.0.jar
-hadoop-hdfs-client/3.2.0//hadoop-hdfs-client-3.2.0.jar
-hadoop-mapreduce-client-common/3.2.0//hadoop-mapreduce-client-common-3.2.0.jar
-hadoop-mapreduce-client-core/3.2.0//hadoop-mapreduce-client-core-3.2.0.jar
-hadoop-mapreduce-client-jobclient/3.2.0//hadoop-mapreduce-client-jobclient-3.2.0.jar
-hadoop-yarn-api/3.2.0//hadoop-yarn-api-3.2.0.jar
-hadoop-yarn-client/3.2.0//hadoop-yarn-client-3.2.0.jar
-hadoop-yarn-common/3.2.0//hadoop-yarn-common-3.2.0.jar
-hadoop-yarn-registry/3.2.0//hadoop-yarn-registry-3.2.0.jar
-hadoop-yarn-server-common/3.2.0//hadoop-yarn-server-common-3.2.0.jar
-hadoop-yarn-server-web-proxy/3.2.0//hadoop-yarn-server-web-proxy-3.2.0.jar
+hadoop-client-api/3.2.2//hadoop-client-api-3.2.2.jar
+hadoop-client-runtime/3.2.2//hadoop-client-runtime-3.2.2.jar
hive-beeline/2.3.7//hive-beeline-2.3.7.jar
hive-cli/2.3.7//hive-cli-2.3.7.jar
hive-common/2.3.7//hive-common-2.3.7.jar
@@ -109,8 +87,6 @@ jackson-core/2.11.4//jackson-core-2.11.4.jar
jackson-databind/2.11.4//jackson-databind-2.11.4.jar
jackson-dataformat-yaml/2.11.4//jackson-dataformat-yaml-2.11.4.jar
jackson-datatype-jsr310/2.11.2//jackson-datatype-jsr310-2.11.2.jar
-jackson-jaxrs-base/2.9.5//jackson-jaxrs-base-2.9.5.jar
-jackson-jaxrs-json-provider/2.9.5//jackson-jaxrs-json-provider-2.9.5.jar
jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations/2.11.4//jackson-module-jaxb-annotations-2.11.4.jar
jackson-module-paranamer/2.11.4//jackson-module-paranamer-2.11.4.jar
@@ -124,13 +100,10 @@ jakarta.ws.rs-api/2.1.6//jakarta.ws.rs-api-2.1.6.jar
jakarta.xml.bind-api/2.3.2//jakarta.xml.bind-api-2.3.2.jar
janino/3.0.16//janino-3.0.16.jar
javassist/3.25.0-GA//javassist-3.25.0-GA.jar
-javax.inject/1//javax.inject-1.jar
javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
-javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
javolution/5.5.1//javolution-5.5.1.jar
jaxb-api/2.2.11//jaxb-api-2.2.11.jar
jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
-jcip-annotations/1.0-1//jcip-annotations-1.0-1.jar
jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
jdo-api/3.0.1//jdo-api-3.0.1.jar
jersey-client/2.30//jersey-client-2.30.jar
@@ -144,30 +117,14 @@ jline/2.14.6//jline-2.14.6.jar
joda-time/2.10.5//joda-time-2.10.5.jar
jodd-core/3.5.2//jodd-core-3.5.2.jar
jpam/1.1//jpam-1.1.jar
-json-smart/2.3//json-smart-2.3.jar
json/1.8//json-1.8.jar
json4s-ast_2.12/3.7.0-M5//json4s-ast_2.12-3.7.0-M5.jar
json4s-core_2.12/3.7.0-M5//json4s-core_2.12-3.7.0-M5.jar
json4s-jackson_2.12/3.7.0-M5//json4s-jackson_2.12-3.7.0-M5.jar
json4s-scalap_2.12/3.7.0-M5//json4s-scalap_2.12-3.7.0-M5.jar
-jsp-api/2.1//jsp-api-2.1.jar
jsr305/3.0.0//jsr305-3.0.0.jar
jta/1.1//jta-1.1.jar
jul-to-slf4j/1.7.30//jul-to-slf4j-1.7.30.jar
-kerb-admin/1.0.1//kerb-admin-1.0.1.jar
-kerb-client/1.0.1//kerb-client-1.0.1.jar
-kerb-common/1.0.1//kerb-common-1.0.1.jar
-kerb-core/1.0.1//kerb-core-1.0.1.jar
-kerb-crypto/1.0.1//kerb-crypto-1.0.1.jar
-kerb-identity/1.0.1//kerb-identity-1.0.1.jar
-kerb-server/1.0.1//kerb-server-1.0.1.jar
-kerb-simplekdc/1.0.1//kerb-simplekdc-1.0.1.jar
-kerb-util/1.0.1//kerb-util-1.0.1.jar
-kerby-asn1/1.0.1//kerby-asn1-1.0.1.jar
-kerby-config/1.0.1//kerby-config-1.0.1.jar
-kerby-pkix/1.0.1//kerby-pkix-1.0.1.jar
-kerby-util/1.0.1//kerby-util-1.0.1.jar
-kerby-xdr/1.0.1//kerby-xdr-1.0.1.jar
kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
kubernetes-client/4.12.0//kubernetes-client-4.12.0.jar
kubernetes-model-admissionregistration/4.12.0//kubernetes-model-admissionregistration-4.12.0.jar
@@ -205,9 +162,7 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar
metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
minlog/1.3.0//minlog-1.3.0.jar
netty-all/4.1.51.Final//netty-all-4.1.51.Final.jar
-nimbus-jose-jwt/4.41.1//nimbus-jose-jwt-4.41.1.jar
objenesis/2.6//objenesis-2.6.jar
-okhttp/2.7.5//okhttp-2.7.5.jar
okhttp/3.12.12//okhttp-3.12.12.jar
okio/1.14.0//okio-1.14.0.jar
opencsv/2.3//opencsv-2.3.jar
@@ -226,7 +181,6 @@ parquet-jackson/1.10.1//parquet-jackson-1.10.1.jar
protobuf-java/2.5.0//protobuf-java-2.5.0.jar
py4j/0.10.9.1//py4j-0.10.9.1.jar
pyrolite/4.30//pyrolite-4.30.jar
-re2j/1.1//re2j-1.1.jar
scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar
scala-compiler/2.12.10//scala-compiler-2.12.10.jar
scala-library/2.12.10//scala-library-2.12.10.jar
@@ -244,15 +198,12 @@ spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
stax-api/1.0.1//stax-api-1.0.1.jar
-stax2-api/3.1.4//stax2-api-3.1.4.jar
stream/2.9.6//stream-2.9.6.jar
super-csv/2.2.0//super-csv-2.2.0.jar
threeten-extra/1.5.0//threeten-extra-1.5.0.jar
-token-provider/1.0.1//token-provider-1.0.1.jar
transaction-api/1.1//transaction-api-1.1.jar
univocity-parsers/2.9.0//univocity-parsers-2.9.0.jar
velocity/1.5//velocity-1.5.jar
-woodstox-core/5.0.3//woodstox-core-5.0.3.jar
xbean-asm7-shaded/4.15//xbean-asm7-shaded-4.15.jar
xz/1.5//xz-1.5.jar
zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
diff --git a/external/kafka-0-10-assembly/pom.xml b/external/kafka-0-10-assembly/pom.xml
index 2359e99..121bc56 100644
--- a/external/kafka-0-10-assembly/pom.xml
+++ b/external/kafka-0-10-assembly/pom.xml
@@ -71,10 +71,16 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-client</artifactId>
+ <artifactId>${hadoop-client-api.artifact}</artifactId>
+ <version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+ <version>${hadoop.version}</version>
+ </dependency>
+ <dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>
<classifier>${avro.mapred.classifier}</classifier>
diff --git a/external/kafka-0-10-sql/pom.xml b/external/kafka-0-10-sql/pom.xml
index 843f160..1833b35 100644
--- a/external/kafka-0-10-sql/pom.xml
+++ b/external/kafka-0-10-sql/pom.xml
@@ -80,6 +80,10 @@
<version>${kafka.version}</version>
</dependency>
<dependency>
+ <groupId>com.google.code.findbugs</groupId>
+ <artifactId>jsr305</artifactId>
+ </dependency>
+ <dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-pool2</artifactId>
<version>${commons-pool2.version}</version>
diff --git a/external/kafka-0-10-token-provider/pom.xml b/external/kafka-0-10-token-provider/pom.xml
index dbe2ab9..4ee09fa 100644
--- a/external/kafka-0-10-token-provider/pom.xml
+++ b/external/kafka-0-10-token-provider/pom.xml
@@ -59,6 +59,11 @@
<scope>test</scope>
</dependency>
<dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+ <scope>${hadoop.deps.scope}</scope>
+ </dependency>
+ <dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_${scala.binary.version}</artifactId>
</dependency>
diff --git a/external/kinesis-asl-assembly/pom.xml b/external/kinesis-asl-assembly/pom.xml
index 22259b0..9a98d7c 100644
--- a/external/kinesis-asl-assembly/pom.xml
+++ b/external/kinesis-asl-assembly/pom.xml
@@ -91,10 +91,16 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-client</artifactId>
+ <artifactId>${hadoop-client-api.artifact}</artifactId>
+ <version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+ <version>${hadoop.version}</version>
+ </dependency>
+ <dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-ipc</artifactId>
<scope>provided</scope>
diff --git a/hadoop-cloud/pom.xml b/hadoop-cloud/pom.xml
index 03910ba..c0997e5 100644
--- a/hadoop-cloud/pom.xml
+++ b/hadoop-cloud/pom.xml
@@ -58,10 +58,15 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-client</artifactId>
+ <artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+ <version>${hadoop.version}</version>
+ </dependency>
<!--
the AWS module pulls in jackson; its transitive dependencies can create
intra-jackson-module version problems.
diff --git a/launcher/pom.xml b/launcher/pom.xml
index 5da2a49..dd872f4 100644
--- a/launcher/pom.xml
+++ b/launcher/pom.xml
@@ -81,7 +81,14 @@
<!-- Not needed by the test code, but referenced by SparkSubmit which is used by the tests. -->
<dependency>
<groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-client</artifactId>
+ <artifactId>${hadoop-client-api.artifact}</artifactId>
+ <version>${hadoop.version}</version>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+ <version>${hadoop.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
diff --git a/pom.xml b/pom.xml
index f921e35..26b5186 100644
--- a/pom.xml
+++ b/pom.xml
@@ -120,7 +120,7 @@
<sbt.project.name>spark</sbt.project.name>
<slf4j.version>1.7.30</slf4j.version>
<log4j.version>1.2.17</log4j.version>
- <hadoop.version>3.2.0</hadoop.version>
+ <hadoop.version>3.2.2</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<yarn.version>${hadoop.version}</yarn.version>
<zookeeper.version>3.4.14</zookeeper.version>
@@ -246,6 +246,15 @@
<parquet.test.deps.scope>test</parquet.test.deps.scope>
<!--
+ These default to Hadoop 3.x shaded client/minicluster jars, but are switched to hadoop-client
+ when the Hadoop profile is hadoop-2.7, because these are only available in 3.x. Note that,
+ as result we have to include the same hadoop-client dependency multiple times in hadoop-2.7.
+ -->
+ <hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact>
+ <hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact>
+ <hadoop-client-minicluster.artifact>hadoop-client-minicluster</hadoop-client-minicluster.artifact>
+
+ <!--
Overridable test home. So that you can call individual pom files directly without
things breaking.
-->
@@ -860,6 +869,11 @@
<version>2.0.1</version>
</dependency>
<dependency>
+ <groupId>javax.xml.bind</groupId>
+ <artifactId>jaxb-api</artifactId>
+ <version>2.2.11</version>
+ </dependency>
+ <dependency>
<groupId>org.scalanlp</groupId>
<artifactId>breeze_${scala.binary.version}</artifactId>
<version>1.0</version>
@@ -1067,6 +1081,26 @@
<version>${curator.version}</version>
<scope>test</scope>
</dependency>
+ <!-- Hadoop 3.x dependencies -->
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-client-api</artifactId>
+ <version>${hadoop.version}</version>
+ <scope>${hadoop.deps.scope}</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-client-runtime</artifactId>
+ <version>${hadoop.version}</version>
+ <scope>${hadoop.deps.scope}</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-client-minicluster</artifactId>
+ <version>${yarn.version}</version>
+ <scope>test</scope>
+ </dependency>
+ <!-- End of Hadoop 3.x dependencies -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
@@ -1657,6 +1691,14 @@
<artifactId>ant</artifactId>
</exclusion>
<exclusion>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-common</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-auth</artifactId>
+ </exclusion>
+ <exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
@@ -2420,17 +2462,6 @@
</rules>
</configuration>
</execution>
- <execution>
- <id>enforce-no-duplicate-dependencies</id>
- <goals>
- <goal>enforce</goal>
- </goals>
- <configuration>
- <rules>
- <banDuplicatePomDependencyVersions/>
- </rules>
- </configuration>
- </execution>
</executions>
</plugin>
<plugin>
@@ -2901,6 +2932,7 @@
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
+ <createDependencyReducedPom>false</createDependencyReducedPom>
<artifactSet>
<includes>
<include>org.spark-project.spark:unused</include>
@@ -3162,6 +3194,9 @@
<hadoop.version>2.7.4</hadoop.version>
<curator.version>2.7.1</curator.version>
<commons-io.version>2.4</commons-io.version>
+ <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
+ <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
+ <hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
</properties>
</profile>
diff --git a/resource-managers/kubernetes/core/pom.xml b/resource-managers/kubernetes/core/pom.xml
index 44df4e1..3fff940 100644
--- a/resource-managers/kubernetes/core/pom.xml
+++ b/resource-managers/kubernetes/core/pom.xml
@@ -64,10 +64,6 @@
<artifactId>*</artifactId>
</exclusion>
<exclusion>
- <groupId>com.fasterxml.jackson.module</groupId>
- <artifactId>jackson-module-jaxb-annotations</artifactId>
- </exclusion>
- <exclusion>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-yaml</artifactId>
</exclusion>
@@ -85,11 +81,6 @@
<artifactId>jackson-dataformat-yaml</artifactId>
<version>${fasterxml.jackson.version}</version>
</dependency>
- <dependency>
- <groupId>com.fasterxml.jackson.module</groupId>
- <artifactId>jackson-module-jaxb-annotations</artifactId>
- <version>${fasterxml.jackson.version}</version>
- </dependency>
<!-- Explicitly depend on shaded dependencies from the parent, since shaded deps aren't transitive -->
<dependency>
diff --git a/resource-managers/yarn/pom.xml b/resource-managers/yarn/pom.xml
index c0ce1c8..a662953 100644
--- a/resource-managers/yarn/pom.xml
+++ b/resource-managers/yarn/pom.xml
@@ -40,6 +40,42 @@
<spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
</properties>
</profile>
+ <profile>
+ <id>hadoop-2.7</id>
+ <dependencies>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-yarn-api</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-yarn-common</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-yarn-server-web-proxy</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-yarn-client</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-yarn-server-tests</artifactId>
+ <classifier>tests</classifier>
+ <scope>test</scope>
+ </dependency>
+ <!--
+ Hack to exclude org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:tests.
+ See the parent pom.xml for more details.
+ -->
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+ <scope>test</scope>
+ </dependency>
+ </dependencies>
+ </profile>
</profiles>
<dependencies>
@@ -69,23 +105,20 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-yarn-api</artifactId>
- </dependency>
- <dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-yarn-common</artifactId>
+ <artifactId>${hadoop-client-api.artifact}</artifactId>
+ <version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-yarn-server-web-proxy</artifactId>
+ <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+ <version>${hadoop.version}</version>
+ <scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-yarn-client</artifactId>
- </dependency>
- <dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-client</artifactId>
+ <artifactId>${hadoop-client-minicluster.artifact}</artifactId>
+ <version>${hadoop.version}</version>
+ <scope>test</scope>
</dependency>
<!-- Explicit listing of transitive deps that are shaded. Otherwise, odd compiler crashes. -->
@@ -136,18 +169,6 @@
</dependency>
<dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-yarn-server-tests</artifactId>
- <classifier>tests</classifier>
- <scope>test</scope>
- </dependency>
- <dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
- <scope>test</scope>
- </dependency>
-
- <dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<scope>test</scope>
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
index ab69507..eb927a3 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
@@ -19,7 +19,7 @@ package org.apache.spark.deploy.yarn
import java.io.{File, IOException}
import java.lang.reflect.{InvocationTargetException, Modifier}
-import java.net.{URI, URL}
+import java.net.{URI, URL, URLEncoder}
import java.security.PrivilegedExceptionAction
import java.util.concurrent.{TimeoutException, TimeUnit}
@@ -36,7 +36,6 @@ import org.apache.hadoop.yarn.api._
import org.apache.hadoop.yarn.api.records._
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException
-import org.apache.hadoop.yarn.server.webproxy.ProxyUriUtils
import org.apache.hadoop.yarn.util.{ConverterUtils, Records}
import org.apache.spark._
@@ -308,7 +307,8 @@ private[spark] class ApplicationMaster(
// The client-mode AM doesn't listen for incoming connections, so report an invalid port.
registerAM(Utils.localHostName, -1, sparkConf,
sparkConf.getOption("spark.driver.appUIAddress"), appAttemptId)
- addAmIpFilter(Some(driverRef), ProxyUriUtils.getPath(appAttemptId.getApplicationId))
+ val encodedAppId = URLEncoder.encode(appAttemptId.getApplicationId.toString, "UTF-8")
+ addAmIpFilter(Some(driverRef), s"/proxy/$encodedAppId")
createAllocator(driverRef, sparkConf, clientRpcEnv, appAttemptId, cachedResourcesConf)
reporterThread.join()
} catch {
diff --git a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala
index 20f5339..a813b99 100644
--- a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala
+++ b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala
@@ -80,6 +80,16 @@ abstract class BaseYarnClusterSuite
yarnConf.set("yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage",
"100.0")
+ // capacity-scheduler.xml is missing in hadoop-client-minicluster so this is a workaround
+ yarnConf.set("yarn.scheduler.capacity.root.queues", "default")
+ yarnConf.setInt("yarn.scheduler.capacity.root.default.capacity", 100)
+ yarnConf.setFloat("yarn.scheduler.capacity.root.default.user-limit-factor", 1)
+ yarnConf.setInt("yarn.scheduler.capacity.root.default.maximum-capacity", 100)
+ yarnConf.set("yarn.scheduler.capacity.root.default.state", "RUNNING")
+ yarnConf.set("yarn.scheduler.capacity.root.default.acl_submit_applications", "*")
+ yarnConf.set("yarn.scheduler.capacity.root.default.acl_administer_queue", "*")
+ yarnConf.setInt("yarn.scheduler.capacity.node-locality-delay", -1)
+
yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
yarnCluster.init(yarnConf)
yarnCluster.start()
diff --git a/sql/catalyst/pom.xml b/sql/catalyst/pom.xml
index 0553438..583738b 100644
--- a/sql/catalyst/pom.xml
+++ b/sql/catalyst/pom.xml
@@ -105,6 +105,10 @@
<artifactId>antlr4-runtime</artifactId>
</dependency>
<dependency>
+ <groupId>javax.xml.bind</groupId>
+ <artifactId>jaxb-api</artifactId>
+ </dependency>
+ <dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
</dependency>
diff --git a/sql/hive/pom.xml b/sql/hive/pom.xml
index 27d2756..74b1f9d 100644
--- a/sql/hive/pom.xml
+++ b/sql/hive/pom.xml
@@ -163,6 +163,11 @@
<artifactId>datanucleus-core</artifactId>
</dependency>
<dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+ <scope>${hadoop.deps.scope}</scope>
+ </dependency>
+ <dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
</dependency>
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
index 02bf865..4e5e58d 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
@@ -112,11 +112,24 @@ private[hive] object IsolatedClientLoader extends Logging {
hadoopVersion: String,
ivyPath: Option[String],
remoteRepos: String): Seq[URL] = {
+ val hadoopJarNames = if (hadoopVersion.startsWith("3")) {
+ Seq(s"org.apache.hadoop:hadoop-client-api:$hadoopVersion",
+ s"org.apache.hadoop:hadoop-client-runtime:$hadoopVersion")
+ } else {
+ Seq(s"org.apache.hadoop:hadoop-client:$hadoopVersion")
+ }
val hiveArtifacts = version.extraDeps ++
Seq("hive-metastore", "hive-exec", "hive-common", "hive-serde")
.map(a => s"org.apache.hive:$a:${version.fullVersion}") ++
- Seq("com.google.guava:guava:14.0.1",
- s"org.apache.hadoop:hadoop-client:$hadoopVersion")
+ Seq("com.google.guava:guava:14.0.1") ++ hadoopJarNames
+
+ val extraExclusions = if (hadoopVersion.startsWith("3")) {
+ // this introduced from lower version of Hive could conflict with jars in Hadoop 3.2+, so
+ // exclude here in favor of the ones in Hadoop 3.2+
+ Seq("org.apache.hadoop:hadoop-auth")
+ } else {
+ Seq.empty
+ }
val classpaths = quietly {
SparkSubmitUtils.resolveMavenCoordinates(
@@ -125,7 +138,7 @@ private[hive] object IsolatedClientLoader extends Logging {
Some(remoteRepos),
ivyPath),
transitive = true,
- exclusions = version.exclusions)
+ exclusions = version.exclusions ++ extraExclusions)
}
val allFiles = classpaths.map(new File(_)).toSet