Posted to commits@flink.apache.org by ma...@apache.org on 2023/01/10 00:03:06 UTC
[flink] 01/03: [FLINK-29710][Filesystem] Bump minimum supported Hadoop version to 2.10.2
This is an automated email from the ASF dual-hosted git repository.
martijnvisser pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git
commit 573ed922346c791760d27653543c2b8df56f51f7
Author: Martijn Visser <ma...@apache.org>
AuthorDate: Thu Oct 20 22:12:55 2022 +0200
[FLINK-29710][Filesystem] Bump minimum supported Hadoop version to 2.10.2
- Make the SQL Client use the Hadoop versioning as defined in the parent POM
- Make the SQL Gateway use the Hive and Hadoop versioning as defined in the parent POM
- Move and clarify Hive specific Hadoop versioning in `flink-connector-hive`
- Sync stax2-api to solve dependency convergence in hadoop-common
- Sync commons-beanutils and stax2-api to solve dependency convergence in hadoop-common
- Fix YarnTestBase to work with the new Hadoop version. Also remove the comment about an issue with the previously supported Hadoop version
- Disable the HDFS client's replacement of a failed datanode in the write pipeline, so that no new datanode is added when an existing one is removed. It's recommended to disable this for small clusters, which is the case in our tests
- Bump Hadoop 3 to 3.2.3.
HADOOP-12984 changed where the MiniYARNCluster wrote to by default.
2.10.2: https://github.com/apache/hadoop/blob/965fd380006fa78b2315668fbc7eb432e1d8200f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java#L177
This hasn't made its way into 3.1.3: https://github.com/apache/hadoop/blob/aa96f1871bfd858f9bac59cf2a81ec470da649af/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java#L175
But was added to 3.2.3 in MAPREDUCE-7320.
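The two HDFS client settings mentioned above land in a test-only hdfs-site.xml in the patch. As a minimal sketch of what that file contains, here is an illustrative generator; the helper function and its name are assumptions for this example, only the property names and values come from the commit:

```python
import xml.etree.ElementTree as ET

def hdfs_site(props):
    """Render a minimal hdfs-site.xml body from a dict of property name -> value."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")

# Settings from FLINK-29710: never replace a failed datanode in the
# write pipeline, which avoids pipeline failures on very small test clusters
# where no replacement datanode can be found.
xml = hdfs_site({
    "dfs.client.block.write.replace-datanode-on-failure.enable": "never",
    "dfs.client.block.write.replace-datanode-on-failure.policy": "never",
})
print(xml)
```

On a production cluster with enough datanodes these properties should normally be left at their defaults; the commit disables the feature only for the in-test mini clusters.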
Co-authored-by: Gabor Somogyi <ga...@apple.com>
Co-authored-by: Chesnay Schepler <ch...@apache.org>
---
azure-pipelines.yml | 4 +-
.../docs/connectors/dataset/formats/hadoop.md | 2 +-
.../docs/connectors/datastream/formats/hadoop.md | 2 +-
docs/content.zh/docs/deployment/filesystems/gcs.md | 2 +-
.../docs/deployment/resource-providers/yarn.md | 4 +-
.../docs/dev/dataset/hadoop_compatibility.md | 2 +-
.../docs/connectors/dataset/formats/hadoop.md | 2 +-
.../docs/connectors/datastream/formats/hadoop.md | 2 +-
.../docs/deployment/resource-providers/yarn.md | 4 +-
docs/content/docs/dev/dataset/hadoop_map_reduce.md | 2 +-
flink-connectors/flink-connector-hive/pom.xml | 73 +++++++----------
.../hive/FlinkEmbeddedHiveServerContext.java | 4 +-
.../flink-end-to-end-tests-sql/pom.xml | 12 +++
.../flink-sql-gateway-test/pom.xml | 6 --
.../test-scripts/common_yarn_docker.sh | 2 +-
.../docker-hadoop-secure-cluster/README.md | 4 +-
.../apache/flink/runtime/util/HadoopUtilsTest.java | 4 +-
.../src/test/resources/hdfs-site.xml | 37 +++++++++
flink-table/flink-sql-client/pom.xml | 21 ++++-
flink-yarn-tests/README.md | 5 --
.../java/org/apache/flink/yarn/YarnTestBase.java | 29 +++++--
flink-yarn/pom.xml | 94 ++++++++++++++++++++++
.../yarn/configuration/YarnConfigOptions.java | 2 +-
pom.xml | 7 +-
tools/azure-pipelines/build-apache-repo.yml | 12 +--
25 files changed, 242 insertions(+), 96 deletions(-)
diff --git a/azure-pipelines.yml b/azure-pipelines.yml
index a62e5f8d07e..38f755ef222 100644
--- a/azure-pipelines.yml
+++ b/azure-pipelines.yml
@@ -77,7 +77,7 @@ stages:
vmImage: 'ubuntu-20.04'
e2e_pool_definition:
vmImage: 'ubuntu-20.04'
- environment: PROFILE="-Dflink.hadoop.version=2.8.5 -Dscala-2.12"
+ environment: PROFILE="-Dflink.hadoop.version=2.10.2 -Dscala-2.12"
run_end_to_end: false
container: flink-build-container
jdk: 8
@@ -97,5 +97,5 @@ stages:
- template: tools/azure-pipelines/build-python-wheels.yml
parameters:
stage_name: cron_python_wheels
- environment: PROFILE="-Dflink.hadoop.version=2.8.5 -Dscala-2.12"
+ environment: PROFILE="-Dflink.hadoop.version=2.10.2 -Dscala-2.12"
container: flink-build-container
diff --git a/docs/content.zh/docs/connectors/dataset/formats/hadoop.md b/docs/content.zh/docs/connectors/dataset/formats/hadoop.md
index 4a1160d562c..e4143f34596 100644
--- a/docs/content.zh/docs/connectors/dataset/formats/hadoop.md
+++ b/docs/content.zh/docs/connectors/dataset/formats/hadoop.md
@@ -49,7 +49,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
- <version>2.8.5</version>
+ <version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
diff --git a/docs/content.zh/docs/connectors/datastream/formats/hadoop.md b/docs/content.zh/docs/connectors/datastream/formats/hadoop.md
index b5e63b0224e..20f0d767efc 100644
--- a/docs/content.zh/docs/connectors/datastream/formats/hadoop.md
+++ b/docs/content.zh/docs/connectors/datastream/formats/hadoop.md
@@ -48,7 +48,7 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
- <version>2.8.5</version>
+ <version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
diff --git a/docs/content.zh/docs/deployment/filesystems/gcs.md b/docs/content.zh/docs/deployment/filesystems/gcs.md
index 2c999509aaa..01ae9532941 100644
--- a/docs/content.zh/docs/deployment/filesystems/gcs.md
+++ b/docs/content.zh/docs/deployment/filesystems/gcs.md
@@ -68,7 +68,7 @@ You must include the following jars in Flink's `lib` directory to connect Flink
</dependency>
```
-We have tested with `flink-shared-hadoop2-uber` version >= `2.8.5-1.8.3`.
+We have tested with `flink-shared-hadoop2-uber` version >= `2.10.2-1.8.3`.
You can track the latest version of the [gcs-connector hadoop 2](https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar).
### Authentication to access GCS
diff --git a/docs/content.zh/docs/deployment/resource-providers/yarn.md b/docs/content.zh/docs/deployment/resource-providers/yarn.md
index 7ac23f5e819..8354af6008a 100644
--- a/docs/content.zh/docs/deployment/resource-providers/yarn.md
+++ b/docs/content.zh/docs/deployment/resource-providers/yarn.md
@@ -40,7 +40,7 @@ Flink can dynamically allocate and de-allocate TaskManager resources depending o
### Preparation
-This *Getting Started* section assumes a functional YARN environment, starting from version 2.8.5. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not [...]
+This *Getting Started* section assumes a functional YARN environment, starting from version 2.10.2. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is no [...]
- Make sure your YARN cluster is ready for accepting Flink applications by running `yarn top`. It should show no error messages.
- Download a recent Flink distribution from the [download page]({{< downloads >}}) and unpack it.
@@ -219,7 +219,7 @@ Hadoop YARN 2.4.0 has a major bug (fixed in 2.5.0) preventing container restarts
### Supported Hadoop versions.
-Flink on YARN is compiled against Hadoop 2.8.5, and all Hadoop versions `>= 2.8.5` are supported, including Hadoop 3.x.
+Flink on YARN is compiled against Hadoop 2.10.2, and all Hadoop versions `>= 2.10.2` are supported, including Hadoop 3.x.
For providing Flink with the required Hadoop dependencies, we recommend setting the `HADOOP_CLASSPATH` environment variable already introduced in the [Getting Started / Preparation](#preparation) section.
diff --git a/docs/content.zh/docs/dev/dataset/hadoop_compatibility.md b/docs/content.zh/docs/dev/dataset/hadoop_compatibility.md
index bb2f781dc70..184785ce36d 100644
--- a/docs/content.zh/docs/dev/dataset/hadoop_compatibility.md
+++ b/docs/content.zh/docs/dev/dataset/hadoop_compatibility.md
@@ -88,7 +88,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
- <version>2.8.5</version>
+ <version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
diff --git a/docs/content/docs/connectors/dataset/formats/hadoop.md b/docs/content/docs/connectors/dataset/formats/hadoop.md
index 4a1160d562c..e4143f34596 100644
--- a/docs/content/docs/connectors/dataset/formats/hadoop.md
+++ b/docs/content/docs/connectors/dataset/formats/hadoop.md
@@ -49,7 +49,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
- <version>2.8.5</version>
+ <version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
diff --git a/docs/content/docs/connectors/datastream/formats/hadoop.md b/docs/content/docs/connectors/datastream/formats/hadoop.md
index d8b682402c8..edb95edde0d 100644
--- a/docs/content/docs/connectors/datastream/formats/hadoop.md
+++ b/docs/content/docs/connectors/datastream/formats/hadoop.md
@@ -50,7 +50,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
- <version>2.8.5</version>
+ <version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
diff --git a/docs/content/docs/deployment/resource-providers/yarn.md b/docs/content/docs/deployment/resource-providers/yarn.md
index 2eb1d2214aa..49d6a3501aa 100644
--- a/docs/content/docs/deployment/resource-providers/yarn.md
+++ b/docs/content/docs/deployment/resource-providers/yarn.md
@@ -40,7 +40,7 @@ Flink can dynamically allocate and de-allocate TaskManager resources depending o
### Preparation
-This *Getting Started* section assumes a functional YARN environment, starting from version 2.8.5. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not [...]
+This *Getting Started* section assumes a functional YARN environment, starting from version 2.10.2. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is no [...]
- Make sure your YARN cluster is ready for accepting Flink applications by running `yarn top`. It should show no error messages.
- Download a recent Flink distribution from the [download page]({{< downloads >}}) and unpack it.
@@ -235,7 +235,7 @@ Hadoop YARN 2.4.0 has a major bug (fixed in 2.5.0) preventing container restarts
### Supported Hadoop versions.
-Flink on YARN is compiled against Hadoop 2.8.5, and all Hadoop versions `>= 2.8.5` are supported, including Hadoop 3.x.
+Flink on YARN is compiled against Hadoop 2.10.2, and all Hadoop versions `>= 2.10.2` are supported, including Hadoop 3.x.
For providing Flink with the required Hadoop dependencies, we recommend setting the `HADOOP_CLASSPATH` environment variable already introduced in the [Getting Started / Preparation](#preparation) section.
diff --git a/docs/content/docs/dev/dataset/hadoop_map_reduce.md b/docs/content/docs/dev/dataset/hadoop_map_reduce.md
index 8e5c46865f8..46d63f80521 100644
--- a/docs/content/docs/dev/dataset/hadoop_map_reduce.md
+++ b/docs/content/docs/dev/dataset/hadoop_map_reduce.md
@@ -76,7 +76,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
- <version>2.8.5</version>
+ <version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
diff --git a/flink-connectors/flink-connector-hive/pom.xml b/flink-connectors/flink-connector-hive/pom.xml
index e82debd5023..9c9e34e7a35 100644
--- a/flink-connectors/flink-connector-hive/pom.xml
+++ b/flink-connectors/flink-connector-hive/pom.xml
@@ -39,15 +39,20 @@ under the License.
<reflections.version>0.9.8</reflections.version>
<derby.version>10.10.2.0</derby.version>
<hive.avro.version>1.8.2</hive.avro.version>
+ <!--
+ Hive requires Hadoop 2 to avoid
+ java.lang.NoClassDefFoundError: org/apache/hadoop/metrics/Updater errors
+ Using this dedicated property avoids CI failures with the Hadoop 3 profile
+ -->
+ <hive.hadoop.version>2.10.2</hive.hadoop.version>
</properties>
- <!-- Overwrite hadoop dependency management from flink-parent to use locally defined Hadoop version -->
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
- <version>${hivemetastore.hadoop.version}</version>
+ <version>${hive.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
@@ -79,7 +84,7 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
- <version>${hivemetastore.hadoop.version}</version>
+ <version>${hive.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
@@ -95,7 +100,7 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
- <version>${hivemetastore.hadoop.version}</version>
+ <version>${hive.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
@@ -111,7 +116,7 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
- <version>${hivemetastore.hadoop.version}</version>
+ <version>${hive.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
@@ -254,6 +259,24 @@ under the License.
<scope>provided</scope>
</dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-hdfs</artifactId>
+ <version>${hive.hadoop.version}</version>
+ <type>test-jar</type>
+ <scope>test</scope>
+ <exclusions>
+ <exclusion>
+ <groupId>log4j</groupId>
+ <artifactId>log4j</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>ch.qos.reload4j</groupId>
+ <artifactId>reload4j</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+
<!-- Hive -->
<!-- Note: Hive published jars do not have proper dependencies declared.
@@ -910,13 +933,6 @@ under the License.
<scope>test</scope>
</dependency>
- <dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-hdfs</artifactId>
- <version>${hivemetastore.hadoop.version}</version>
- <scope>test</scope>
- </dependency>
-
<!-- ArchUit test dependencies -->
<dependency>
@@ -1115,33 +1131,8 @@ under the License.
<properties>
<hive.version>3.1.3</hive.version>
<derby.version>10.14.1.0</derby.version>
- <!-- need a hadoop version that fixes HADOOP-14683 -->
- <hivemetastore.hadoop.version>2.8.2</hivemetastore.hadoop.version>
</properties>
- <dependencyManagement>
- <dependencies>
- <dependency>
- <groupId>org.apache.hive</groupId>
- <artifactId>hive-metastore</artifactId>
- <version>${hive.version}</version>
- <scope>provided</scope>
- <exclusions>
- <exclusion>
- <!-- Override arrow netty dependency -->
- <groupId>io.netty</groupId>
- <artifactId>netty-buffer</artifactId>
- </exclusion>
- <exclusion>
- <!-- Override arrow netty dependency -->
- <groupId>io.netty</groupId>
- <artifactId>netty-common</artifactId>
- </exclusion>
- </exclusions>
- </dependency>
- </dependencies>
- </dependencyManagement>
-
<dependencies>
<dependency>
<!-- Bump arrow netty dependency -->
@@ -1158,14 +1149,6 @@ under the License.
<version>4.1.46.Final</version>
<scope>provided</scope>
</dependency>
-
- <!-- Required by orc tests -->
- <dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-hdfs</artifactId>
- <version>${hivemetastore.hadoop.version}</version>
- <scope>test</scope>
- </dependency>
</dependencies>
</profile>
diff --git a/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/FlinkEmbeddedHiveServerContext.java b/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/FlinkEmbeddedHiveServerContext.java
index 670eb56b7ff..212bbcf0e41 100644
--- a/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/FlinkEmbeddedHiveServerContext.java
+++ b/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/FlinkEmbeddedHiveServerContext.java
@@ -150,8 +150,8 @@ public class FlinkEmbeddedHiveServerContext implements HiveServerContext {
private void configureJavaSecurityRealm() {
// These three properties gets rid of: 'Unable to load realm info from SCDynamicStore'
// which seems to have a timeout of about 5 secs.
- System.setProperty("java.security.krb5.realm", "");
- System.setProperty("java.security.krb5.kdc", "");
+ System.setProperty("java.security.krb5.realm", "EXAMPLE.COM");
+ System.setProperty("java.security.krb5.kdc", "kdc");
System.setProperty("java.security.krb5.conf", "/dev/null");
}
diff --git a/flink-end-to-end-tests/flink-end-to-end-tests-sql/pom.xml b/flink-end-to-end-tests/flink-end-to-end-tests-sql/pom.xml
index b539d5ecf2b..ed808e952f2 100644
--- a/flink-end-to-end-tests/flink-end-to-end-tests-sql/pom.xml
+++ b/flink-end-to-end-tests/flink-end-to-end-tests-sql/pom.xml
@@ -158,6 +158,18 @@
</dependency>
</dependencies>
+ <dependencyManagement>
+ <dependencies>
+ <dependency>
+ <!-- dependency convergence -->
+ <groupId>org.codehaus.woodstox</groupId>
+ <artifactId>stax2-api</artifactId>
+ <version>4.2.1</version>
+ <scope>test</scope>
+ </dependency>
+ </dependencies>
+ </dependencyManagement>
+
<build>
<plugins>
<plugin>
diff --git a/flink-end-to-end-tests/flink-sql-gateway-test/pom.xml b/flink-end-to-end-tests/flink-sql-gateway-test/pom.xml
index 22598d286a3..b246023ce7e 100644
--- a/flink-end-to-end-tests/flink-sql-gateway-test/pom.xml
+++ b/flink-end-to-end-tests/flink-sql-gateway-test/pom.xml
@@ -30,12 +30,6 @@ under the License.
<name>Flink : E2E Tests : SQL Gateway</name>
<packaging>jar</packaging>
- <properties>
- <!-- The test container uses hive-2.1.0 -->
- <hive.version>2.3.9</hive.version>
- <flink.hadoop.version>2.8.5</flink.hadoop.version>
- </properties>
-
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
diff --git a/flink-end-to-end-tests/test-scripts/common_yarn_docker.sh b/flink-end-to-end-tests/test-scripts/common_yarn_docker.sh
index f2c67628435..97fcca09d5a 100755
--- a/flink-end-to-end-tests/test-scripts/common_yarn_docker.sh
+++ b/flink-end-to-end-tests/test-scripts/common_yarn_docker.sh
@@ -99,7 +99,7 @@ function start_hadoop_cluster() {
function build_image() {
echo "Pre-downloading Hadoop tarball"
local cache_path
- cache_path=$(get_artifact "http://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz")
+ cache_path=$(get_artifact "http://archive.apache.org/dist/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz")
ln "${cache_path}" "${END_TO_END_DIR}/test-scripts/docker-hadoop-secure-cluster/hadoop/hadoop.tar.gz"
echo "Building Hadoop Docker container"
diff --git a/flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md b/flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md
index 7055c317795..d94fc534b06 100644
--- a/flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md
+++ b/flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md
@@ -4,7 +4,7 @@ Required versions
-----------------
* JDK8
-* Hadoop 2.8.5
+* Hadoop 2.10.2
Default Environment Variables
-----------------------------
@@ -24,7 +24,7 @@ Run image
```
cd flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster
-wget -O hadoop/hadoop.tar.gz https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
+wget -O hadoop/hadoop.tar.gz https://archive.apache.org/dist/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz
docker-compose build
docker-compose up
```
diff --git a/flink-filesystems/flink-hadoop-fs/src/test/java/org/apache/flink/runtime/util/HadoopUtilsTest.java b/flink-filesystems/flink-hadoop-fs/src/test/java/org/apache/flink/runtime/util/HadoopUtilsTest.java
index 1fcd0728b50..6af00291fda 100644
--- a/flink-filesystems/flink-hadoop-fs/src/test/java/org/apache/flink/runtime/util/HadoopUtilsTest.java
+++ b/flink-filesystems/flink-hadoop-fs/src/test/java/org/apache/flink/runtime/util/HadoopUtilsTest.java
@@ -42,8 +42,8 @@ public class HadoopUtilsTest extends TestLogger {
@BeforeClass
public static void setPropertiesToEnableKerberosConfigInit() throws KrbException {
- System.setProperty("java.security.krb5.realm", "");
- System.setProperty("java.security.krb5.kdc", "");
+ System.setProperty("java.security.krb5.realm", "EXAMPLE.COM");
+ System.setProperty("java.security.krb5.kdc", "kdc");
System.setProperty("java.security.krb5.conf", "/dev/null");
sun.security.krb5.Config.refresh();
}
diff --git a/flink-filesystems/flink-hadoop-fs/src/test/resources/hdfs-site.xml b/flink-filesystems/flink-hadoop-fs/src/test/resources/hdfs-site.xml
new file mode 100644
index 00000000000..7f906a6a76c
--- /dev/null
+++ b/flink-filesystems/flink-hadoop-fs/src/test/resources/hdfs-site.xml
@@ -0,0 +1,37 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<configuration>
+ <!-- dfs.client.block.write.replace-datanode-on-failure.enable and
+ dfs.client.block.write.replace-datanode-on-failure.policy are introduced as part of FLINK-29710
+ When the cluster size is extremely small, e.g. 3 nodes or less, cluster
+ administrators may want to set the policy to NEVER in the default
+ configuration file or disable this feature. Otherwise, users may
+ experience an unusually high rate of pipeline failures since it is
+ impossible to find new datanodes for replacement.
+ -->
+ <property>
+ <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
+ <value>never</value>
+ </property>
+
+ <property>
+ <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
+ <value>never</value>
+ </property>
+</configuration>
+
diff --git a/flink-table/flink-sql-client/pom.xml b/flink-table/flink-sql-client/pom.xml
index f5277c08f76..81be83cbd1b 100644
--- a/flink-table/flink-sql-client/pom.xml
+++ b/flink-table/flink-sql-client/pom.xml
@@ -454,7 +454,6 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
- <version>${hivemetastore.hadoop.version}</version>
<scope>test</scope>
<exclusions>
<exclusion>
@@ -495,7 +494,6 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
- <version>${hivemetastore.hadoop.version}</version>
<scope>test</scope>
<exclusions>
<exclusion>
@@ -529,6 +527,25 @@ under the License.
</dependency>
</dependencies>
+ <dependencyManagement>
+ <dependencies>
+ <dependency>
+ <!-- dependency convergence -->
+ <groupId>commons-beanutils</groupId>
+ <artifactId>commons-beanutils</artifactId>
+ <version>1.9.4</version>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <!-- dependency convergence -->
+ <groupId>org.codehaus.woodstox</groupId>
+ <artifactId>stax2-api</artifactId>
+ <version>4.2.1</version>
+ <scope>test</scope>
+ </dependency>
+ </dependencies>
+ </dependencyManagement>
+
<build>
<plugins>
diff --git a/flink-yarn-tests/README.md b/flink-yarn-tests/README.md
index f31ff18fd17..67144785b48 100644
--- a/flink-yarn-tests/README.md
+++ b/flink-yarn-tests/README.md
@@ -10,8 +10,3 @@ There are several things to consider when running these tests locally:
* Each `YARN*ITCase` will have a local working directory for resources like logs to be stored. These
working directories are located in `flink-yarn-tests/target/` (see
`find flink-yarn-tests/target -name "*.err" -or -name "*.out"` for the test's output).
-* There is a known problem causing test instabilities due to our usage of Hadoop 2.8.5 executing the
- tests. This is caused by a bug [YARN-7007](https://issues.apache.org/jira/browse/YARN-7007) that
- got fixed in [Hadoop 2.8.6](https://issues.apache.org/jira/projects/YARN/versions/12344056). See
- [FLINK-15534](https://issues.apache.org/jira/browse/FLINK-15534) for further details on the
- related discussion.
diff --git a/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java b/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java
index 1cd27df554a..1f7fa1433af 100644
--- a/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java
+++ b/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java
@@ -42,6 +42,7 @@ import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.service.Service;
+import org.apache.hadoop.test.GenericTestUtils;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ContainerId;
@@ -173,7 +174,10 @@ public abstract class YarnTestBase {
Pattern.compile("java\\.lang\\.InterruptedException"),
// this can happen if the hbase delegation token provider is not available
- Pattern.compile("ClassNotFoundException : \"org.apache.hadoop.hbase.HBaseConfiguration\"")
+ Pattern.compile("ClassNotFoundException : \"org.apache.hadoop.hbase.HBaseConfiguration\""),
+
+ // This happens in YARN shutdown
+ Pattern.compile("Rejected TaskExecutor registration at the ResourceManager")
};
// Temp directory which is deleted after the unit test.
@@ -480,7 +484,10 @@ public abstract class YarnTestBase {
*/
public static void ensureNoProhibitedStringInLogFiles(
final String[] prohibited, final Pattern[] whitelisted) {
- File cwd = new File("target/" + YARN_CONFIGURATION.get(TEST_CLUSTER_NAME_KEY));
+ File cwd =
+ new File(
+ GenericTestUtils.getTestDir(),
+ YARN_CONFIGURATION.get(TEST_CLUSTER_NAME_KEY));
assertThat(cwd).exists();
assertThat(cwd).isDirectory();
@@ -608,10 +615,15 @@ public abstract class YarnTestBase {
public static boolean verifyStringsInNamedLogFiles(
final String[] mustHave, final ApplicationId applicationId, final String fileName) {
final List<String> mustHaveList = Arrays.asList(mustHave);
- final File cwd = new File("target", YARN_CONFIGURATION.get(TEST_CLUSTER_NAME_KEY));
+ final File cwd =
+ new File(
+ GenericTestUtils.getTestDir(),
+ YARN_CONFIGURATION.get(TEST_CLUSTER_NAME_KEY));
if (!cwd.exists() || !cwd.isDirectory()) {
+ LOG.debug("Directory doesn't exist: {}", cwd.getAbsolutePath());
return false;
}
+ LOG.debug("Directory exist: {}", cwd.getAbsolutePath());
final File foundFile =
TestUtils.findFile(
@@ -666,8 +678,12 @@ public abstract class YarnTestBase {
public static boolean verifyTokenKindInContainerCredentials(
final Collection<String> tokens, final String containerId) throws IOException {
- File cwd = new File("target/" + YARN_CONFIGURATION.get(TEST_CLUSTER_NAME_KEY));
+ File cwd =
+ new File(
+ GenericTestUtils.getTestDir(),
+ YARN_CONFIGURATION.get(TEST_CLUSTER_NAME_KEY));
if (!cwd.exists() || !cwd.isDirectory()) {
+ LOG.info("Directory doesn't exist: {}", cwd.getAbsolutePath());
return false;
}
@@ -701,7 +717,10 @@ public abstract class YarnTestBase {
}
public static String getContainerIdByLogName(String logName) {
- File cwd = new File("target/" + YARN_CONFIGURATION.get(TEST_CLUSTER_NAME_KEY));
+ File cwd =
+ new File(
+ GenericTestUtils.getTestDir(),
+ YARN_CONFIGURATION.get(TEST_CLUSTER_NAME_KEY));
File containerLog =
TestUtils.findFile(
cwd.getAbsolutePath(),
diff --git a/flink-yarn/pom.xml b/flink-yarn/pom.xml
index 1583a851ece..7b30ade46f5 100644
--- a/flink-yarn/pom.xml
+++ b/flink-yarn/pom.xml
@@ -181,6 +181,100 @@ under the License.
</dependency>
</dependencies>
+ <dependencyManagement>
+ <dependencies>
+ <dependency>
+ <!-- dependency convergence -->
+ <groupId>commons-beanutils</groupId>
+ <artifactId>commons-beanutils</artifactId>
+ <!-- Beanutils 1.9.+ doesn't work with Hadoop 2 -->
+ <version>1.8.3</version>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <!-- dependency convergence -->
+ <groupId>org.codehaus.woodstox</groupId>
+ <artifactId>stax2-api</artifactId>
+ <version>4.2.1</version>
+ <scope>test</scope>
+ </dependency>
+ </dependencies>
+ </dependencyManagement>
+
+ <profiles>
+ <profile>
+ <!-- Hadoop >= 2.6 moved the S3 file systems from hadoop-common into hadoop-aws artifact
+ (see https://issues.apache.org/jira/browse/HADOOP-11074)
+ We can add the (test) dependency per default once 2.6 is the minimum required version.
+ -->
+ <id>include_hadoop_aws</id>
+ <activation>
+ <property>
+ <name>include_hadoop_aws</name>
+ </property>
+ </activation>
+ <dependencies>
+ <!-- for the S3 tests of YarnFileStageTestS3ITCase -->
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-aws</artifactId>
+ <version>${flink.hadoop.version}</version>
+ <scope>test</scope>
+ <exclusions>
+ <exclusion>
+ <groupId>log4j</groupId>
+ <artifactId>log4j</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-log4j12</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.apache.avro</groupId>
+ <artifactId>avro</artifactId>
+ </exclusion>
+ <!-- The aws-java-sdk-core requires jackson 2.6, but
+ hadoop pulls in 2.3 -->
+ <exclusion>
+ <groupId>com.fasterxml.jackson.core</groupId>
+ <artifactId>jackson-annotations</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>com.fasterxml.jackson.core</groupId>
+ <artifactId>jackson-core</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>com.fasterxml.jackson.core</groupId>
+ <artifactId>jackson-databind</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>ch.qos.reload4j</groupId>
+ <artifactId>reload4j</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-reload4j</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <!-- override Hadoop's default dependency on too low SDK versions that do not work
+ with our httpcomponents version when initialising the s3a file system -->
+ <dependency>
+ <groupId>com.amazonaws</groupId>
+ <artifactId>aws-java-sdk-s3</artifactId>
+ <version>${aws.sdk.version}</version>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <groupId>com.amazonaws</groupId>
+ <artifactId>aws-java-sdk-sts</artifactId>
+ <version>${aws.sdk.version}</version>
+ <scope>test</scope>
+ </dependency>
+ </dependencies>
+ </profile>
+ </profiles>
+
<build>
<plugins>
<plugin>
diff --git a/flink-yarn/src/main/java/org/apache/flink/yarn/configuration/YarnConfigOptions.java b/flink-yarn/src/main/java/org/apache/flink/yarn/configuration/YarnConfigOptions.java
index c7ec43bfd48..696bf429c56 100644
--- a/flink-yarn/src/main/java/org/apache/flink/yarn/configuration/YarnConfigOptions.java
+++ b/flink-yarn/src/main/java/org/apache/flink/yarn/configuration/YarnConfigOptions.java
@@ -194,7 +194,7 @@ public class YarnConfigOptions {
* unset yarn priority setting and use cluster default priority.
*
* @see <a
- * href="https://hadoop.apache.org/docs/r2.8.5/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">YARN
+ * href="https://hadoop.apache.org/docs/r2.10.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">YARN
* Capacity Scheduling Doc</a>
*/
public static final ConfigOption<Integer> APPLICATION_PRIORITY =
diff --git a/pom.xml b/pom.xml
index 05185314d2b..0b84fa0c918 100644
--- a/pom.xml
+++ b/pom.xml
@@ -107,7 +107,7 @@ under the License.
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
- <flink.hadoop.version>2.8.5</flink.hadoop.version>
+ <flink.hadoop.version>2.10.2</flink.hadoop.version>
<flink.XmxMax>3072m</flink.XmxMax>
<!-- XmxMax / forkCountITCase -->
<flink.XmxITCase>1536m</flink.XmxITCase>
@@ -170,11 +170,6 @@ under the License.
<minikdc.version>3.2.4</minikdc.version>
<hive.version>2.3.9</hive.version>
<orc.version>1.5.6</orc.version>
- <!--
- Hive 2.3.4 relies on Hadoop 2.7.2 and later versions.
- For Hadoop 2.7, the minor Hadoop version supported for flink-shaded-hadoop-2-uber is 2.7.5
- -->
- <hivemetastore.hadoop.version>2.7.5</hivemetastore.hadoop.version>
<japicmp.referenceVersion>1.16.0</japicmp.referenceVersion>
<japicmp.outputDir>tools/japicmp-output</japicmp.outputDir>
<spotless.version>2.27.1</spotless.version>
diff --git a/tools/azure-pipelines/build-apache-repo.yml b/tools/azure-pipelines/build-apache-repo.yml
index f84237a9bc0..d28d4ee0264 100644
--- a/tools/azure-pipelines/build-apache-repo.yml
+++ b/tools/azure-pipelines/build-apache-repo.yml
@@ -70,7 +70,7 @@ stages:
name: Default
e2e_pool_definition:
vmImage: 'ubuntu-20.04'
- environment: PROFILE="-Dflink.hadoop.version=2.8.5 -Dscala-2.12"
+ environment: PROFILE="-Dflink.hadoop.version=2.10.2 -Dscala-2.12"
run_end_to_end: false
container: flink-build-container
jdk: 8
@@ -114,7 +114,7 @@ stages:
vmImage: 'ubuntu-20.04'
e2e_pool_definition:
vmImage: 'ubuntu-20.04'
- environment: PROFILE="-Dflink.hadoop.version=2.8.5 -Dscala-2.12"
+ environment: PROFILE="-Dflink.hadoop.version=2.10.2 -Dscala-2.12"
run_end_to_end: true
container: flink-build-container
jdk: 8
@@ -125,7 +125,7 @@ stages:
name: Default
e2e_pool_definition:
vmImage: 'ubuntu-20.04'
- environment: PROFILE="-Dflink.hadoop.version=3.1.3 -Phadoop3-tests,hive3"
+ environment: PROFILE="-Dflink.hadoop.version=3.2.3 -Phadoop3-tests,hive3"
run_end_to_end: true
container: flink-build-container
jdk: 8
@@ -136,7 +136,7 @@ stages:
name: Default
e2e_pool_definition:
vmImage: 'ubuntu-20.04'
- environment: PROFILE="-Dflink.hadoop.version=2.8.5 -Dscala-2.12 -Djdk11 -Pjava11-target"
+ environment: PROFILE="-Dflink.hadoop.version=2.10.2 -Dscala-2.12 -Djdk11 -Pjava11-target"
run_end_to_end: true
container: flink-build-container
jdk: 11
@@ -147,7 +147,7 @@ stages:
name: Default
e2e_pool_definition:
vmImage: 'ubuntu-20.04'
- environment: PROFILE="-Dflink.hadoop.version=2.8.5 -Dscala-2.12 -Penable-adaptive-scheduler"
+ environment: PROFILE="-Dflink.hadoop.version=2.10.2 -Dscala-2.12 -Penable-adaptive-scheduler"
run_end_to_end: true
container: flink-build-container
jdk: 8
@@ -162,5 +162,5 @@ stages:
- template: build-python-wheels.yml
parameters:
stage_name: cron_python_wheels
- environment: PROFILE="-Dflink.hadoop.version=2.8.5 -Dscala-2.12"
+ environment: PROFILE="-Dflink.hadoop.version=2.10.2 -Dscala-2.12"
container: flink-build-container