Posted to commits@flink.apache.org by ch...@apache.org on 2019/07/24 14:50:03 UTC

[flink] branch release-1.9 updated: [FLINK-12901][docs] Update Hadoop build instructions

This is an automated email from the ASF dual-hosted git repository.

chesnay pushed a commit to branch release-1.9
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.9 by this push:
     new f334dec  [FLINK-12901][docs] Update Hadoop build instructions
f334dec is described below

commit f334dec8057c6970b6437cf3618c1ec7b9297070
Author: Chesnay Schepler <ch...@apache.org>
AuthorDate: Wed Jul 24 16:49:23 2019 +0200

    [FLINK-12901][docs] Update Hadoop build instructions
---
 docs/flinkDev/building.md                        | 38 ++++++++++++++++--------
 docs/flinkDev/building.zh.md                     | 38 ++++++++++++++++--------
 docs/getting-started/tutorials/local_setup.md    |  4 +--
 docs/getting-started/tutorials/local_setup.zh.md |  4 +--
 4 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/docs/flinkDev/building.md b/docs/flinkDev/building.md
index c78bb3b..ce2ebd0 100644
--- a/docs/flinkDev/building.md
+++ b/docs/flinkDev/building.md
@@ -56,8 +56,6 @@ To speed up the build you can skip tests, QA plugins, and JavaDocs:
 mvn clean install -DskipTests -Dfast
 {% endhighlight %}
 
-The default build adds a Flink-specific JAR for Hadoop 2, to allow using Flink with HDFS and YARN.
-
 ## Build PyFlink
 
 If you want to build a PyFlink package that can be used for pip installation, you need to build Flink jars first, as described in [Build Flink](##Build Flink).
@@ -97,28 +95,40 @@ mvn clean install
 
 ## Hadoop Versions
 
-{% info %} Most users do not need to do this manually. The [download page]({{ site.download_url }}) contains binary packages for common Hadoop versions.
+Flink has optional dependencies on HDFS and YARN, both of which are part of [Apache Hadoop](http://hadoop.apache.org). Many different versions of Hadoop exist (from both the upstream project and the various Hadoop distributions). Using an incompatible combination of versions may cause exceptions.
+
+Flink can be built against any Hadoop version >= 2.4.0, but depending on the version this may be a one- or two-step process.
 
-Flink has dependencies to HDFS and YARN which are both dependencies from [Apache Hadoop](http://hadoop.apache.org). There exist many different versions of Hadoop (from both the upstream project and the different Hadoop distributions). If you are using a wrong combination of versions, exceptions can occur.
+### Pre-bundled versions
 
-Hadoop is only supported from version 2.4.0 upwards.
-You can also specify a specific Hadoop version to build against:
+To build against Hadoop 2.4.1, 2.6.5, 2.7.5 or 2.8.3, it is sufficient to run (e.g., for version `2.6.5`):
 
 {% highlight bash %}
-mvn clean install -DskipTests -Dhadoop.version=2.6.1
+mvn clean install -DskipTests -Dhadoop.version=2.6.5
 {% endhighlight %}
 
-### Packaging Hadoop into the Flink distribution
-
-If you want to build a Flink distribution that has a shaded Hadoop pre-packaged in the lib folder you can use the `include-hadoop` profile to do so. You would build Flink as described above but include the profile:
+To package a shaded Hadoop jar into the distribution's `/lib` directory, activate the `include-hadoop` profile:
 
 {% highlight bash %}
 mvn clean install -DskipTests -Pinclude-hadoop
 {% endhighlight %}
 
-### Vendor-specific Versions
-To check the list of supported vendor versions, look in https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs?repo=cloudera
-To build Flink against a vendor specific Hadoop version, issue the following command:
+### Custom / Vendor-specific versions
+
+If you want to build against a Hadoop version that is *NOT* 2.4.1, 2.6.5, 2.7.5 or 2.8.3,
+you first need to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
+You can find the source for this project in the [Additional Components]({{ site.download_url }}#additional-components) section of the download page.
+
+Run the following command to build and install `flink-shaded` against your desired Hadoop version (e.g., for version `2.6.5-custom`):
+
+{% highlight bash %}
+mvn clean install -Dhadoop.version=2.6.5-custom
+{% endhighlight %}
+
+After this step is complete, follow the steps for [Pre-bundled versions](#pre-bundled-versions).
+
+To build Flink against a vendor-specific Hadoop version, additionally activate the `-Pvendor-repos` profile when building
+`flink-shaded`.
 
 {% highlight bash %}
 mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.6.0-cdh5.16.1
@@ -126,6 +136,8 @@ mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.6.0-cdh5.16.1
 
 The `-Pvendor-repos` activates a Maven [build profile](http://maven.apache.org/guides/introduction/introduction-to-profiles.html) that includes the repositories of popular Hadoop vendors such as Cloudera, Hortonworks, or MapR.
 
+The list of supported vendor versions can be checked [here](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs?repo=cloudera).
+
 {% top %}
 
 ## Scala Versions
diff --git a/docs/flinkDev/building.zh.md b/docs/flinkDev/building.zh.md
index aa70511..3ba2326 100644
--- a/docs/flinkDev/building.zh.md
+++ b/docs/flinkDev/building.zh.md
@@ -56,8 +56,6 @@ To speed up the build you can skip tests, QA plugins, and JavaDocs:
 mvn clean install -DskipTests -Dfast
 {% endhighlight %}
 
-The default build adds a Flink-specific JAR for Hadoop 2, to allow using Flink with HDFS and YARN.
-
 ## 构建PyFlink
 
 如果您想构建一个可用于pip安装的PyFlink包,您需要先构建Flink的Jar包,如[构建Flink](##Build Flink)中所述。
@@ -97,28 +95,40 @@ mvn clean install
 
 ## Hadoop Versions
 
-{% info %} Most users do not need to do this manually. The [download page]({{ site.download_url }}) contains binary packages for common Hadoop versions.
+Flink has optional dependencies on HDFS and YARN, both of which are part of [Apache Hadoop](http://hadoop.apache.org). Many different versions of Hadoop exist (from both the upstream project and the various Hadoop distributions). Using an incompatible combination of versions may cause exceptions.
+
+Flink can be built against any Hadoop version >= 2.4.0, but depending on the version this may be a one- or two-step process.
 
-Flink has dependencies to HDFS and YARN which are both dependencies from [Apache Hadoop](http://hadoop.apache.org). There exist many different versions of Hadoop (from both the upstream project and the different Hadoop distributions). If you are using a wrong combination of versions, exceptions can occur.
+### Pre-bundled versions
 
-Hadoop is only supported from version 2.4.0 upwards.
-You can also specify a specific Hadoop version to build against:
+To build against Hadoop 2.4.1, 2.6.5, 2.7.5 or 2.8.3, it is sufficient to run (e.g., for version `2.6.5`):
 
 {% highlight bash %}
-mvn clean install -DskipTests -Dhadoop.version=2.6.1
+mvn clean install -DskipTests -Dhadoop.version=2.6.5
 {% endhighlight %}
 
-### Packaging Hadoop into the Flink distribution
-
-If you want to build a Flink distribution that has a shaded Hadoop pre-packaged in the lib folder you can use the `include-hadoop` profile to do so. You would build Flink as described above but include the profile:
+To package a shaded Hadoop jar into the distribution's `/lib` directory, activate the `include-hadoop` profile:
 
 {% highlight bash %}
 mvn clean install -DskipTests -Pinclude-hadoop
 {% endhighlight %}
 
-### Vendor-specific Versions
-To check the list of supported vendor versions, look in https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs?repo=cloudera
-To build Flink against a vendor specific Hadoop version, issue the following command:
+### Custom / Vendor-specific versions
+
+If you want to build against a Hadoop version that is *NOT* 2.4.1, 2.6.5, 2.7.5 or 2.8.3,
+you first need to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
+You can find the source for this project in the [Additional Components]({{ site.download_url }}#additional-components) section of the download page.
+
+Run the following command to build and install `flink-shaded` against your desired Hadoop version (e.g., for version `2.6.5-custom`):
+
+{% highlight bash %}
+mvn clean install -Dhadoop.version=2.6.5-custom
+{% endhighlight %}
+
+After this step is complete, follow the steps for [Pre-bundled versions](#pre-bundled-versions).
+
+To build Flink against a vendor-specific Hadoop version, additionally activate the `-Pvendor-repos` profile when building
+`flink-shaded`.
 
 {% highlight bash %}
 mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.6.0-cdh5.16.1
@@ -126,6 +136,8 @@ mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.6.0-cdh5.16.1
 
 The `-Pvendor-repos` activates a Maven [build profile](http://maven.apache.org/guides/introduction/introduction-to-profiles.html) that includes the repositories of popular Hadoop vendors such as Cloudera, Hortonworks, or MapR.
 
+The list of supported vendor versions can be checked [here](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs?repo=cloudera).
+
 {% top %}
 
 ## Scala Versions
diff --git a/docs/getting-started/tutorials/local_setup.md b/docs/getting-started/tutorials/local_setup.md
index b916669..799d390 100644
--- a/docs/getting-started/tutorials/local_setup.md
+++ b/docs/getting-started/tutorials/local_setup.md
@@ -51,8 +51,8 @@ Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
 
 <div data-lang="Download and Unpack" markdown="1">
 1. Download a binary from the [downloads page](http://flink.apache.org/downloads.html). You can pick
-   any Hadoop/Scala combination you like. If you plan to just use the local file system, any Hadoop
-   version will work fine.
+   any Scala variant you like. For certain features you may also have to download one of the pre-bundled Hadoop jars
+   and place it into the `/lib` directory.
 2. Go to the download directory.
 3. Unpack the downloaded archive.
 
diff --git a/docs/getting-started/tutorials/local_setup.zh.md b/docs/getting-started/tutorials/local_setup.zh.md
index 43cec15..e8ef56c 100644
--- a/docs/getting-started/tutorials/local_setup.zh.md
+++ b/docs/getting-started/tutorials/local_setup.zh.md
@@ -51,8 +51,8 @@ Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
 
 <div data-lang="Download and Unpack" markdown="1">
 1. Download a binary from the [downloads page](http://flink.apache.org/downloads.html). You can pick
-   any Hadoop/Scala combination you like. If you plan to just use the local file system, any Hadoop
-   version will work fine.
+   any Scala variant you like. For certain features you may also have to download one of the pre-bundled Hadoop jars
+   and place it into the `/lib` directory.
 2. Go to the download directory.
 3. Unpack the downloaded archive.
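Taken together, the custom-Hadoop workflow this commit documents is a two-step build: first build `flink-shaded` against the desired Hadoop version, then build Flink itself. A minimal sketch of those steps (the version `2.6.5-custom` and the checkout directory names `flink-shaded`/`flink` are placeholders, not paths from the commit):

```shell
#!/bin/sh
# Sketch of the two-step custom-Hadoop build described in the updated docs.
# HADOOP_VERSION is a placeholder; substitute the version you actually target.
HADOOP_VERSION=2.6.5-custom

# Step 1: build and install flink-shaded against the desired Hadoop version.
# For a vendor-specific Hadoop release, additionally pass -Pvendor-repos here.
cd flink-shaded
mvn clean install -Dhadoop.version="${HADOOP_VERSION}"

# Step 2: build Flink as for a pre-bundled version; -Pinclude-hadoop is
# optional and packages the shaded Hadoop jar into the distribution's /lib.
cd ../flink
mvn clean install -DskipTests -Pinclude-hadoop -Dhadoop.version="${HADOOP_VERSION}"
```

For the pre-bundled Hadoop versions (2.4.1, 2.6.5, 2.7.5, 2.8.3), step 1 is unnecessary and step 2 alone suffices.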