Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/23 13:43:40 UTC

[GitHub] [flink] zentol opened a new pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

zentol opened a new pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932
 
 
   Consolidates the Hadoop documentation under ops/deployment/hadoop.
   
   Exporting the classpath is listed first, as it is the easier of the two options and applies to most people with a fully running Hadoop setup.
   
   Instructions for the `include-hadoop` profile have been removed, in an attempt to eliminate the notion that "Flink must be built against a specific Hadoop version", when all you need to do is place the jars into /lib.
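
As a rough sketch of the first option (exporting the classpath), something like the following could be run in the shell that launches Flink. The function name `export_hadoop_classpath` is hypothetical and exists only for illustration; `hadoop classpath` is the Hadoop CLI command that prints the classpath of the local installation.

```shell
# Hypothetical helper (not part of Flink): export HADOOP_CLASSPATH if a
# `hadoop` binary is on the PATH, otherwise point at the /lib alternative.
export_hadoop_classpath() {
  if command -v hadoop >/dev/null 2>&1; then
    # `hadoop classpath` prints the classpath of the local Hadoop installation.
    HADOOP_CLASSPATH="$(hadoop classpath)"
    export HADOOP_CLASSPATH
    echo "HADOOP_CLASSPATH exported"
  else
    echo "no hadoop binary found; place the Hadoop jars into lib/ instead" >&2
    return 1
  fi
}
```

Calling `export_hadoop_classpath` before starting Flink would then make the Hadoop classes visible to the Flink process, assuming the Hadoop installation itself is complete.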

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   <!--
   Meta data
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/145769036 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   -->
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   <!--
   Meta data
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/145769036 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   -->
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372385547
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
+For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and placing it in
 
 Review comment:
   ```suggestion
   For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and put it into
   ```


[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   <!--
   Meta data
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/145769036 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:ec0a27ab69d702038fb1d99fd063cca8801be6c5 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669 TriggerType:PUSH TriggerID:ec0a27ab69d702038fb1d99fd063cca8801be6c5
   Hash:ec0a27ab69d702038fb1d99fd063cca8801be6c5 Status:CANCELED URL:https://travis-ci.com/flink-ci/flink/builds/146718880 TriggerType:PUSH TriggerID:ec0a27ab69d702038fb1d99fd063cca8801be6c5
   Hash:18362a771aca442dfc07fd646f4bd98724cdd2e6 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/146886289 TriggerType:PUSH TriggerID:18362a771aca442dfc07fd646f4bd98724cdd2e6
   Hash:18362a771aca442dfc07fd646f4bd98724cdd2e6 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4707 TriggerType:PUSH TriggerID:18362a771aca442dfc07fd646f4bd98724cdd2e6
   -->
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   * ec0a27ab69d702038fb1d99fd063cca8801be6c5 Travis: [CANCELED](https://travis-ci.com/flink-ci/flink/builds/146718880) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669) 
   * 18362a771aca442dfc07fd646f4bd98724cdd2e6 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/146886289) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4707) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372816573
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -38,13 +38,18 @@ Referencing the HDFS configuration in the [Flink configuration]({{ site.baseurl
 
 Another way to provide the Hadoop configuration is to have it on the class path of the Flink process, see more details below.
 
-## Adding Hadoop Classpaths
+## Providing Hadoop classes
 
-The required classes to use Hadoop should be available in the `lib/` folder of the Flink installation
-(on all machines running Flink) unless Flink is built with [Hadoop shaded dependencies]({{ site.baseurl }}/flinkDev/building.html#pre-bundled-versions).
+In order to use Hadoop features (e.g., YARN, HDFS) it is ncessary to provide Flink with the required Hadoop classes,
+as these are not bundled by default.
 
-If putting the files into the directory is not possible, Flink also respects
-the `HADOOP_CLASSPATH` environment variable to add Hadoop jar files to the classpath.
+This can be done in 2 ways:
+* Adding the Hadoop classpath to Flink
 
 Review comment:
   Depends on the specific Hadoop distribution. The point is that users should first try the easy way, since building flink-shaded against custom/vendor Hadoop versions is a) a pain and b) something we can't test for.


[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   <!--
   Meta data
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/145769036 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   -->
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372445102
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
+For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and placing it in
+the `/lib` directory of the Flink distribution.
+
+If the used Hadoop version is not listed on the download page (possibly due to being a Vendor-specific version),
+then it is necessary to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
+You can find the source for this project in the [Additional Components]({{ site.download_url }}#additional-components) section of the download page.
+
+<span class="label label-info">Note</span> If you want to build `flink-shaded` against a vendor specific Hadoop version, you first have to configure the
+vendor-specific maven repository in your local maven setup as described [here](https://maven.apache.org/guides/mini/guide-multiple-repositories.html).
+
+Run the following command to build and install `flink-shaded` against your desired Hadoop version (e.g., for version `2.6.5-custom`):
+
+{% highlight bash %}
+mvn clean install -Dhadoop.version=2.6.5-custom
+{% endhighlight %}
+
+After this step is complete, place the `flink-shaded-hadoop-2-uber` jar in the `/lib` directory of the Flink distribution.
 
 Review comment:
   Not reasonably possible, since the path contains the flink-shaded version, which is independent of the Flink version.
   
   If someone doesn't know which jar to take, they most likely didn't make it this far into the process anyway.
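
Because the exact file name depends on both the Hadoop version and the flink-shaded version, one option is to match the jar by a wildcard instead of spelling out the path. A minimal sketch, assuming the build output lands somewhere under the flink-shaded checkout and follows the `flink-shaded-hadoop-2-uber-*.jar` naming; `install_uber_jar` and its arguments are hypothetical names used only for illustration:

```shell
# Hypothetical helper: copy every flink-shaded-hadoop-2-uber jar found under
# a build directory into the lib/ directory of a Flink distribution.
# $1 = flink-shaded checkout (after the mvn build), $2 = Flink distribution root
install_uber_jar() {
  src_dir="$1"
  flink_home="$2"
  # Match by prefix so neither version number has to be known in advance.
  jars="$(find "$src_dir" -type f -name 'flink-shaded-hadoop-2-uber-*.jar')"
  if [ -z "$jars" ]; then
    echo "no flink-shaded-hadoop-2-uber jar found under $src_dir" >&2
    return 1
  fi
  mkdir -p "$flink_home/lib"
  # One path per line; file names containing newlines are not handled.
  printf '%s\n' "$jars" | while IFS= read -r jar; do
    cp "$jar" "$flink_home/lib/"
  done
}
```

If the build also produces additional artifacts with the same prefix (for example a sources jar), the pattern would need to be narrowed accordingly.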


[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   <!--
   Meta data
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/145769036 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:ec0a27ab69d702038fb1d99fd063cca8801be6c5 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669 TriggerType:PUSH TriggerID:ec0a27ab69d702038fb1d99fd063cca8801be6c5
   Hash:ec0a27ab69d702038fb1d99fd063cca8801be6c5 Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/146718880 TriggerType:PUSH TriggerID:ec0a27ab69d702038fb1d99fd063cca8801be6c5
   -->
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   * ec0a27ab69d702038fb1d99fd063cca8801be6c5 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/146718880) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   <!--
   Meta data
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/145769036 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:ec0a27ab69d702038fb1d99fd063cca8801be6c5 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669 TriggerType:PUSH TriggerID:ec0a27ab69d702038fb1d99fd063cca8801be6c5
   Hash:ec0a27ab69d702038fb1d99fd063cca8801be6c5 Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/146718880 TriggerType:PUSH TriggerID:ec0a27ab69d702038fb1d99fd063cca8801be6c5
   Hash:18362a771aca442dfc07fd646f4bd98724cdd2e6 Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:18362a771aca442dfc07fd646f4bd98724cdd2e6
   -->
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   * ec0a27ab69d702038fb1d99fd063cca8801be6c5 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/146718880) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669) 
   * 18362a771aca442dfc07fd646f4bd98724cdd2e6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372393568
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
+For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and placing it in
+the `/lib` directory of the Flink distribution.
+
+If the used Hadoop version is not listed on the download page (possibly due to being a Vendor-specific version),
+then it is necessary to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
+You can find the source for this project in the [Additional Components]({{ site.download_url }}#additional-components) section of the download page.
+
+<span class="label label-info">Note</span> If you want to build `flink-shaded` against a vendor specific Hadoop version, you first have to configure the
+vendor-specific maven repository in your local maven setup as described [here](https://maven.apache.org/guides/mini/guide-multiple-repositories.html).
+
+Run the following command to build and install `flink-shaded` against your desired Hadoop version (e.g., for version `2.6.5-custom`):
+
+{% highlight bash %}
+mvn clean install -Dhadoop.version=2.6.5-custom
+{% endhighlight %}
+
+After this step is complete, place the `flink-shaded-hadoop-2-uber` jar in the `/lib` directory of the Flink distribution.
 
 Review comment:
   ```suggestion
   After this step is complete, put the `flink-shaded-hadoop-2-uber` jar into the `/lib` directory of the Flink distribution.
   ```


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r374695852
 
 

 ##########
 File path: docs/ops/deployment/hadoop.zh.md
 ##########
 @@ -38,13 +38,22 @@ Referencing the HDFS configuration in the [Flink configuration]({{ site.baseurl
 
 Another way to provide the Hadoop configuration is to have it on the class path of the Flink process, see more details below.
 
-## Adding Hadoop Classpaths
+## Providing Hadoop classes
 
-The required classes to use Hadoop should be available in the `lib/` folder of the Flink installation
-(on all machines running Flink) unless Flink is built with [Hadoop shaded dependencies]({{ site.baseurl }}/flinkDev/building.html#pre-bundled-versions).
+In order to use Hadoop features (e.g., YARN, HDFS) it is necessary to provide Flink with the required Hadoop classes,
+as these are not bundled by default.
 
-If putting the files into the directory is not possible, Flink also respects
-the `HADOOP_CLASSPATH` environment variable to add Hadoop jar files to the classpath.
+This can be done by 
+1) Adding the Hadoop classpath to Flink
+2) Putting the required jar files into /lib directory of the Flink distribution
+Option 1) requires very little work and integrates nicely with existing Hadoop setups, and should be the
 
 Review comment:
   ```suggestion
   Option 1) requires very little work and integrates nicely with existing Hadoop setups. It should be the
   ```


[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   <!--
   Meta data
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/145769036 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:ec0a27ab69d702038fb1d99fd063cca8801be6c5 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669 TriggerType:PUSH TriggerID:ec0a27ab69d702038fb1d99fd063cca8801be6c5
   Hash:ec0a27ab69d702038fb1d99fd063cca8801be6c5 Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/146718880 TriggerType:PUSH TriggerID:ec0a27ab69d702038fb1d99fd063cca8801be6c5
   -->
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   * ec0a27ab69d702038fb1d99fd063cca8801be6c5 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/146718880) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372393162
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -38,13 +38,18 @@ Referencing the HDFS configuration in the [Flink configuration]({{ site.baseurl
 
 Another way to provide the Hadoop configuration is to have it on the class path of the Flink process, see more details below.
 
-## Adding Hadoop Classpaths
+## Providing Hadoop classes
 
-The required classes to use Hadoop should be available in the `lib/` folder of the Flink installation
-(on all machines running Flink) unless Flink is built with [Hadoop shaded dependencies]({{ site.baseurl }}/flinkDev/building.html#pre-bundled-versions).
+In order to use Hadoop features (e.g., YARN, HDFS) it is ncessary to provide Flink with the required Hadoop classes,
+as these are not bundled by default.
 
-If putting the files into the directory is not possible, Flink also respects
-the `HADOOP_CLASSPATH` environment variable to add Hadoop jar files to the classpath.
+This can be done in 2 ways:
+* Adding the Hadoop classpath to Flink
 
 Review comment:
   Are no dependency clashes expected when just exporting `HADOOP_CLASSPATH`?
   In other words, why is relocation needed for `/lib` but not for `HADOOP_CLASSPATH`?
   
   I also somewhat liked the previous idea of mentioning that the first way is the recommended one, and that option 2 should be used only in case of problems (with examples given). Is that still the case?


[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   <!--
   Meta data
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:fb34b16d701e65ee2edcaa082869af08b4ee0482 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/145769036 TriggerType:PUSH TriggerID:fb34b16d701e65ee2edcaa082869af08b4ee0482
   Hash:ec0a27ab69d702038fb1d99fd063cca8801be6c5 Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:ec0a27ab69d702038fb1d99fd063cca8801be6c5
   -->
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   * ec0a27ab69d702038fb1d99fd063cca8801be6c5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372386932
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
+For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and placing it in
+the `/lib` directory of the Flink distribution.
+
+If the used Hadoop version is not listed on the download page (possibly due to being a Vendor-specific version),
+then it is necessary to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
+You can find the source for this project in the [Additional Components]({{ site.download_url }}#additional-components) section of the download page.
 
 Review comment:
   ```suggestion
   You can also find the source code for this project in the [Additional Components]({{ site.download_url }}#additional-components) section of the download page.
   ```


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r374696113
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -38,13 +38,22 @@ Referencing the HDFS configuration in the [Flink configuration]({{ site.baseurl
 
 Another way to provide the Hadoop configuration is to have it on the class path of the Flink process, see more details below.
 
-## Adding Hadoop Classpaths
+## Providing Hadoop classes
 
-The required classes to use Hadoop should be available in the `lib/` folder of the Flink installation
-(on all machines running Flink) unless Flink is built with [Hadoop shaded dependencies]({{ site.baseurl }}/flinkDev/building.html#pre-bundled-versions).
+In order to use Hadoop features (e.g., YARN, HDFS) it is necessary to provide Flink with the required Hadoop classes,
+as these are not bundled by default.
 
-If putting the files into the directory is not possible, Flink also respects
-the `HADOOP_CLASSPATH` environment variable to add Hadoop jar files to the classpath.
+This can be done by 
+1) Adding the Hadoop classpath to Flink
+2) Putting the required jar files into /lib directory of the Flink distribution
+Option 1) requires very little work and integrates nicely with existing Hadoop setups, and should be the
 
 Review comment:
   ```suggestion
   Option 1) requires very little work and integrates nicely with existing Hadoop setups. It should be the
   ```
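
For reference, option 1) from the hunk above (adding the Hadoop classpath to Flink) can be sketched in the shell as follows. This is a minimal sketch, assuming the `hadoop` binary of an existing Hadoop installation is on the `PATH`; the function name `export_hadoop_classpath` is hypothetical and only used for illustration.

```shell
# Sketch: export the Hadoop classpath so Flink can pick up the Hadoop classes.
# Assumption: the `hadoop` binary of an existing installation is on PATH.
# `export_hadoop_classpath` is an illustrative name, not part of Flink or Hadoop.
export_hadoop_classpath() {
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop binary not found on PATH; HADOOP_CLASSPATH left unchanged" >&2
    return 1
  fi
  # `classpath` is an argument of the hadoop binary; it prints the resolved classpath.
  HADOOP_CLASSPATH="$(hadoop classpath)"
  export HADOOP_CLASSPATH
}

# Run this in the same shell that later starts Flink.
export_hadoop_classpath || true
```

The variable has to be set in the environment of every Flink process that needs the Hadoop classes, which is why this route fits best where a working Hadoop setup already exists.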


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372382695
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -38,13 +38,18 @@ Referencing the HDFS configuration in the [Flink configuration]({{ site.baseurl
 
 Another way to provide the Hadoop configuration is to have it on the class path of the Flink process, see more details below.
 
-## Adding Hadoop Classpaths
+## Providing Hadoop classes
 
-The required classes to use Hadoop should be available in the `lib/` folder of the Flink installation
-(on all machines running Flink) unless Flink is built with [Hadoop shaded dependencies]({{ site.baseurl }}/flinkDev/building.html#pre-bundled-versions).
+In order to use Hadoop features (e.g., YARN, HDFS) it is ncessary to provide Flink with the required Hadoop classes,
 
 Review comment:
   ```suggestion
   In order to use Hadoop features (e.g., YARN, HDFS) it is necessary to provide Flink with the required Hadoop classes,
   ```


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372386025
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
+For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and placing it in
+the `/lib` directory of the Flink distribution.
+
+If the used Hadoop version is not listed on the download page (possibly due to being a Vendor-specific version),
+then it is necessary to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
 
 Review comment:
   ```suggestion
   then it is necessary to build [flink-shaded](https://github.com/apache/flink-shaded) dependency against this version.
   ```


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r374696315
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -38,13 +38,22 @@ Referencing the HDFS configuration in the [Flink configuration]({{ site.baseurl
 
 Another way to provide the Hadoop configuration is to have it on the class path of the Flink process, see more details below.
 
-## Adding Hadoop Classpaths
+## Providing Hadoop classes
 
-The required classes to use Hadoop should be available in the `lib/` folder of the Flink installation
-(on all machines running Flink) unless Flink is built with [Hadoop shaded dependencies]({{ site.baseurl }}/flinkDev/building.html#pre-bundled-versions).
+In order to use Hadoop features (e.g., YARN, HDFS) it is necessary to provide Flink with the required Hadoop classes,
+as these are not bundled by default.
 
-If putting the files into the directory is not possible, Flink also respects
-the `HADOOP_CLASSPATH` environment variable to add Hadoop jar files to the classpath.
+This can be done by 
+1) Adding the Hadoop classpath to Flink
+2) Putting the required jar files into /lib directory of the Flink distribution
+Option 1) requires very little work and integrates nicely with existing Hadoop setups, and should be the
+preferred approach.
+However, Hadoop has a large dependency footprint, increasing the risk of dependency conflicts occurring.
 
 Review comment:
   ```suggestion
   However, Hadoop has a large dependency footprint that increases the risk of dependency conflicts occurring.
   ```


[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   * ec0a27ab69d702038fb1d99fd063cca8801be6c5 Travis: [CANCELED](https://travis-ci.com/flink-ci/flink/builds/146718880) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669) 
   * 18362a771aca442dfc07fd646f4bd98724cdd2e6 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/146886289) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4707) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372813631
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
+For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and placing it in
+the `/lib` directory of the Flink distribution.
+
+If the used Hadoop version is not listed on the download page (possibly due to being a Vendor-specific version),
+then it is necessary to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
+You can find the source for this project in the [Additional Components]({{ site.download_url }}#additional-components) section of the download page.
 
 Review comment:
   The website is the primary source for the project's source code, so the sentence shouldn't be preceded by "also". That would imply that users may _alternatively_ download things from the website.


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372401274
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
 
 Review comment:
   ```suggestion
   These can be found on the [downloads]({{ site.download_url }}) page in the optional components.
   ```


[GitHub] [flink] zentol merged pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
zentol merged pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932
 
 
   


[GitHub] [flink] zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372444225
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
+For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and placing it in
+the `/lib` directory of the Flink distribution.
+
+If the used Hadoop version is not listed on the download page (possibly due to being a Vendor-specific version),
+then it is necessary to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
 
 Review comment:
   It doesn't make sense to refer to flink-shaded as a whole as a dependency; only parts of it are.


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r374695852
 
 

 ##########
 File path: docs/ops/deployment/hadoop.zh.md
 ##########
 @@ -38,13 +38,22 @@ Referencing the HDFS configuration in the [Flink configuration]({{ site.baseurl
 
 Another way to provide the Hadoop configuration is to have it on the class path of the Flink process, see more details below.
 
-## Adding Hadoop Classpaths
+## Providing Hadoop classes
 
-The required classes to use Hadoop should be available in the `lib/` folder of the Flink installation
-(on all machines running Flink) unless Flink is built with [Hadoop shaded dependencies]({{ site.baseurl }}/flinkDev/building.html#pre-bundled-versions).
+In order to use Hadoop features (e.g., YARN, HDFS) it is necessary to provide Flink with the required Hadoop classes,
+as these are not bundled by default.
 
-If putting the files into the directory is not possible, Flink also respects
-the `HADOOP_CLASSPATH` environment variable to add Hadoop jar files to the classpath.
+This can be done by 
+1) Adding the Hadoop classpath to Flink
+2) Putting the required jar files into /lib directory of the Flink distribution
+Option 1) requires very little work and integrates nicely with existing Hadoop setups, and should be the
 
 Review comment:
   ```suggestion
   Option 1) requires very little work and integrates nicely with existing Hadoop setups. It should be the
   ```


[GitHub] [flink] flinkbot commented on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577688150
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit fb34b16d701e65ee2edcaa082869af08b4ee0482 (Thu Jan 23 13:46:40 UTC 2020)
   
    ✅no warnings
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.

   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/145769036) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4574) 
   * ec0a27ab69d702038fb1d99fd063cca8801be6c5 Travis: [CANCELED](https://travis-ci.com/flink-ci/flink/builds/146718880) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4669) 
   * 18362a771aca442dfc07fd646f4bd98724cdd2e6 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/146886289) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4707) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot commented on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#issuecomment-577700244
 
 
   ## CI report:
   
   * fb34b16d701e65ee2edcaa082869af08b4ee0482 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372394242
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
+For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and placing it in
+the `/lib` directory of the Flink distribution.
+
+If the used Hadoop version is not listed on the download page (possibly due to being a Vendor-specific version),
+then it is necessary to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
+You can find the source for this project in the [Additional Components]({{ site.download_url }}#additional-components) section of the download page.
+
+<span class="label label-info">Note</span> If you want to build `flink-shaded` against a vendor specific Hadoop version, you first have to configure the
+vendor-specific maven repository in your local maven setup as described [here](https://maven.apache.org/guides/mini/guide-multiple-repositories.html).
+
+Run the following command to build and install `flink-shaded` against your desired Hadoop version (e.g., for version `2.6.5-custom`):
+
+{% highlight bash %}
+mvn clean install -Dhadoop.version=2.6.5-custom
+{% endhighlight %}
+
+After this step is complete, place the `flink-shaded-hadoop-2-uber` jar in the `/lib` directory of the Flink distribution.
 
 Review comment:
   I would also suggest giving the expected path to the jar, e.g. `target/flink-shaded-hadoop-2-uber.jar`, for less advanced users.
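
One way to make that last step concrete is sketched below. It is a rough sketch, not the documented procedure. The function name `install_hadoop_uber_jar`, its two arguments, and the file-name pattern are assumptions made for illustration. The only thing relied on is that Maven writes build output to a `target` directory by default; the exact module path inside flink-shaded is deliberately not hard-coded.

```shell
# Sketch: copy a locally built flink-shaded Hadoop uber jar into Flink's lib directory.
# Assumptions: `install_hadoop_uber_jar`, its arguments, and the file-name pattern are
# illustrative; the jar is expected in some Maven `target` directory of the build.
install_hadoop_uber_jar() {
  shaded_dir="$1"   # root of the flink-shaded source tree that was built with mvn
  flink_home="$2"   # root of the Flink distribution
  # Search all `target` directories instead of guessing the exact module path.
  jar="$(find "$shaded_dir" -type f -path '*/target/*' \
    -name 'flink-shaded-hadoop-2-uber*.jar' | head -n 1)"
  if [ -z "$jar" ]; then
    echo "no flink-shaded-hadoop-2-uber jar found under $shaded_dir" >&2
    return 1
  fi
  cp "$jar" "$flink_home/lib/"
  echo "copied $jar to $flink_home/lib/"
}

# Example call (both paths are placeholders):
# install_hadoop_uber_jar /path/to/flink-shaded /path/to/flink
```

Searching for the jar avoids committing the documentation to one module layout, at the cost of picking the first match if several builds are present.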


[GitHub] [flink] zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #10932: [FLINK-15614][docs] Consolidate Hadoop documentation
URL: https://github.com/apache/flink/pull/10932#discussion_r372813631
 
 

 ##########
 File path: docs/ops/deployment/hadoop.md
 ##########
 @@ -66,6 +71,29 @@ in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an
 
 Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.
 
+### Adding Hadoop to /lib
+
+The Flink project releases Hadoop distributions for specific versions, that relocate or exclude several dependencies
+to reduce the risk of dependency clashes.
+These can be found on the [downloads]({{ site.download_url }}).
+For these versions it is sufficient to download the corresponding `Pre-bundled Hadoop` component and placing it in
+the `/lib` directory of the Flink distribution.
+
+If the used Hadoop version is not listed on the download page (possibly due to being a Vendor-specific version),
+then it is necessary to build [flink-shaded](https://github.com/apache/flink-shaded) against this version.
+You can find the source for this project in the [Additional Components]({{ site.download_url }}#additional-components) section of the download page.
 
 Review comment:
   The website is the primary source for the project's source code, so the sentence shouldn't be preceded by "also". That would imply that users _alternatively_ download things from the website.
