You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/12/02 18:28:51 UTC

[GitHub] [flink] rmetzger commented on a change in pull request #14238: [FLINK-20347] Rework YARN documentation page

rmetzger commented on a change in pull request #14238:
URL: https://github.com/apache/flink/pull/14238#discussion_r534389195



##########
File path: docs/deployment/resource-providers/yarn.md
##########
@@ -0,0 +1,238 @@
+---
+title:  "Apache Hadoop YARN"
+nav-title: YARN
+nav-parent_id: resource_providers
+nav-pos: 4
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Getting Started
+
+This *Getting Started* section guides you through setting up a fully functional Flink Cluster on YARN.
+
+### Introduction
+
+[Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) is a resource provider popular with many data processing frameworks.
+Flink services are submitted to YARN's ResourceManager, which spawns containers on machines managed by YARN NodeManagers. Flink deploys its JobManager and TaskManager instances into such containers.
+
+Flink can dynamically allocate and de-allocate TaskManager resources depending on the number of processing slots required by the job(s) running on the JobManager.
+
+### Preparation
+
+This *Getting Started* section assumes a functional YARN environment, starting from version 2.4.1. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not recommended for following through this *Getting Started* tutorial. 
+
+- Make sure your YARN cluster is ready for accepting Flink applications by running `yarn top`. It should show no error messages.
+- Download a recent Flink distribution from the [download page]({{ site.download_url }}) and unpack it.
+- **Important** Make sure that the `HADOOP_CLASSPATH` environment variable is set up (it can be checked by running `echo $HADOOP_CLASSPATH`). If not, set it up using 
+
+{% highlight bash %}
+export HADOOP_CLASSPATH=`hadoop classpath`
+{% endhighlight %}
+
+### Starting a Flink Session on YARN
+
+Once you've made sure that the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session, and submit an example job:
+
+{% highlight bash %}
+
+# we assume to be in the root directory of the unzipped Flink distribution
+
+# (0) export HADOOP_CLASSPATH
+export HADOOP_CLASSPATH=`hadoop classpath`
+
+# (1) Start YARN Session
+./bin/yarn-session.sh --detached
+
+# (2) You can now access the Flink Web Interface through the URL printed in the last lines of the command output, or through the YARN ResourceManager web UI.
+
+# (3) Submit example job
+./bin/flink run ./examples/streaming/TopSpeedWindowing.jar
+
+# (4) Stop YARN session (replace the application id based on the output of the yarn-session.sh command)
+echo "stop" | ./bin/yarn-session.sh -id application_XXXXX_XXX
+{% endhighlight %}
+
+Congratulations! You have successfully run a Flink application by deploying Flink on YARN.
+
+{% top %}
+
+## Deployment Modes Supported by Flink on YARN
+
+For production use, we recommend deploying Flink Applications in the [Per-job or Application Mode]({% link deployment/index.md %}#deployment-modes), as these modes provide a better isolation for the Applications.
+
+### Application Mode
+
+Application Mode will launch a Flink cluster on YARN, where the main() method of the application jar gets executed on the JobManager in YARN.
+The cluster will shut down as soon as the application has finished. You can manually stop the cluster using `yarn application -kill <ApplicationId>` or by cancelling the Flink job.
+
+{% highlight bash %}
+./bin/flink run-application -t yarn-application ./examples/streaming/TopSpeedWindowing.jar
+{% endhighlight %}
+
+
+Once an Application Mode cluster is deployed, you can interact with it for operations like cancelling or taking a savepoint.
+
+{% highlight bash %}
+# List running job on the cluster
+./bin/flink list -t yarn-application -Dyarn.application.id=application_XXXX_YY
+# Cancel running job
+./bin/flink cancel -t yarn-application -Dyarn.application.id=application_XXXX_YY <jobId>

Review comment:
       @kl0u @aljoscha is there any best practice here that we should advise users to use?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org