You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/14 13:12:39 UTC
[GitHub] [flink] alpinegizmo commented on a change in pull request #12057: [FLINK-16210][docs] Extending the "Flink Architecture" section.

alpinegizmo commented on a change in pull request #12057:
URL: https://github.com/apache/flink/pull/12057#discussion_r425124863



##########
File path: docs/concepts/flink-architecture.md
##########
@@ -129,4 +167,108 @@ two main benefits:
 
 <img src="{{ site.baseurl }}/fig/slot_sharing.svg" alt="TaskManagers with shared Task Slots" class="offset" width="80%" />
 
+## Flink Application Execution
+
+A _Flink Application_ is any user program that spawns one or multiple Flink
+jobs from its ``main()`` method. The execution of these jobs can happen in a
+local JVM (``LocalEnvironment``) or on a remote setup of clusters with multiple
+machines (``RemoteEnvironment``). For each program, the
+[``ExecutionEnvironment``]({{ site.baseurl }}/api/java/) provides methods to
+control the job execution (e.g. setting the parallelism) and to interact with
+the outside world (see [Anatomy of a Flink Program]({{ site.baseurl
+}}/dev/api_concepts.html#anatomy-of-a-flink-program)).
+
+The jobs of a Flink Application can either be submitted to a long-running
+[Flink Session Cluster]({{ site.baseurl
+}}/concepts/glossary.html#flink-session-cluster), a dedicated [Flink Job
+Cluster]({{ site.baseurl }}/concepts/glossary.html#flink-job-cluster) or a
+[Flink Application Cluster]({{ site.baseurl
+}}/concepts/glossary.html#flink-application-cluster). The difference between
+these options is mainly related to the cluster’s lifecycle and to resource
+isolation guarantees.
+
+### Flink Session Cluster
+
+* **Cluster Lifecycle**: in a Flink Session Cluster, the client connects to a
+  pre-existing, long-running cluster that can accept multiple job submissions.
+  Even after all jobs are finished, the cluster (and the Flink Master) will
+  keep running until the session is manually stopped. The lifetime of a Flink
+  Session Cluster is therefore not bound to the lifetime of any Flink Job.
+
+* **Resource Isolation**: TaskManager slots are allocated by the
+  ResourceManager on job submission and released once the job is finished.
+  Because all jobs are sharing the same cluster, there is some competition for
+  cluster resources — like network bandwidth in the submit-job phase. One
+  limitation of this shared setup is that if one TaskManager crashes, then all
+  jobs that have tasks running on this worker will fail; in a similar way, if
+  some fatal error occurs on the Flink Master, it will affect all jobs running
+  in the cluster.
+
+* **Other considerations**: having a pre-existing cluster saves a considerable
+  amount of time applying for resources and starting TaskManagers. This is
+  important in scenarios where the execution time of jobs is very short and a
+  high startup time would negatively impact the end-to-end user experience — as
+  is the case with interactive analysis of short queries, where it is desirable
+  that jobs can quickly perform computations using existing resources.
+
+<div class="alert alert-info"> <strong>Note:</strong> Formerly, a Flink Session Cluster was also known as a Flink Cluster in <i>session mode</i>. </div>
+
+### Flink Job Cluster
+
+* **Cluster Lifecycle**: in a Flink Job Cluster, the available cluster manager
+  (like YARN or Kubernetes) is used to spin up a cluster for each submitted job
+  and this cluster is available to that job only. Here, the client first
+  requests resources from the cluster manager to start the Flink Master and
+  submits the job to the Dispatcher running inside this process. TaskManagers
+  are then lazily allocated based on the resource requirements of the job. Once
+  the job is finished, the Flink Job Cluster is torn down.
+
+* **Resource Isolation**: a fatal error in the Flink Master only ever affects
+  one job in a Flink Job Cluster.
+
+* **Other considerations**: because the ResourceManager has to apply and wait
+  for external resource management components to start the TaskManager
+  processes and allocate resources, Flink Job Clusters are more suited to large
+  jobs that are long-running, have high-stability requirements and are not
+  sensitive to higher startup times.
+
+<div class="alert alert-info"> <strong>Note:</strong> Formerly, a Flink Job Cluster was also known as a Flink Cluster in <i>job (or per-job) mode</i>. </div>
+
+### Flink Application Cluster
+
+* **Cluster Lifecycle**: a Flink Application Cluster is a dedicated Flink
+  cluster that only executes jobs from one Flink Application and where the
+  ``main()`` method runs on the cluster rather than the client. The job
+  submission is a one-step process: you don’t need to start a Flink cluster
+  first and then submit a job to the existing cluster session; instead, you
+  package your application logic and dependencies into a executable job JAR and
+  the cluster entrypoint ([ApplicationClusterEntryPoint]({{ site.baseurl
+  }}/api/java/index.html?org/apache/flink/container/entrypoint/StandaloneJobClusterEntryPoint.html))
+  is responsible for calling the ``main()`` method to extract the JobGraph.
+  This allows you to deploy a Flink Application like any other application on
+  Kubernetes, for example. The lifetime of a Flink Application Cluster is
+  therefore bound to the lifetime of the Flink Application.

Review comment:
       I think we have a terminology problem here. I've heard some folks using the term "application" to talk about the thing that evolves over time and as it gets deployed comes to life as a sequence of one flink job after another. I've also had it explained to me that "application cluster" is just updated terminology for "job cluster", and it means the same thing. And then this section is introducing yet another definition.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org