Posted to commits@flink.apache.org by mb...@apache.org on 2022/06/30 10:02:13 UTC

[flink-web] 01/02: [FLINK-22352] Removing mesos references from docs

This is an automated email from the ASF dual-hosted git repository.

mbalassi pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git

commit 815238d18a089673239be97fff9343ae63307979
Author: Dóra Marsal <do...@gmail.com>
AuthorDate: Wed Jun 29 20:36:05 2022 +0200

    [FLINK-22352] Removing mesos references from docs
    
    Closes #557
---
 flink-architecture.md      |  14 +++++++-------
 flink-operations.md        |   6 +++---
 img/flink-home-graphic.png | Bin 495083 -> 448884 bytes
 usecases.md                |  17 ++++++++---------
 4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/flink-architecture.md b/flink-architecture.md
index 08e9812aa..abc407cb4 100644
--- a/flink-architecture.md
+++ b/flink-architecture.md
@@ -17,9 +17,9 @@ Here, we explain important aspects of Flink's architecture.
 
 ## Process Unbounded and Bounded Data
 
-Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, or user interactions on a website or mobile application, all of these data are generated as a stream. 
+Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, or user interactions on a website or mobile application, all of these data are generated as a stream.
 
-Data can be processed as *unbounded* or *bounded* streams. 
+Data can be processed as *unbounded* or *bounded* streams.
 
 1. **Unbounded streams** have a start but no defined end. They do not terminate and provide data as it is generated. Unbounded streams must be continuously processed, i.e., events must be promptly handled after they have been ingested. It is not possible to wait for all input data to arrive because the input is unbounded and will not be complete at any point in time. Processing unbounded data often requires that events are ingested in a specific order, such as the order in which events o [...]
 
@@ -29,17 +29,17 @@ Data can be processed as *unbounded* or *bounded* streams.
   <img src="{{ site.baseurl }}/img/bounded-unbounded.png" width="600px" />
 </div>
 
-**Apache Flink excels at processing unbounded and bounded data sets.** Precise control of time and state enable Flink's runtime to run any kind of application on unbounded streams. Bounded streams are internally processed by algorithms and data structures that are specifically designed for fixed sized data sets, yielding excellent performance. 
+**Apache Flink excels at processing unbounded and bounded data sets.** Precise control of time and state enable Flink's runtime to run any kind of application on unbounded streams. Bounded streams are internally processed by algorithms and data structures that are specifically designed for fixed sized data sets, yielding excellent performance.
 
 Convince yourself by exploring the [use cases]({{ site.baseurl }}/usecases.html) that have been built on top of Flink.
 
 ## Deploy Applications Anywhere
 
-Apache Flink is a distributed system and requires compute resources in order to execute applications. Flink integrates with all common cluster resource managers such as [Hadoop YARN](https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html), [Apache Mesos](https://mesos.apache.org), and [Kubernetes](https://kubernetes.io/) but can also be setup to run as a stand-alone cluster.
+Apache Flink is a distributed system and requires compute resources in order to execute applications. Flink integrates with all common cluster resource managers such as [Hadoop YARN](https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html) and [Kubernetes](https://kubernetes.io/), but can also be setup to run as a stand-alone cluster.
 
-Flink is designed to work well each of the previously listed resource managers. This is achieved by resource-manager-specific deployment modes that allow Flink to interact with each resource manager in its idiomatic way. 
+Flink is designed to work well each of the previously listed resource managers. This is achieved by resource-manager-specific deployment modes that allow Flink to interact with each resource manager in its idiomatic way.
 
-When deploying a Flink application, Flink automatically identifies the required resources based on the application's configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls. This eases the integration of Flink in many environments. 
+When deploying a Flink application, Flink automatically identifies the required resources based on the application's configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls. This eases the integration of Flink in many environments.
 
 <!-- Add this section once library deployment mode is supported. -->
 <!--
@@ -48,7 +48,7 @@ Flink features two deployment modes for applications, the *framework mode* and t
 
 * In the **framework deployment mode**, a client submits a Flink application against a running Flink service that takes care of executing the application. This is the common deployment model for most data processing frameworks, query engines, or database systems.
 
-* In the **library deployment mode**, a Flink application is packaged together with the Flink master executables into a (Docker) image. Another job-independent image contains the Flink worker executables. When a container is started from the job image, the Flink master process is started and the embedded application is automatically loaded. Containers started from the worker image, bootstrap Flink worker processes which automatically connect to the master process. A container manager suc [...]
+* In the **library deployment mode**, a Flink application is packaged together with the Flink master executables into a (Docker) image. Another job-independent image contains the Flink worker executables. When a container is started from the job image, the Flink master process is started and the embedded application is automatically loaded. Containers started from the worker image, bootstrap Flink worker processes which automatically connect to the master process. A container manager suc [...]
 
 <div class="row front-graphic">
   <img src="{{ site.baseurl }}/img/deployment-modes.png" width="600px" />
diff --git a/flink-operations.md b/flink-operations.md
index b0f8addb0..987260cb8 100644
--- a/flink-operations.md
+++ b/flink-operations.md
@@ -18,7 +18,7 @@ Flink provides several features to ensure that applications keep running and rem
 * **Consistent Checkpoints**: Flink's recovery mechanism is based on consistent checkpoints of an application's state. In case of a failure, the application is restarted and its state is loaded from the latest checkpoint. In combination with resettable stream sources, this feature can guarantee *exactly-once state consistency*.
 * **Efficient Checkpoints**: Checkpointing the state of an application can be quite expensive if the application maintains terabytes of state. Flink's can perform asynchronous and incremental checkpoints, in order to keep the impact of checkpoints on the application's latency SLAs very small.
 * **End-to-End Exactly-Once**: Flink features transactional sinks for specific storage systems that guarantee that data is only written out exactly once, even in case of failures.
-* **Integration with Cluster Managers**: Flink is tightly integrated with cluster managers, such as [Hadoop YARN](https://hadoop.apache.org), [Mesos](https://mesos.apache.org), or [Kubernetes](https://kubernetes.io). When a process fails, a new process is automatically started to take over its work. 
+* **Integration with Cluster Managers**: Flink is tightly integrated with cluster managers, such as [Hadoop YARN](https://hadoop.apache.org) or [Kubernetes](https://kubernetes.io). When a process fails, a new process is automatically started to take over its work. 
 * **High-Availability Setup**: Flink feature a high-availability mode that eliminates all single-points-of-failure. The HA-mode is based on [Apache ZooKeeper](https://zookeeper.apache.org), a battle-proven service for reliable distributed coordination.
 
 ## Update, Migrate, Suspend, and Resume Your Applications
@@ -31,7 +31,7 @@ Flink's *Savepoints* are a unique and powerful feature that solves the issue of
 * **Cluster Migration**: Using savepoints, applications can be migrated (or cloned) to different clusters.
 * **Flink Version Updates**: An application can be migrated to run on a new Flink version using a savepoint.
 * **Application Scaling**: Savepoints can be used to increase or decrease the parallelism of an application.
-* **A/B Tests and What-If Scenarios**: The performance or quality of two (or more) different versions of an application can be compared by starting all versions from the same savepoint. 
+* **A/B Tests and What-If Scenarios**: The performance or quality of two (or more) different versions of an application can be compared by starting all versions from the same savepoint.
 * **Pause and Resume**: An application can be paused by taking a savepoint and stopping it. At any later point in time, the application can be resumed from the savepoint.
 * **Archiving**: Savepoints can be archived to be able to reset the state of an application to an earlier point in time.
 
@@ -43,7 +43,7 @@ Flink integrates nicely with many common logging and monitoring services and pro
 
 * **Web UI**: Flink features a web UI to inspect, monitor, and debug running applications. It can also be used to submit executions for execution or cancel them.
 * **Logging**: Flink implements the popular slf4j logging interface and integrates with the logging frameworks [log4j](https://logging.apache.org/log4j/2.x/) or [logback](https://logback.qos.ch/).
-* **Metrics**: Flink features a sophisticated metrics system to collect and report system and user-defined metrics. Metrics can be exported to several reporters, including [JMX](https://en.wikipedia.org/wiki/Java_Management_Extensions), Ganglia, [Graphite](https://graphiteapp.org/), [Prometheus](https://prometheus.io/), [StatsD](https://github.com/etsy/statsd), [Datadog](https://www.datadoghq.com/), and [Slf4j](https://www.slf4j.org/). 
+* **Metrics**: Flink features a sophisticated metrics system to collect and report system and user-defined metrics. Metrics can be exported to several reporters, including [JMX](https://en.wikipedia.org/wiki/Java_Management_Extensions), Ganglia, [Graphite](https://graphiteapp.org/), [Prometheus](https://prometheus.io/), [StatsD](https://github.com/etsy/statsd), [Datadog](https://www.datadoghq.com/), and [Slf4j](https://www.slf4j.org/).
 * **REST API**: Flink exposes a REST API to submit a new application, take a savepoint of a running application, or cancel an application. The REST API also exposes meta data and collected metrics of running or completed applications.
 
 <hr/>
diff --git a/img/flink-home-graphic.png b/img/flink-home-graphic.png
index b502d158c..0b920ac22 100644
Binary files a/img/flink-home-graphic.png and b/img/flink-home-graphic.png differ
diff --git a/usecases.md b/usecases.md
index 16380bde6..0ada86fcf 100644
--- a/usecases.md
+++ b/usecases.md
@@ -4,14 +4,14 @@ title: "Use Cases"
 
 <hr />
 
-Apache Flink is an excellent choice to develop and run many different types of applications due to its extensive features set. Flink's features include support for stream and batch processing, sophisticated state management, event-time processing semantics, and exactly-once consistency guarantees for state. Moreover, Flink can be deployed on various resource providers such as YARN, Apache Mesos, and Kubernetes but also as stand-alone cluster on bare-metal hardware. Configured for high av [...]
+Apache Flink is an excellent choice to develop and run many different types of applications due to its extensive features set. Flink's features include support for stream and batch processing, sophisticated state management, event-time processing semantics, and exactly-once consistency guarantees for state. Moreover, Flink can be deployed on various resource providers such as YARN and Kubernetes, but also as stand-alone cluster on bare-metal hardware. Configured for high availability, Fl [...]
 
 Below, we explore the most common types of applications that are powered by Flink and give pointers to real-world examples.
 
 * <a href="#eventDrivenApps">Event-driven Applications</a>
 * <a href="#analytics">Data Analytics Applications</a>
 * <a href="#pipelines">Data Pipeline Applications</a>
-  
+
 ## Event-driven Applications <a name="eventDrivenApps"></a>
 
 ### What are event-driven applications?
@@ -33,7 +33,7 @@ Instead of querying a remote database, event-driven applications access their da
 
 ### How does Flink support event-driven applications?
 
-The limits of event-driven applications are defined by how well a stream processor can handle time and state. Many of Flink's outstanding features are centered around these concepts. Flink provides a rich set of state primitives that can manage very large data volumes (up to several terabytes) with exactly-once consistency guarantees. Moreover, Flink's support for event-time, highly customizable window logic, and fine-grained control of time as provided by the `ProcessFunction` enable th [...]
+The limits of event-driven applications are defined by how well a stream processor can handle time and state. Many of Flink's outstanding features are centered around these concepts. Flink provides a rich set of state primitives that can manage very large data volumes (up to several terabytes) with exactly-once consistency guarantees. Moreover, Flink's support for event-time, highly customizable window logic, and fine-grained control of time as provided by the `ProcessFunction` enable th [...]
 
 However, Flink's outstanding feature for event-driven applications is savepoint. A savepoint is a consistent state image that can be used as a starting point for compatible applications. Given a savepoint, an application can be updated or adapt its scale, or multiple versions of an application can be started for A/B testing.
 
@@ -41,7 +41,7 @@ However, Flink's outstanding feature for event-driven applications is savepoint.
 
 * <a href="https://sf-2017.flink-forward.org/kb_sessions/streaming-models-how-ing-adds-models-at-runtime-to-catch-fraudsters/">Fraud detection</a>
 * <a href="https://sf-2017.flink-forward.org/kb_sessions/building-a-real-time-anomaly-detection-system-with-flink-mux/">Anomaly detection</a>
-* <a href="https://sf-2017.flink-forward.org/kb_sessions/dynamically-configured-stream-processing-using-flink-kafka/">Rule-based alerting</a> 
+* <a href="https://sf-2017.flink-forward.org/kb_sessions/dynamically-configured-stream-processing-using-flink-kafka/">Rule-based alerting</a>
 * <a href="https://jobs.zalando.com/tech/blog/complex-event-generation-for-business-process-monitoring-using-apache-flink/">Business process monitoring</a>
 * <a href="https://berlin-2017.flink-forward.org/kb_sessions/drivetribes-kappa-architecture-with-apache-flink/">Web application (social network)</a>
 
@@ -61,7 +61,7 @@ Apache Flink supports streaming as well as batch analytical applications as show
 
 ### What are the advantages of streaming analytics applications?
 
-The advantages of continuous streaming analytics compared to batch analytics are not limited to a much lower latency from events to insight due to elimination of periodic import and query execution. In contrast to batch queries, streaming queries do not have to deal with artificial boundaries in the input data which are caused by periodic imports and the bounded nature of the input. 
+The advantages of continuous streaming analytics compared to batch analytics are not limited to a much lower latency from events to insight due to elimination of periodic import and query execution. In contrast to batch queries, streaming queries do not have to deal with artificial boundaries in the input data which are caused by periodic imports and the bounded nature of the input.
 
 Another aspect is a simpler application architecture. A batch analytics pipeline consist of several independent components to periodically schedule data ingestion and query execution. Reliably operating such a pipeline is non-trivial because failures of one component affect the following steps of the pipeline. In contrast, a streaming analytics application which runs on a sophisticated stream processor like Flink incorporates all steps from data ingestions to continuous result computatio [...]
 
@@ -80,7 +80,7 @@ Flink provides very good support for continuous streaming as well as batch analy
 
 ### What are data pipelines?
 
-Extract-transform-load (ETL) is a common approach to convert and move data between storage systems. Often ETL jobs are periodically triggered to copy data from from transactional database systems to an analytical database or a data warehouse. 
+Extract-transform-load (ETL) is a common approach to convert and move data between storage systems. Often ETL jobs are periodically triggered to copy data from from transactional database systems to an analytical database or a data warehouse.
 
 Data pipelines serve a similar purpose as ETL jobs. They transform and enrich data and can move it from one storage system to another. However, they operate in a continuous streaming mode instead of being periodically triggered. Hence, they are able to read records from sources that continuously produce data and move it with low latency to their destination. For example a data pipeline might monitor a file system directory for new files and write their data into an event log. Another app [...]
 
@@ -92,7 +92,7 @@ The figure below depicts the difference between periodic ETL jobs and continuous
 
 ### What are the advantages of data pipelines?
 
-The obvious advantage of continuous data pipelines over periodic ETL jobs is the reduced latency of moving data to its destination. Moreover, data pipelines are more versatile and can be employed for more use cases because they are able to continuously consume and emit data. 
+The obvious advantage of continuous data pipelines over periodic ETL jobs is the reduced latency of moving data to its destination. Moreover, data pipelines are more versatile and can be employed for more use cases because they are able to continuously consume and emit data.
 
 ### How does Flink support data pipelines?
 
@@ -101,5 +101,4 @@ Many common data transformation or enrichment tasks can be addressed by Flink's
 ### What are typical data pipeline applications?
 
 * <a href="https://ververica.com/blog/blink-flink-alibaba-search">Real-time search index building</a> in e-commerce
-* <a href="https://jobs.zalando.com/tech/blog/apache-showdown-flink-vs.-spark/">Continuous ETL</a> in e-commerce 
-
+* <a href="https://jobs.zalando.com/tech/blog/apache-showdown-flink-vs.-spark/">Continuous ETL</a> in e-commerce