Posted to commits@flink.apache.org by fh...@apache.org on 2018/06/21 15:29:09 UTC

[10/12] flink-web git commit: [FLINK-9522] Restructure Flink website

http://git-wip-us.apache.org/repos/asf/flink-web/blob/cbefc2e9/img/function-state.png
----------------------------------------------------------------------
diff --git a/img/function-state.png b/img/function-state.png
new file mode 100644
index 0000000..7b21b29
Binary files /dev/null and b/img/function-state.png differ

http://git-wip-us.apache.org/repos/asf/flink-web/blob/cbefc2e9/img/local-state.png
----------------------------------------------------------------------
diff --git a/img/local-state.png b/img/local-state.png
new file mode 100644
index 0000000..a888484
Binary files /dev/null and b/img/local-state.png differ

http://git-wip-us.apache.org/repos/asf/flink-web/blob/cbefc2e9/img/usecases-analytics.png
----------------------------------------------------------------------
diff --git a/img/usecases-analytics.png b/img/usecases-analytics.png
new file mode 100644
index 0000000..50882c2
Binary files /dev/null and b/img/usecases-analytics.png differ

http://git-wip-us.apache.org/repos/asf/flink-web/blob/cbefc2e9/img/usecases-datapipelines.png
----------------------------------------------------------------------
diff --git a/img/usecases-datapipelines.png b/img/usecases-datapipelines.png
new file mode 100644
index 0000000..252df33
Binary files /dev/null and b/img/usecases-datapipelines.png differ

http://git-wip-us.apache.org/repos/asf/flink-web/blob/cbefc2e9/img/usecases-eventdrivenapps.png
----------------------------------------------------------------------
diff --git a/img/usecases-eventdrivenapps.png b/img/usecases-eventdrivenapps.png
new file mode 100644
index 0000000..8b1d43a
Binary files /dev/null and b/img/usecases-eventdrivenapps.png differ

http://git-wip-us.apache.org/repos/asf/flink-web/blob/cbefc2e9/index.md
----------------------------------------------------------------------
diff --git a/index.md b/index.md
index 9b55638..dc8d35b 100755
--- a/index.md
+++ b/index.md
@@ -1,12 +1,13 @@
 ---
-title: "Scalable Stream and Batch Data Processing"
+title: "Stateful Computations over Data Streams"
 layout: base
 ---
 <div class="row-fluid">
 
-  <div class="col-sm-10 col-sm-offset-1 homecontent">
-    <p class="lead" markdown="span">Apache Flink® is an open-source stream processing framework for **distributed, high-performing, always-available,** and **accurate** data streaming applications.</p>
-    <a href="{{ site.baseurl }}/introduction.html" class="btn btn-default btn-intro">Introduction to Flink</a>
+  <div class="col-sm-12">
+    <p class="lead" markdown="span">
+      **Apache Flink<sup>®</sup> - Stateful Computations over Data Streams**
+    </p>
   </div>
 
 <div class="col-sm-12">
@@ -15,48 +16,125 @@ layout: base
 
 </div>
 
-
+<!-- High-level architecture figure -->
 
 <div class="row front-graphic">
-  <img src="{{ site.baseurl }}/img/flink-home-graphic-update.svg" width="700px" />
-</div>
-
-<!-- Updates section -->
-
-<div class="row-fluid">
-
-<div class="col-sm-12">
   <hr />
+  <img src="{{ site.baseurl }}/img/flink-home-graphic.png" width="800px" />
 </div>
 
-<div class="col-sm-3">
-
-  <h2>Latest Blog Posts</h2>
+<!-- Feature grid -->
 
+<!--
+<div class="row">
+  <div class="col-sm-12">
+    <hr />
+    <h2><a href="{{ site.baseurl }}/features.html">Features</a></h2>
+  </div>
 </div>
-
-<div class="col-sm-9">
-
-  <dl>
-    {% for post in site.posts limit:5 %}  
-        <dt> <a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }}</a></dt>
-        <dd>{{ post.excerpt }}</dd>
-    {% endfor %}
-  </dl>
-
+-->
+<div class="row">
+  <div class="col-sm-4">
+    <div class="panel panel-default">
+      <div class="panel-heading">
+        <span class="glyphicon glyphicon-th"></span> <b>All streaming use cases</b>
+      </div>
+      <div class="panel-body">
+        <ul style="font-size: small;">
+          <li>Event-driven Applications</li>
+          <li>Stream &amp; Batch Analytics</li>
+          <li>Data Pipelines &amp; ETL</li>
+        </ul>
+        <a href="{{ site.baseurl }}/usecases.html">Learn more</a>
+      </div>
+    </div>
+  </div>
+  <div class="col-sm-4">
+    <div class="panel panel-default">
+      <div class="panel-heading">
+        <span class="glyphicon glyphicon-ok"></span> <b>Guaranteed correctness</b>
+      </div>
+      <div class="panel-body">
+        <ul style="font-size: small;">
+          <li>Exactly-once state consistency</li>
+          <li>Event-time processing</li>
+          <li>Sophisticated late data handling</li>
+        </ul>
+        <a href="{{ site.baseurl }}/flink-applications.html#building-blocks-for-streaming-applications">Learn more</a>
+      </div>
+    </div>
+  </div>
+  <div class="col-sm-4">
+    <div class="panel panel-default">
+      <div class="panel-heading">
+        <span class="glyphicon glyphicon-sort-by-attributes"></span> <b>Layered APIs</b>
+      </div>
+      <div class="panel-body">
+        <ul style="font-size: small;">
+          <li>SQL on Stream &amp; Batch Data</li>
+          <li>DataStream API &amp; DataSet API</li>
+          <li>ProcessFunction (Time &amp; State)</li>
+        </ul>
+        <a href="{{ site.baseurl }}/flink-applications.html#layered-apis">Learn more</a>
+      </div>
+    </div>
+  </div>
+</div>
+<div class="row">
+  <div class="col-sm-4">
+    <div class="panel panel-default">
+      <div class="panel-heading">
+        <span class="glyphicon glyphicon-dashboard"></span> <b>Operational Focus</b>
+      </div>
+      <div class="panel-body">
+        <ul style="font-size: small;">
+          <li>Flexible deployment</li>
+          <li>High-availability setup</li>
+          <li>Savepoints</li>
+        </ul>
+        <a href="{{ site.baseurl }}/flink-operations.html">Learn more</a>
+      </div>
+    </div>
+  </div>
+  <div class="col-sm-4">
+    <div class="panel panel-default">
+      <div class="panel-heading">
+        <span class="glyphicon glyphicon-fullscreen"></span> <b>Scales to any use case</b>
+      </div>
+      <div class="panel-body">
+        <ul style="font-size: small;">
+          <li>Scale-out architecture</li>
+          <li>Support for very large state</li>
+          <li>Incremental checkpointing</li>
+        </ul>
+        <a href="{{ site.baseurl }}/flink-architecture.html#run-applications-at-any-scale">Learn more</a>
+      </div>
+    </div>
+  </div>
+  <div class="col-sm-4">
+    <div class="panel panel-default">
+      <div class="panel-heading">
+        <span class="glyphicon glyphicon-flash"></span> <b>Excellent Performance</b>
+      </div>
+      <div class="panel-body">
+        <ul style="font-size: small;">
+          <li>Low latency</li>
+          <li>High throughput</li>
+          <li>In-Memory computing</li>
+        </ul>
+        <a href="{{ site.baseurl }}/flink-architecture.html#leverage-in-memory-performance">Learn more</a>
+      </div>
+    </div>
+  </div>
 </div>
 
 <!-- Powered by section -->
 
-<div class="row-fluid">
+<div class="row">
   <div class="col-sm-12">
-
-
-  <hr />
+    <br />
     <h2><a href="{{ site.baseurl }}/poweredby.html">Powered by Flink</a></h2>
 
-
-
   <div class="jcarousel">
     <ul>
         <li>
@@ -125,6 +203,33 @@ layout: base
 
 </div>
 
+<!-- Updates section -->
+
+<div class="row">
+
+<div class="col-sm-12">
+  <hr />
+</div>
+
+<div class="col-sm-3">
+
+  <h2><a href="{{ site.baseurl }}/blog.html">Latest Blog Posts</a></h2>
+
+</div>
+
+<div class="col-sm-9">
+
+  <dl>
+    {% for post in site.posts limit:5 %}  
+        <dt> <a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }}</a></dt>
+        <dd>{{ post.excerpt }}</dd>
+    {% endfor %}
+  </dl>
+
+</div>
+
+<!-- Scripts section -->
+
 <script type="text/javascript" src="{{ site.baseurl }}/js/jquery.jcarousel.min.js"></script>
 
 <script type="text/javascript">

http://git-wip-us.apache.org/repos/asf/flink-web/blob/cbefc2e9/poweredby.md
----------------------------------------------------------------------
diff --git a/poweredby.md b/poweredby.md
index 01ec913..10a5ad3 100755
--- a/poweredby.md
+++ b/poweredby.md
@@ -6,25 +6,19 @@ title: "Powered by Flink"
 <!--                Powered by Flink
 <!-- --------------------------------------------- -->
 
-----
-<head>
-<style>
-   th, td {
-   padding: 10px;
-   }
-   .height > img {min-height: 137px};
-</style>
-</head>
+<hr />
 
-<p>To demonstrate Flink's capabilities, we've collected a few examples of Flink use cases inside of companies. The <a href="https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink" target='_blank'><small><span class="glyphicon glyphicon-new-window"></span></small> Powered by Flink directory</a> has a comprehensive list of companies and organizations using Flink.</p>
+Apache Flink powers business-critical applications in many companies and enterprises around the globe. On this page, we present a few notable Flink users that run interesting use cases in production and link to resources that discuss their applications in more detail.
 
-Would you like to be included on this page? Please reach out to the [Flink user mailing list]({{ site.baseurl }}/community.html#mailing-lists) and let us know.
+More Flink users are listed in the <a href="https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink" target='_blank'><small><span class="glyphicon glyphicon-new-window"></span></small> Powered by Flink directory</a> in the project wiki. Please note that the list is *not* comprehensive. We only add users that explicitly ask to be listed.
+
+If you would like to be included on this page, please reach out to the [Flink user mailing list]({{ site.baseurl }}/community.html#mailing-lists) and let us know.
 
 <div class="row-fluid">
 
    <div class="height col-md-3 col-sm-4 col-xs-6">
       <img src="{{ site.baseurl }}/img/poweredby/alibaba-logo.png" width="175"  alt="Alibaba" /><br />
-      Alibaba, the world's largest retailer, uses a fork of Flink called Blink to optimize search rankings in real time. <br><br><a href="http://data-artisans.com/blink-flink-alibaba-search/" target='_blank'><small><span class="glyphicon glyphicon-new-window"></span></small> Read more about Flink's role at Alibaba</a>
+      Alibaba, the world's largest retailer, uses a fork of Flink called Blink to optimize search rankings in real time. <br><br><a href="https://data-artisans.com/blog/blink-flink-alibaba-search" target='_blank'><small><span class="glyphicon glyphicon-new-window"></span></small> Read more about Flink's role at Alibaba</a>
    </div>
    <div class="height col-md-3 col-sm-4 col-xs-6">
       <img src="{{ site.baseurl }}/img/poweredby/bettercloud-logo.png" width="175"  alt="BetterCloud" /><br />
@@ -36,7 +30,7 @@ Would you like to be included on this page? Please reach out to the [Flink user
    </div>
    <div class="height col-md-3 col-sm-4 col-xs-6">
       <img src="{{ site.baseurl }}/img/poweredby/capital-one-logo.png" width="175"  alt="Capital One" /><br />
-      Capital One, a Fortune 500 financial services company, uses Flink for real-time activity monitoring and alerting. <br><br><a href="http://www.slideshare.net/FlinkForward/flink-case-study-capital-one" target='_blank'><small><span class="glyphicon glyphicon-new-window"></span></small> See Capital One's case study slides</a>
+      Capital One, a Fortune 500 financial services company, uses Flink for real-time activity monitoring and alerting. <br><br><a href="https://www.slideshare.net/FlinkForward/flink-forward-san-francisco-2018-andrew-gao-jeff-sharpe-finding-bad-acorns" target='_blank'><small><span class="glyphicon glyphicon-new-window"></span></small> Learn about Capital One's fraud detection use case</a>
    </div>
    <div class="height col-md-3 col-sm-4 col-xs-6">
       <img src="{{ site.baseurl }}/img/poweredby/dtrb-logo.png" width="175"  alt="Drivetribe" /><br />
@@ -82,7 +76,7 @@ Would you like to be included on this page? Please reach out to the [Flink user
    </div>
    <div class="height col-md-3 col-sm-4 col-xs-6">
    <img src="{{ site.baseurl }}/img/poweredby/zalando-logo.jpg" width="175" alt="Zalando" /><br />
-         Zalando, one of the largest e-commerce companies in Europe, uses Flink for real-time process monitoring and ETL. <br><br><a href="https://tech.zalando.de/blog/apache-showdown-flink-vs.-spark/" target='_blank'><small><span class="glyphicon glyphicon-new-window"></span></small> Read more on the Zalando Tech Blog</a>
+         Zalando, one of the largest e-commerce companies in Europe, uses Flink for real-time process monitoring and ETL. <br><br><a href="https://jobs.zalando.com/tech/blog/complex-event-generation-for-business-process-monitoring-using-apache-flink" target='_blank'><small><span class="glyphicon glyphicon-new-window"></span></small> Read more on the Zalando Tech Blog</a>
 
 
 </div>

http://git-wip-us.apache.org/repos/asf/flink-web/blob/cbefc2e9/usecases.md
----------------------------------------------------------------------
diff --git a/usecases.md b/usecases.md
index cfb5779..fc41c7d 100644
--- a/usecases.md
+++ b/usecases.md
@@ -1,28 +1,105 @@
 ---
-title: "Flink Use Cases"
+title: "Use Cases"
 ---
 
-To demonstrate how Flink can be applied to unbounded datasets, here’s a selection of real-word Flink users and problems they’re solving with Flink.
+<hr />
 
-For more examples, please see the [Powered by Flink]({{ site.baseurl }}/poweredby.html) page.
+Apache Flink is an excellent choice to develop and run many different types of applications due to its extensive feature set. Flink's features include support for stream and batch processing, sophisticated state management, event-time processing semantics, and exactly-once consistency guarantees for state. Moreover, Flink can be deployed on various resource providers such as YARN, Apache Mesos, and Kubernetes, but also as a stand-alone cluster on bare-metal hardware. Configured for high availability, Flink does not have a single point of failure. Flink has been proven to scale to thousands of cores and terabytes of application state, delivers high throughput and low latency, and powers some of the world's most demanding stream processing applications.
 
-+ **Optimization of e-commerce search results in real-time:** Alibaba’s search infrastructure team uses Flink to update product detail and inventory information in real-time, improving relevance for users.
+Below, we explore the most common types of applications that are powered by Flink and give pointers to real-world examples.
 
-+ **Stream processing-as-a-service for data science teams:** King (the creators of Candy Crush Saga) makes real-time analytics available to its data scientists via a Flink-powered internal platform, dramatically shortening the time to insights from game data.
+* <a href="#eventDrivenApps">Event-driven Applications</a>
+* <a href="#analytics">Data Analytics Applications</a>
+* <a href="#pipelines">Data Pipeline Applications</a>
+  
+## Event-driven Applications <a name="eventDrivenApps"></a>
 
-+ **Network / sensor monitoring and error detection:** Bouygues Telecom, one of the largest telecom providers in France, uses Flink to monitor its wired and wireless networks, enabling a rapid response to outages throughout the country.
+### What are event-driven applications?
 
-+ **ETL for business intelligence infrastructure:** Zalando uses Flink to transform data for easier loading into its data warehouse, converting complex payloads into relatively simple ones and ensuring that analytics end users have faster access to data.
+An event-driven application is a stateful application that ingests events from one or more event streams and reacts to incoming events by triggering computations, state updates, or external actions.
 
+Event-driven applications are an evolution of the traditional application design with separated compute and data storage tiers. In the traditional architecture, applications read data from and persist data to a remote transactional database.
 
-We can tease out common threads from these use cases. Based on the examples above, Flink is well-suited for:
+In contrast, event-driven applications are based on stateful stream processing applications. In this design, data and computation are co-located, which yields local (in-memory or disk) data access. Fault-tolerance is achieved by periodically writing checkpoints to a remote persistent storage. The figure below depicts the difference between the traditional application architecture and event-driven applications.
 
-1. **A variety of (sometimes unreliable) data sources:** When data is generated by millions of different users or devices, it’s safe to assume that some events will arrive out of the order they actually occurred--and in the case of more significant upstream failures, some events might come _hours_ later than they’re supposed to. Late data needs to be handled so that results are accurate.
+<br>
+<div class="row front-graphic">
+  <img src="{{ site.baseurl }}/img/usecases-eventdrivenapps.png" width="700px" />
+</div>
 
-2. **Applications with state:** When applications become more complex than simple filtering or enhancing of single data records, managing state within these applications (e.g., counters, windows of past data, state machines, embedded databases) becomes hard. Flink provides tools so that state is efficient, fault-tolerant, and manageable from the outside so you don’t have to build these capabilities yourself.
+### What are the advantages of event-driven applications?
 
-3. **Data that is processed quickly:** There is a focus in these use cases on real-time or near-real-time scenarios, where insights from data should be available at nearly the same moment that the data is generated. Flink is fully capable of meeting these latency requirements when necessary.
+Instead of querying a remote database, event-driven applications access their data locally, which yields better performance in terms of both throughput and latency. The periodic checkpoints to remote persistent storage can be done asynchronously and incrementally. Hence, the impact of checkpointing on regular event processing is very small. However, the event-driven application design provides more benefits than just local data access. In the tiered architecture, it is common that multiple applications share the same database. Hence, any change to the database, such as changing the data layout due to an application update or scaling the service, needs to be coordinated. Since each event-driven application is responsible for its own data, changing the data representation or scaling the application requires less coordination.
 
-4. **Data in large volumes:** These programs would need to be distributed across many nodes (in some cases, thousands) to support the required scale. Flink can run on large clusters just as seamlessly as it runs on small ones.
+### How does Flink support event-driven applications?
+
+The limits of event-driven applications are defined by how well a stream processor can handle time and state. Many of Flink's outstanding features are centered around these concepts. Flink provides a rich set of state primitives that can manage very large data volumes (up to several terabytes) with exactly-once consistency guarantees. Moreover, Flink's support for event-time, highly customizable window logic, and fine-grained control of time as provided by the `ProcessFunction` enable the implementation of advanced business logic. In addition, Flink features a library for Complex Event Processing (CEP) to detect patterns in data streams.
+
+However, Flink's outstanding feature for event-driven applications is its savepoints. A savepoint is a consistent state image that can be used as a starting point for compatible applications. Given a savepoint, an application can be updated or adapt its scale, or multiple versions of an application can be started for A/B testing.
+
+### What are typical event-driven applications?
+
+* <a href="https://sf-2017.flink-forward.org/kb_sessions/streaming-models-how-ing-adds-models-at-runtime-to-catch-fraudsters/">Fraud detection</a>
+* <a href="https://sf-2017.flink-forward.org/kb_sessions/building-a-real-time-anomaly-detection-system-with-flink-mux/">Anomaly detection</a>
+* <a href="https://sf-2017.flink-forward.org/kb_sessions/dynamically-configured-stream-processing-using-flink-kafka/">Rule-based alerting</a> 
+* <a href="https://jobs.zalando.com/tech/blog/complex-event-generation-for-business-process-monitoring-using-apache-flink/">Business process monitoring</a>
+* <a href="https://berlin-2017.flink-forward.org/kb_sessions/drivetribes-kappa-architecture-with-apache-flink/">Web application (social network)</a>
+
+## Data Analytics Applications<a name="analytics"></a>
+
+### What are data analytics applications?
+
+Analytical jobs extract information and insight from raw data. Traditionally, analytics are performed as batch queries or applications on bounded data sets of recorded events. In order to incorporate the latest data into the result of the analysis, it has to be added to the analyzed data set and the query or application has to be rerun. The results are written to a storage system or emitted as reports.
+
+With a sophisticated stream processing engine, analytics can also be performed in a real-time fashion. Instead of reading finite data sets, streaming queries or applications ingest real-time event streams and continuously produce and update results as events are consumed. The results are either written to an external database or maintained as internal state. Dashboard applications can read the latest results from the external database or directly query the internal state of the application.
+
+Apache Flink supports streaming as well as batch analytical applications as shown in the figure below.
+
+<div class="row front-graphic">
+  <img src="{{ site.baseurl }}/img/usecases-analytics.png" width="700px" />
+</div>
+
+### What are the advantages of streaming analytics applications?
+
+The advantages of continuous streaming analytics compared to batch analytics are not limited to a much lower latency from events to insight due to the elimination of periodic import and query execution. In contrast to batch queries, streaming queries do not have to deal with artificial boundaries in the input data which are caused by periodic imports and the bounded nature of the input.
+
+Another aspect is a simpler application architecture. A batch analytics pipeline consists of several independent components to periodically schedule data ingestion and query execution. Reliably operating such a pipeline is non-trivial because failures of one component affect the following steps of the pipeline. In contrast, a streaming analytics application which runs on a sophisticated stream processor like Flink incorporates all steps from data ingestion to continuous result computation. Therefore, it can rely on the engine's failure recovery mechanism.
+
+### How does Flink support data analytics applications?
+
+Flink provides very good support for continuous streaming as well as batch analytics. Specifically, it features an ANSI-compliant SQL interface with unified semantics for batch and streaming queries. SQL queries compute the same result regardless of whether they are run on a static data set of recorded events or on a real-time event stream. Rich support for user-defined functions ensures that custom code can be executed in SQL queries. If even more custom logic is required, Flink's DataStream API or DataSet API provide more low-level control. Moreover, Flink's Gelly library provides algorithms and building blocks for large-scale and high-performance graph analytics on batch data sets.
+
+### What are typical data analytics applications?
+
+* <a href="http://2016.flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/">Quality monitoring of Telco networks</a>
+* <a href="https://techblog.king.com/rbea-scalable-real-time-analytics-king/">Analysis of product updates &amp; experiment evaluation</a> in mobile applications
+* <a href="https://eng.uber.com/athenax/">Ad-hoc analysis of live data</a> in consumer technology
+* Large-scale graph analysis
+
+## Data Pipeline Applications <a name="pipelines"></a>
+
+### What are data pipelines?
+
+Extract-transform-load (ETL) is a common approach to convert and move data between storage systems. Often ETL jobs are periodically triggered to copy data from transactional database systems to an analytical database or a data warehouse.
+
+Data pipelines serve a similar purpose to ETL jobs. They transform and enrich data and can move it from one storage system to another. However, they operate in a continuous streaming mode instead of being periodically triggered. Hence, they are able to read records from sources that continuously produce data and move it with low latency to their destination. For example, a data pipeline might monitor a file system directory for new files and write their data into an event log. Another application might materialize an event stream to a database or incrementally build and refine a search index.
+
+The figure below depicts the difference between periodic ETL jobs and continuous data pipelines.
+
+<div class="row front-graphic">
+  <img src="{{ site.baseurl }}/img/usecases-datapipelines.png" width="700px" />
+</div>
+
+### What are the advantages of data pipelines?
+
+The obvious advantage of continuous data pipelines over periodic ETL jobs is the reduced latency of moving data to its destination. Moreover, data pipelines are more versatile and can be employed for more use cases because they are able to continuously consume and emit data. 
+
+### How does Flink support data pipelines?
+
+Many common data transformation or enrichment tasks can be addressed by Flink's SQL interface (or Table API) and its support for user-defined functions. Data pipelines with more advanced requirements can be realized by using the DataStream API which is more generic. Flink provides a rich set of connectors to various storage systems such as Kafka, Kinesis, Elasticsearch, and JDBC database systems. It also features continuous sources for file systems that monitor directories and sinks that write files in a time-bucketed fashion.
+
+### What are typical data pipeline applications?
+
+* <a href="https://data-artisans.com/blog/blink-flink-alibaba-search">Real-time search index building</a> in e-commerce
+* <a href="https://jobs.zalando.com/tech/blog/apache-showdown-flink-vs.-spark/">Continuous ETL</a> in e-commerce 
 
-And for more user stories, we recommend the sessions from <a href="http://flink-forward.org/program/sessions/" target="_blank">Flink Forward 2016</a>, the annual Flink user conference.