Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/04/20 20:06:52 UTC

[GitHub] [flink] alpinegizmo commented on a change in pull request #11826: [FLINK-17236][docs] Add Tutorials section overview

alpinegizmo commented on a change in pull request #11826:
URL: https://github.com/apache/flink/pull/11826#discussion_r411656654



##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know from the more detailed
+reference documentation. The links at the end of each page will lead you to where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces these concepts.
+
+{% info Note %} Accompanying these tutorials is a set of hands-on exercises that will guide you
+through learning how to work with the concepts being presented.
+
+{% top %}
+
+## Stream Processing
+
+Streams are data's natural habitat. Whether it's events from web servers, trades from a stock
+exchange, or sensor readings from a machine on a factory floor, data is created as part of a stream.
+But when you analyze data, you can organize your processing around either _bounded_ or _unbounded_
+streams, and which of these paradigms you choose has profound consequences.
+
+<img src="{{ site.baseurl }}/fig/bounded-unbounded.png" alt="Bounded and unbounded streams" class="offset" width="90%" />
+
+**Batch processing** is the paradigm at work when you process a bounded data stream. In this mode of
+operation you can choose to ingest the entire dataset before producing any results, which means that
+it's possible, for example, to sort the data, compute global statistics, or produce a final report
+that summarizes all of the input.
+
+**Stream processing**, on the other hand, involves unbounded data streams. Conceptually, at least,
+the input may never end, and so you are forced to continuously process the data as it arrives. 
+
+In Flink, applications are composed of **streaming dataflows** that may be transformed by
+user-defined **operators**. These dataflows form directed graphs that start with one or more
+**sources**, and end in one or more **sinks**.
+
+<img src="{{ site.baseurl }}/fig/program_dataflow.svg" alt="A DataStream program, and its dataflow." class="offset" width="80%" />

Review comment:
       I fixed the bucketing sink rendering error in https://github.com/apache/flink/pull/11828#pullrequestreview-396512480. This figure is already in the docs; I'm just reusing it. I do agree about keyBy("id"), though.
