Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/10/15 09:11:27 UTC
[GitHub] [flink-web] AHeise commented on a change in pull request #387: Add blog post: From Aligned to Unaligned Checkpoints - Part 1: Checkpoints, Alignment, and Backpressure

AHeise commented on a change in pull request #387:
URL: https://github.com/apache/flink-web/pull/387#discussion_r505381910



##########
File path: _posts/2020-10-13-from-aligned-to-unaligned-checkpoints-part-1.md
##########
@@ -0,0 +1,117 @@
+---
+layout: post 
+title: "From Aligned to Unaligned Checkpoints - Part 1: Checkpoints, Alignment, and Backpressure" 
+date: 2020-10-13T03:00:00.000Z
+authors:
+- Arvid Heise:
+  name: "Arvid Heise"
+- Stephan Ewen:
+  name: "Stephan Ewen"
+excerpt: Apache Flink’s checkpoint-based fault tolerance mechanism is one of its defining features. Because of that design, Flink unifies batch and stream processing, can easily scale to both very small and extremely large scenarios and provides support for many operational features. In this post we recap the original checkpointing process in Flink, its core properties and issues under backpressure.
+---
+
+Apache Flink’s checkpoint-based fault tolerance mechanism is one of its defining features. Because of that design, Flink unifies batch and stream processing, can easily scale to both [very small](https://hal.inria.fr/hal-02463206/document) and [extremely large](https://102.alibaba.com/detail?id=35) scenarios and provides support for many operational features like stateful upgrades with [state evolution](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/schema_evolution.html) or [roll-backs and time-travel](https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html). 
+
+Despite all these great properties, Flink's checkpointing method has an Achilles' heel: the time a checkpoint takes to complete is determined by the speed at which data flows through the application. When the application backpressures, the processing of checkpoints is backpressured as well (Appendix 1 recaps what backpressure is and why it can be a good thing). In such cases, checkpoints may take longer to complete or even time out completely.
+
+In Flink 1.11, the community introduced a first version of a new feature called "[unaligned checkpoints](https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#unaligned-checkpoints)" that aims at solving this issue, while Flink 1.12 plans to further expand its functionality. In this two-part blog series, we discuss how Flink’s checkpointing mechanism has been modified to support unaligned checkpoints, how unaligned checkpoints work, and how this new mode impacts Flink users. In the first of the two posts, we start with a recap of the original checkpointing process in Flink, its core properties and issues under backpressure.
+
+
+## State in Streaming Applications
+
+Simply put, State is the information that you need to remember across events. Even the most trivial streaming applications are typically stateful because of their need to “remember” the exact position they are processing data from, in the form of a Kafka Partition Offset or a File Offset.

Review comment:
       Yes, good point. From an academic perspective, "or" is always inclusive, but in a more informal article, I guess it makes more sense to follow the spoken "or", which is likely to be read as exclusive.



