Posted to dev@flink.apache.org by "Yuan Mei (Jira)" <ji...@apache.org> on 2020/06/04 04:05:00 UTC
[jira] [Created] (FLINK-18112) Single Task Failure Recovery Prototype
Yuan Mei created FLINK-18112:
--------------------------------
Summary: Single Task Failure Recovery Prototype
Key: FLINK-18112
URL: https://issues.apache.org/jira/browse/FLINK-18112
Project: Flink
Issue Type: New Feature
Components: Runtime / Checkpointing, Runtime / Coordination, Runtime / Network
Affects Versions: 1.12.0
Environment: Build a prototype of single-task failure recovery to answer the following questions:
Step 1: Scheduling part: restart a single failed node without restarting its upstream or downstream nodes.
Step 2: Checkpointing part: as I understand how regional failover works, this part might not need modification.
Step 3: Network part
- how the recovered node can link to the upstream ResultPartitions and continue receiving data
- how the downstream node can link to the recovered node and continue receiving data
- how different Netty transit modes affect the results
- what happens if the failed node's buffered data pool is full
Step 4: Failover process verification
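The recovery flow in Steps 1 to 3 can be sketched as a toy, single-process Python model. This is a hypothetical illustration, not Flink code: ResultPartition, MiddleTask, Sink and restart_single_task are invented names, and the model assumes that (a) the upstream partition retains every record it produced, (b) a checkpoint stores only the task's upstream offset, and (c) replay is deterministic over a single ordered channel, so the downstream node can drop duplicates by sequence number. Only the failed middle task is restarted; the source and sink keep running.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Record = Tuple[int, int]  # (sequence number, payload)


@dataclass
class ResultPartition:
    """Hypothetical upstream partition (not Flink's class) that keeps every
    record it produced, so a restarted consumer can re-read from an offset."""
    records: List[Record] = field(default_factory=list)

    def read_from(self, offset: int) -> List[Record]:
        return self.records[offset:]


@dataclass
class MiddleTask:
    """Hypothetical task that doubles each payload. Its only state is the
    upstream offset, which is snapshotted at each checkpoint."""
    upstream: ResultPartition
    output: ResultPartition = field(default_factory=ResultPartition)
    offset: int = 0
    checkpointed_offset: int = 0

    def process(self, max_records: int) -> None:
        for seq, value in self.upstream.read_from(self.offset)[:max_records]:
            self.output.records.append((seq, value * 2))
            self.offset += 1

    def checkpoint(self) -> None:
        self.checkpointed_offset = self.offset


def restart_single_task(failed: MiddleTask) -> MiddleTask:
    """Restart only the failed task: relink it to the same upstream partition
    at the checkpointed offset with a fresh output partition. Upstream and
    downstream are not restarted."""
    return MiddleTask(upstream=failed.upstream,
                      offset=failed.checkpointed_offset,
                      checkpointed_offset=failed.checkpointed_offset)


@dataclass
class Sink:
    """Hypothetical downstream consumer that relinks to the recovered task's
    output and drops sequence numbers it has already seen (replayed records)."""
    received: List[Record] = field(default_factory=list)
    last_seq: int = -1
    read_pos: int = 0  # read position in the currently linked partition

    def reconnect(self) -> None:
        self.read_pos = 0

    def poll(self, partition: ResultPartition) -> None:
        for seq, value in partition.read_from(self.read_pos):
            self.read_pos += 1
            if seq > self.last_seq:  # skip records replayed after recovery
                self.received.append((seq, value))
                self.last_seq = seq


# Usage: 6 records upstream; the middle task fails after one checkpoint.
source = ResultPartition([(i, i) for i in range(6)])
task = MiddleTask(upstream=source)
sink = Sink()

task.process(2)
task.checkpoint()                 # checkpoint covers records 0 and 1
task.process(2)                   # records 2 and 3 are emitted, then the task "fails"
sink.poll(task.output)

task = restart_single_task(task)  # resumes from upstream offset 2
sink.reconnect()                  # downstream relinks to the new output
task.process(10)                  # replays 2 and 3, then processes 4 and 5
sink.poll(task.output)

print(sink.received)  # [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
```

Whether a real upstream ResultPartition can retain and replay data like this, and what happens when the buffer pool fills up, are exactly the open questions in Step 3; the model simply assumes a favorable answer.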
Reporter: Yuan Mei
Fix For: 1.12.0
--
This message was sent by Atlassian Jira
(v8.3.4#803005)