Posted to dev@flink.apache.org by "Jiangjie Qin (Jira)" <ji...@apache.org> on 2022/02/09 05:31:00 UTC

[jira] [Created] (FLINK-26029) Generalize the checkpoint protocol of OperatorCoordinator.

Jiangjie Qin created FLINK-26029:
------------------------------------

             Summary: Generalize the checkpoint protocol of OperatorCoordinator.
                 Key: FLINK-26029
                 URL: https://issues.apache.org/jira/browse/FLINK-26029
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Checkpointing
    Affects Versions: 1.14.3
            Reporter: Jiangjie Qin


Currently, the JM opens all the event valves from the OperatorCoordinator to the subtasks as soon as the checkpoint barriers have been sent to the Source subtasks. While this works for Source operators, it unnecessarily limits the use of the OperatorCoordinator with other operators.

To generalize the protocol, we can change the JM to open the event valve only for those subtasks that have finished their local checkpoint. The protocol would then become the following:
 # Let the OC finish processing all the incoming OperatorEvents before the snapshot.
 # Wait until all the outgoing OperatorEvents before the snapshot are sent and acked.
 # Shut the event valve so no outgoing events can be sent to the subtasks.
 # Send checkpoint barriers to the Source operators.
 # Open the corresponding event valve of a subtask when the AcknowledgeCheckpoint message from that subtask is received.
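
The steps above can be modelled with a small, synchronous sketch. This is an illustration only, not Flink's implementation: SubtaskValve and CoordinatorCheckpointSketch are hypothetical names, events are plain strings, and the sketch assumes that a shut valve buffers outgoing events and flushes them in order when it is reopened. Steps 1 and 2 are assumed to have completed before triggerCheckpoint() is called, and sending the barriers (step 4) is only marked by a comment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical per-subtask event valve (not a Flink class). */
final class SubtaskValve {
    private final List<String> delivered = new ArrayList<>();
    private final Queue<String> buffered = new ArrayDeque<>();
    private boolean shut;

    /** Delivers the event immediately, or buffers it while the valve is shut (assumption). */
    void send(String event) {
        if (shut) {
            buffered.add(event);
        } else {
            delivered.add(event);
        }
    }

    void shut() {
        shut = true;
    }

    /** Reopens the valve and flushes buffered events in their original order. */
    void open() {
        shut = false;
        while (!buffered.isEmpty()) {
            delivered.add(buffered.poll());
        }
    }

    List<String> delivered() {
        return delivered;
    }
}

/** Hypothetical model of the proposed protocol: one valve per subtask. */
final class CoordinatorCheckpointSketch {
    private final Map<Integer, SubtaskValve> valves = new HashMap<>();

    CoordinatorCheckpointSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            valves.put(i, new SubtaskValve());
        }
    }

    void sendEvent(int subtask, String event) {
        valves.get(subtask).send(event);
    }

    /** Steps 3 and 4; steps 1 and 2 are assumed to be done already. */
    void triggerCheckpoint() {
        // Step 3: shut every valve so no outgoing event reaches a subtask.
        valves.values().forEach(SubtaskValve::shut);
        // Step 4: the checkpoint barriers would be sent to the Source operators here.
    }

    /** Step 5: reopen only the valve of the subtask that acknowledged the checkpoint. */
    void onAcknowledgeCheckpoint(int subtask) {
        valves.get(subtask).open();
    }

    List<String> deliveredTo(int subtask) {
        return valves.get(subtask).delivered();
    }
}
```

In this model, an event sent to subtask 0 after triggerCheckpoint() is held back until onAcknowledgeCheckpoint(0) is called, while the valve of subtask 1 stays shut until its own acknowledgement arrives. That per-subtask reopening is the difference from the current behaviour, where all valves are opened together once the barriers have been sent.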



--
This message was sent by Atlassian Jira
(v8.20.1#820001)