You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@flink.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/18 20:29:00 UTC

[jira] [Updated] (FLINK-16986) Enhance the OperatorEvent handling guarantee during checkpointing.

     [ https://issues.apache.org/jira/browse/FLINK-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-16986:
-----------------------------------
    Labels: pull-request-available  (was: )

> Enhance the OperatorEvent handling guarantee during checkpointing.
> ------------------------------------------------------------------
>
>                 Key: FLINK-16986
>                 URL: https://issues.apache.org/jira/browse/FLINK-16986
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>            Priority: Major
>              Labels: pull-request-available
>
> When the {{CheckpointCoordinator}} takes a checkpoint, the checkpointing order is following:
>  # {{CheckpointCoordinator}} triggers checkpoint on each {{OperatorCoordinator}}
>  # Each {{OperatorCoordinator}} takes a snapshot.
>  # Right after taking the snapshot, the {{CheckpointCoordinator}} sends a {{CHECKPOINT_FIN}} marker through the {{OperatorContext}}.
>  # Once the {{OperatorContext}} sees {{CHECKPOINT_FIN}} marker, it will wait for all the previous events are acked and suspend the event gateway to the operators by buffering the future {{OperatorEvents}} sent from the {{OperatorCoordinator}} to the operators without actually sending them out.
>  # The {{CheckpointCoordinator}} waits until all the {{OperatorCoordinator}}s finish step 2-4 and then triggers the task snapshots.
>  # The suspension of an event gateway to an operator can be lifted after all the subtasks of that operator has finished their task checkpoint.
> The mechanism above guarantees all the {{OperatorEvents}} sent before taking the operator coordinator snapshot are handled by the operator before the task snapshots are taken.
> An operator can use this mechanism to know whether an {{OperatorEvent}} it sent to the coordinator is included in the upcoming checkpoint or not. What it has to do is to ask the operator coordinator to ACK that OperatorEvent. If the ACK is received before the operator takes the next snapshot, that OperatorEvent must have been handled and checkpointed by the OperatorCoordinator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)