Posted to issues@flink.apache.org by "Xintong Song (Jira)" <ji...@apache.org> on 2020/12/12 11:58:00 UTC

[jira] [Updated] (FLINK-20222) The CheckpointCoordinator should reset the OperatorCoordinators when fail before the first checkpoint.

     [ https://issues.apache.org/jira/browse/FLINK-20222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xintong Song updated FLINK-20222:
---------------------------------
    Fix Version/s: 1.11.3

> The CheckpointCoordinator should reset the OperatorCoordinators when fail before the first checkpoint.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-20222
>                 URL: https://issues.apache.org/jira/browse/FLINK-20222
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>            Reporter: Jiangjie Qin
>            Assignee: Stephan Ewen
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.12.0, 1.11.3
>
>
> Right now, if a job fails before the first successful checkpoint, the CheckpointCoordinator does not reset the OperatorCoordinator state. This may leave the OperatorCoordinators in an inconsistent state.
> The CheckpointCoordinator should also reset the OperatorCoordinator state in this case, just as it does for the master hooks. This essentially means "reset to no checkpoint". There are two options for the fix:
>  # Add a reset() method to the OperatorCoordinator.
>  # Call resetToCheckpoint(null) on the OperatorCoordinator.
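
For illustration only, the second option can be sketched as below. This is a minimal sketch, not Flink's actual code: the Coordinator interface, CountingCoordinator and restoreLatest are hypothetical stand-ins, and the single-argument resetToCheckpoint(byte[]) signature is assumed from the wording of the issue. The idea shown is that the restore path always calls the reset, and a null argument is read as "no checkpoint", so the coordinator returns to its initial state.

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.List;

public class ResetSketch {

    /** Hypothetical, simplified stand-in for an operator coordinator (not the Flink interface). */
    interface Coordinator {
        /** Assumption: a null argument means "reset to no checkpoint". */
        void resetToCheckpoint(byte[] checkpointData);
    }

    /** Toy coordinator whose entire state is a count of handled events. */
    static final class CountingCoordinator implements Coordinator {
        private int count;

        void handleEvent() {
            count++;
        }

        int count() {
            return count;
        }

        /** Serializes the counter as 4 bytes. */
        byte[] snapshot() {
            return ByteBuffer.allocate(4).putInt(count).array();
        }

        @Override
        public void resetToCheckpoint(byte[] checkpointData) {
            // No checkpoint yet: fall back to the initial state instead of keeping stale state.
            count = (checkpointData == null) ? 0 : ByteBuffer.wrap(checkpointData).getInt();
        }
    }

    /** Hypothetical restore path: coordinators are reset even if no checkpoint exists. */
    static void restoreLatest(byte[] latestCheckpoint, List<? extends Coordinator> coordinators) {
        for (Coordinator c : coordinators) {
            c.resetToCheckpoint(latestCheckpoint); // latestCheckpoint may be null
        }
    }

    public static void main(String[] args) {
        CountingCoordinator c = new CountingCoordinator();
        c.handleEvent();
        c.handleEvent();

        // Failure before the first checkpoint: there is nothing to restore from.
        restoreLatest(null, Collections.singletonList(c));
        System.out.println(c.count()); // prints 0

        // Failure after a checkpoint: state returns to the snapshot.
        c.handleEvent();
        byte[] checkpoint = c.snapshot();
        c.handleEvent();
        restoreLatest(checkpoint, Collections.singletonList(c));
        System.out.println(c.count()); // prints 1
    }
}
```

The first option would instead add a separate reset() method to the interface. The trade-off is an explicit method versus overloading the meaning of a null checkpoint.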
