Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2019/12/10 08:12:00 UTC

[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

     [ https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-13698:
----------------------------------
    Fix Version/s: 1.10.0

> Rework threading model of CheckpointCoordinator
> -----------------------------------------------
>
>                 Key: FLINK-13698
>                 URL: https://issues.apache.org/jira/browse/FLINK-13698
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.10.0
>            Reporter: Piotr Nowojski
>            Assignee: Biao Liu
>            Priority: Critical
>             Fix For: 1.10.0
>
>
> Currently, {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is executed by several different threads (mostly the {{ioExecutor}}, but not only). This causes multiple concurrency issues, for example: https://issues.apache.org/jira/browse/FLINK-13497
> A proper fix would be to rethink the threading model there. At first glance, this code does not need to be multi-threaded, except for the parts doing the actual IO operations. It should therefore be possible to run everything in the ExecutionGraph's single thread and run only the necessary IO operations asynchronously, with a feedback loop that hands their results back to that thread ("mailbox style").
> I would strongly recommend fixing this issue before adding new features to the {{CheckpointCoordinator}} component.
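
The "mailbox style" model proposed in the description can be illustrated with a minimal sketch. This is not Flink's CheckpointCoordinator or its actual fix; every name here (MailboxStyleCoordinator, triggerCheckpoint, writeMetadata, completedCount) is hypothetical and chosen only for illustration. It assumes one single-threaded executor stands in for the ExecutionGraph's main thread, and a separate pool stands in for the ioExecutor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of a mailbox-style coordinator; not Flink code. */
class MailboxStyleCoordinator {
    // Stand-in for the main thread: all coordinator state is touched only here.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Stand-in for the IO executor: runs blocking work, never touches state.
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(4);

    // Deliberately a plain, non-thread-safe list: thread confinement to
    // mainThread is the only thing protecting it.
    private final List<String> completedCheckpoints = new ArrayList<>();

    /** Runs the IO part asynchronously, then feeds the result back to the main thread. */
    CompletableFuture<Void> triggerCheckpoint(long checkpointId) {
        return CompletableFuture
            .supplyAsync(() -> writeMetadata(checkpointId), ioExecutor)
            // Feedback loop: the state update is enqueued on the single main thread.
            .thenAcceptAsync(completedCheckpoints::add, mainThread);
    }

    /** Reads state by posting a task to the main thread instead of locking. */
    CompletableFuture<Integer> completedCount() {
        return CompletableFuture.supplyAsync(completedCheckpoints::size, mainThread);
    }

    // Placeholder for a blocking IO operation (assumption: returns a handle name).
    private static String writeMetadata(long checkpointId) {
        return "chk-" + checkpointId;
    }

    void shutdown() {
        ioExecutor.shutdown();
        mainThread.shutdown();
    }
}
```

Because every read and write of completedCheckpoints is posted to the same single-threaded executor, the sketch needs no locks or concurrent collections. The IO pool only produces values that are handed back to that thread.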



--
This message was sent by Atlassian Jira
(v8.3.4#803005)