Posted to issues@flink.apache.org by "Piotr Nowojski (Jira)" <ji...@apache.org> on 2019/11/01 10:17:00 UTC

[jira] [Commented] (FLINK-13905) Separate checkpoint triggering into stages

    [ https://issues.apache.org/jira/browse/FLINK-13905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964755#comment-16964755 ] 

Piotr Nowojski commented on FLINK-13905:
----------------------------------------

I'm moving the discussion about this ticket from FLINK-13848, so as not to clog it.

[~SleePy]
{quote}
In brief, my solution is introducing a queue of trigger request. If the prior trigger request is not finished, the later one (including checkpoint and savepoint) will be kept in this queue.
{quote}
So if there is an ongoing chain of A->B->C, the periodic trigger would just enqueue a request in this queue; otherwise it would trigger "A". Then we would also need manual logic in A, B and C: if any of them fails, we re-check the queue, and if "C" completes successfully, it also re-checks the queue?
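
A minimal sketch of the queueing logic described above, to make the comparison concrete. It is not Flink code: the class, its methods and the string request ids are all hypothetical, and it assumes every call happens on a single (main) thread, so no locking is shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch, not Flink code: every name here is made up for illustration.
// Assumes all methods are called from one (main) thread, so there is no locking.
final class TriggerRequestQueueSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> started = new ArrayList<>();
    private boolean chainInProgress = false;

    // Called by the periodic trigger or by a manual savepoint request.
    void requestTrigger(String requestId) {
        if (chainInProgress) {
            // A chain A->B->C is still running: keep the request for later.
            pending.add(requestId);
        } else {
            startChain(requestId);
        }
    }

    // Must be called when stage C completes, and also when A, B or C fails.
    void onChainFinished() {
        chainInProgress = false;
        String next = pending.poll();
        if (next != null) {
            startChain(next);
        }
    }

    private void startChain(String requestId) {
        chainInProgress = true;
        started.add(requestId); // stands in for kicking off stage A
    }

    List<String> startedRequests() {
        return started;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In this sketch the re-check lives in a single callback, but each of A, B and C still has to invoke it on its failure path, which is the manual logic the question above refers to.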

Isn't it almost the same logic as scheduling the next checkpoint with a delay manually from A, B or C? Without the need for FLINK-13848? Side note, haven't you implemented something similar or exactly this in one of the PRs, in a commit that was ultimately dropped?

In the end, what do you think would be an easier/cleaner/better approach to solve this? 

> Separate checkpoint triggering into stages
> ------------------------------------------
>
>                 Key: FLINK-13905
>                 URL: https://issues.apache.org/jira/browse/FLINK-13905
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Biao Liu
>            Assignee: Biao Liu
>            Priority: Major
>             Fix For: 1.10.0
>
>
> Currently {{CheckpointCoordinator#triggerCheckpoint}} includes some heavy IO operations. We plan to separate the triggering into different stages. The IO operations are executed in IO threads, while the other in-memory operations are not.
> This is a preparation for making all in-memory operations of {{CheckpointCoordinator}} single-threaded (in the main thread).
> Note that we cannot yet put the in-memory operations of triggering into the main thread directly, because some operations still run under a heavy (coordinator-wide) lock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)