Posted to issues@flink.apache.org by "Matthias (Jira)" <ji...@apache.org> on 2020/10/15 07:44:00 UTC

[jira] [Assigned] (FLINK-19544) Implement CheckpointRecoveryFactory based on Kubernetes API

     [ https://issues.apache.org/jira/browse/FLINK-19544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias reassigned FLINK-19544:
--------------------------------

    Assignee: Yang Wang

> Implement CheckpointRecoveryFactory based on Kubernetes API
> -----------------------------------------------------------
>
>                 Key: FLINK-19544
>                 URL: https://issues.apache.org/jira/browse/FLINK-19544
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Deployment / Kubernetes, Runtime / Checkpointing
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> * *_CheckpointRecoveryFactory_*
>  * Stores meta information to Zookeeper/ConfigMap for checkpoint recovery.
>  * Stores the latest checkpoint counter.
> Each component (Dispatcher, ResourceManager, JobManager, RestEndpoint) will have a dedicated ConfigMap. All the HA information relevant to a specific component will be stored in a single ConfigMap. The JobManager's ConfigMap will then contain the current leader, the pointers to the checkpoints, and the checkpoint ID counter. Since “Get (check the leader)-and-Update (write back to the ConfigMap)” is a transactional operation, this completely solves the concurrent modification issues without needing the "lock-and-release" approach used with Zookeeper.
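
The get-and-update pattern described above can be illustrated with a minimal in-memory sketch. It is not Flink's implementation and does not use the Kubernetes client. The names InMemoryStore, ConfigMap, ConflictError, check_and_update, the "leader" and "checkpoint_counter" keys, and the "jobmanager-ha" name are all hypothetical. The sketch assumes the store rejects a write whose version no longer matches the stored one, which is roughly how optimistic concurrency via resourceVersion behaves in Kubernetes.

```python
import threading
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a write carries a stale resource version."""


@dataclass
class ConfigMap:
    data: dict
    resource_version: int


class InMemoryStore:
    """Hypothetical stand-in for an API server holding ConfigMaps."""

    def __init__(self):
        self._lock = threading.Lock()
        self._maps = {}

    def create(self, name, data):
        with self._lock:
            self._maps[name] = ConfigMap(dict(data), 1)

    def get(self, name):
        # Return a copy so callers cannot mutate the stored state directly.
        with self._lock:
            cm = self._maps[name]
            return ConfigMap(dict(cm.data), cm.resource_version)

    def replace(self, name, updated):
        # Compare-and-swap: accept the write only if the version is unchanged.
        with self._lock:
            current = self._maps[name]
            if current.resource_version != updated.resource_version:
                raise ConflictError(name)
            self._maps[name] = ConfigMap(
                dict(updated.data), current.resource_version + 1
            )


def check_and_update(store, name, my_id, mutate, max_retries=5):
    """Get the ConfigMap, check the leader, then write back atomically.

    Returns False if my_id is not the recorded leader. Retries when a
    concurrent writer changed the ConfigMap between the get and the update.
    """
    for _ in range(max_retries):
        cm = store.get(name)
        if cm.data.get("leader") != my_id:
            return False  # Not the leader: do not write.
        mutate(cm.data)
        try:
            store.replace(name, cm)
            return True
        except ConflictError:
            continue  # Someone else wrote first: re-read and re-check.
    raise ConflictError(f"gave up on {name} after {max_retries} attempts")


def bump(data):
    # Increment the (hypothetical) checkpoint ID counter entry.
    data["checkpoint_counter"] = str(int(data["checkpoint_counter"]) + 1)


store = InMemoryStore()
store.create("jobmanager-ha", {"leader": "jm-1", "checkpoint_counter": "0"})

ok = check_and_update(store, "jobmanager-ha", "jm-1", bump)
rejected = check_and_update(store, "jobmanager-ha", "jm-2", bump)
```

In this sketch a non-leader never writes, and a leader whose view is stale is forced to re-read and re-check leadership before retrying, so no explicit lock has to be acquired and released by the caller.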



--
This message was sent by Atlassian Jira
(v8.3.4#803005)