Posted to issues@flink.apache.org by "KevinyhZou (Jira)" <ji...@apache.org> on 2022/07/19 10:21:00 UTC
[jira] [Closed] (FLINK-28604) job failover and not restore from checkpoint in zookeeper HA mode
[ https://issues.apache.org/jira/browse/FLINK-28604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
KevinyhZou closed FLINK-28604.
------------------------------
Fix Version/s: 1.14.5
Resolution: Fixed
> job failover and not restore from checkpoint in zookeeper HA mode
> -----------------------------------------------------------------
>
> Key: FLINK-28604
> URL: https://issues.apache.org/jira/browse/FLINK-28604
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.2
> Reporter: KevinyhZou
> Priority: Major
> Fix For: 1.14.5
>
> Attachments: image-2022-07-19-14-30-27-198.png
>
>
> Run a job on Flink 1.14.2 with ZooKeeper HA configured:
> {code:java}
> high-availability.storageDir: hdfs://testcluster/app/ha
> high-availability: zookeeper
> high-availability.zookeeper.quorum: *****
> high-availability.zookeeper.path.root: /flink{code}
> When a ZooKeeper node restarts, the JM fails over and logs "Close and clean up all data for ZookeeperHaServices", so the HA data is cleaned up when the first JM shuts down.
> When the second JM starts, it logs "No checkpoint found during restore" and has no checkpoint to restore from.
> While debugging, I found that when the job fails over, execution reaches `ClusterEntrypoint.java` line 285
> !image-2022-07-19-14-30-27-198.png!
> and sets `cleanupHaData` to true.
>
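The failure sequence described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Flink's actual code: `HaCleanupSketch`, `HaStore`, `JobManager`, `simulateFailover`, the job ID `job-1`, the checkpoint ID `42`, and the "Restoring from checkpoint" message are all hypothetical names chosen for illustration. Only the `cleanupHaData` flag and the "No checkpoint found during restore" message come from the report. The sketch shows why a shutdown that cleans the HA data leaves the next JM with nothing to restore from.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of the reported behaviour. NOT Flink code; all names are hypothetical. */
public class HaCleanupSketch {

    /** Stands in for the checkpoint pointers kept in the HA store (ZooKeeper + storageDir). */
    static final class HaStore {
        private final Map<String, Long> latestCheckpointByJob = new HashMap<>();

        void recordCheckpoint(String jobId, long checkpointId) {
            latestCheckpointByJob.put(jobId, checkpointId);
        }

        Optional<Long> latestCheckpoint(String jobId) {
            return Optional.ofNullable(latestCheckpointByJob.get(jobId));
        }

        /** Models a shutdown that also wipes the HA data. */
        void closeAndCleanupAllData() {
            latestCheckpointByJob.clear();
        }
    }

    /** Stands in for one JobManager process sharing the HA store with its successor. */
    static final class JobManager {
        private final HaStore haStore;

        JobManager(HaStore haStore) {
            this.haStore = haStore;
        }

        /** Shuts down; HA data survives only when cleanupHaData is false. */
        void shutDown(boolean cleanupHaData) {
            if (cleanupHaData) {
                haStore.closeAndCleanupAllData();
            }
        }

        /** Returns the message a restarting JM would log for the given job. */
        String restore(String jobId) {
            return haStore.latestCheckpoint(jobId)
                    .map(id -> "Restoring from checkpoint " + id) // hypothetical message
                    .orElse("No checkpoint found during restore"); // message quoted in the report
        }
    }

    /** First JM takes a checkpoint and shuts down; second JM then tries to restore. */
    static String simulateFailover(boolean cleanupHaData) {
        HaStore store = new HaStore();
        store.recordCheckpoint("job-1", 42L);
        new JobManager(store).shutDown(cleanupHaData);
        return new JobManager(store).restore("job-1");
    }

    public static void main(String[] args) {
        System.out.println(simulateFailover(true));  // prints "No checkpoint found during restore"
        System.out.println(simulateFailover(false)); // prints "Restoring from checkpoint 42"
    }
}
```

In this model, the only difference between the two runs is the flag passed at shutdown, which matches the reporter's observation that the cleanup on the first JM's shutdown is what removes the checkpoint for the second JM.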
--
This message was sent by Atlassian Jira
(v8.20.10#820010)