Posted to issues@flink.apache.org by "KevinyhZou (Jira)" <ji...@apache.org> on 2022/07/19 10:21:00 UTC

[jira] [Closed] (FLINK-28604) job failover and not restore from checkpoint in zookeeper HA mode

     [ https://issues.apache.org/jira/browse/FLINK-28604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KevinyhZou closed FLINK-28604.
------------------------------
    Fix Version/s: 1.14.5
       Resolution: Fixed

> job failover and not restore from checkpoint in zookeeper HA mode
> -----------------------------------------------------------------
>
>                 Key: FLINK-28604
>                 URL: https://issues.apache.org/jira/browse/FLINK-28604
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.2
>            Reporter: KevinyhZou
>            Priority: Major
>             Fix For: 1.14.5
>
>         Attachments: image-2022-07-19-14-30-27-198.png
>
>
> I ran a job on Flink 1.14.2 with ZooKeeper HA configured as follows:
> {code:java}
> high-availability.storageDir: hdfs://testcluster/app/ha
> high-availability: zookeeper
> high-availability.zookeeper.quorum: *****
> high-availability.zookeeper.path.root: /flink{code}
> When a ZooKeeper node restarted, the JM failed over and logged "Close and clean up all data for ZookeeperHaServices", so the HA data was cleaned up when the first JM shut down.
> When the second JM started, it logged "No checkpoint found during restore" and had no checkpoint to restore from.
> While debugging, I found that on job failover, execution reaches line 285 of `ClusterEntrypoint.java`:
> !image-2022-07-19-14-30-27-198.png!
> which sets `cleanupHaData` to true.
>  
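
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not Flink code: `HaCleanupSketch`, `HaStore`, `recordCheckpoint`, `shutDown` and `restore` are hypothetical names invented for illustration, and an in-memory map stands in for the data kept in ZooKeeper and `high-availability.storageDir`. Only the `cleanupHaData` flag name and the "No checkpoint found during restore" message come from the report. The sketch assumes that a shutdown with the flag set to true wipes the stored checkpoint pointers, which is what the logs in the report suggest.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class HaCleanupSketch {

    /** Hypothetical stand-in for the HA store (ZooKeeper plus storageDir). */
    static final class HaStore {
        private final Map<String, Long> latestCheckpointByJob = new HashMap<>();

        void recordCheckpoint(String jobId, long checkpointId) {
            latestCheckpointByJob.put(jobId, checkpointId);
        }

        Optional<Long> latestCheckpoint(String jobId) {
            return Optional.ofNullable(latestCheckpointByJob.get(jobId));
        }

        // Assumption: shutting down with cleanupHaData == true removes all
        // HA data, while false leaves it in place for the next JobManager.
        void shutDown(boolean cleanupHaData) {
            if (cleanupHaData) {
                latestCheckpointByJob.clear();
            }
        }
    }

    /** What a newly started JobManager would report in this toy model. */
    static String restore(HaStore store, String jobId) {
        return store.latestCheckpoint(jobId)
                // Illustrative message, not an actual Flink log line.
                .map(id -> "Restoring from checkpoint " + id)
                .orElse("No checkpoint found during restore");
    }

    public static void main(String[] args) {
        // Failover that keeps HA data: the second JM can restore.
        HaStore kept = new HaStore();
        kept.recordCheckpoint("job-1", 42L);
        kept.shutDown(false);
        System.out.println(restore(kept, "job-1"));

        // Failover that cleans HA data: the second JM finds nothing.
        HaStore cleaned = new HaStore();
        cleaned.recordCheckpoint("job-1", 42L);
        cleaned.shutDown(true);
        System.out.println(restore(cleaned, "job-1"));
    }
}
```

In this toy model the only difference between the two runs is the value passed to `shutDown`, which mirrors the report's observation that a failover taking the cleanup path leaves the next JM with nothing to restore from.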



--
This message was sent by Atlassian Jira
(v8.20.10#820010)