Posted to issues@flink.apache.org by "WangMinChao (Jira)" <ji...@apache.org> on 2022/03/29 08:21:00 UTC

[jira] [Commented] (FLINK-26908) HA job cannot to restarting

    [ https://issues.apache.org/jira/browse/FLINK-26908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513913#comment-17513913 ] 

WangMinChao commented on FLINK-26908:
-------------------------------------

After digging deeper, I found that the method org.apache.flink.runtime.dispatcher.Dispatcher#jobReachedTerminalState returns CleanupJobState.GLOBAL, which causes the ZooKeeper HA data to be cleaned up.
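
To illustrate why that return value matters, here is a minimal, self-contained sketch. It is not Flink's code: HaStore, alwaysGlobal, byTerminalKind and recoverableAfter are hypothetical names, and the enums are simplified stand-ins for Flink's JobStatus and CleanupJobState. The only fact it relies on is that SUSPENDED is a locally terminal state, while FINISHED, FAILED and CANCELED are globally terminal.

```java
import java.util.HashSet;
import java.util.Set;

public class CleanupDecisionSketch {

    /** Simplified job states; only the "globally terminal" flag matters here. */
    enum JobStatus {
        RUNNING(false),
        RESTARTING(false),
        SUSPENDED(false), // locally terminal: another JobManager may still recover the job
        FINISHED(true),
        FAILED(true),
        CANCELED(true);

        final boolean globallyTerminal;

        JobStatus(boolean globallyTerminal) {
            this.globallyTerminal = globallyTerminal;
        }
    }

    enum CleanupJobState { LOCAL, GLOBAL }

    /** Hypothetical stand-in for the HA data (job graph, checkpoint pointers) kept in ZooKeeper. */
    static final class HaStore {
        private final Set<String> jobs = new HashSet<>();

        void register(String jobId) { jobs.add(jobId); }

        boolean canRecover(String jobId) { return jobs.contains(jobId); }

        /** Only a GLOBAL cleanup removes the HA entry; LOCAL leaves it for recovery. */
        void cleanup(String jobId, CleanupJobState state) {
            if (state == CleanupJobState.GLOBAL) {
                jobs.remove(jobId);
            }
        }
    }

    /** The behaviour described in this comment: GLOBAL regardless of the job state. */
    static CleanupJobState alwaysGlobal(JobStatus status) {
        return CleanupJobState.GLOBAL;
    }

    /** An assumed alternative: clean up globally only for globally terminal states. */
    static CleanupJobState byTerminalKind(JobStatus status) {
        return status.globallyTerminal ? CleanupJobState.GLOBAL : CleanupJobState.LOCAL;
    }

    /** Registers a job, applies the chosen cleanup decision, and reports whether it is still recoverable. */
    static boolean recoverableAfter(JobStatus status, boolean unconditional) {
        HaStore store = new HaStore();
        String jobId = "971eb686ebd6af2f45f77ba97575443c";
        store.register(jobId);
        CleanupJobState decision = unconditional ? alwaysGlobal(status) : byTerminalKind(status);
        store.cleanup(jobId, decision);
        return store.canRecover(jobId);
    }

    public static void main(String[] args) {
        System.out.println(recoverableAfter(JobStatus.SUSPENDED, true));  // false: HA data removed
        System.out.println(recoverableAfter(JobStatus.SUSPENDED, false)); // true: HA data kept
        System.out.println(recoverableAfter(JobStatus.FINISHED, false));  // false: job is really done
    }
}
```

In this sketch, a job that is only SUSPENDED loses its HA entry under the unconditional variant, which matches the symptom reported here: nothing is left for the next leader to recover.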

> HA job cannot to restarting
> ---------------------------
>
>                 Key: FLINK-26908
>                 URL: https://issues.apache.org/jira/browse/FLINK-26908
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.3
>            Reporter: WangMinChao
>            Priority: Major
>         Attachments: jm.log
>
>
> We are running a Flink CDC job that writes to StarRocks.
> On the first failure, the job was able to restart, and the archive file was created successfully.
> {code:java}
> 2022-03-20 18:41:15,812 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job mysql_2_sr_sr_cluster_1_qqm (971eb686ebd6af2f45f77ba97575443c) switched from state RESTARTING to SUSPENDED.
> org.apache.flink.util.FlinkException: Scheduler is being stopped. ...
> ...
> 2022-03-20 18:41:16,139 INFO org.apache.flink.runtime.history.FsJobArchivist [] - Job 971eb686ebd6af2f45f77ba97575443c has been archived at cosn://bg-rt-flink-prod-1254213275/flink/completed-jobs/971eb686ebd6af2f45f77ba97575443c. 
> ...
> 2022-03-20 18:41:15,843 INFO  org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - Start JobDispatcherLeaderProcess.  {code}
>  
> On a subsequent failure, the job could not restart, and the archive file was not created.
> {code:java}
> 2022-03-22 16:18:44,991 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job mysql_2_sr_sr_cluster_1_qqm (971eb686ebd6af2f45f77ba97575443c) switched from state RUNNING to SUSPENDED.
> org.apache.flink.util.FlinkException: Scheduler is being stopped.
> ...
> 2022-03-22 16:19:00,080 ERROR org.apache.flink.runtime.history.FsJobArchivist              [] - Failed to archive job.
> org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: cosn://bg-rt-flink-prod-1254213275/flink/completed-jobs/971eb686ebd6af2f45f77ba97575443c 
> ...
> 2022-03-22 16:19:00,919 INFO  org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - Stopping JobDispatcherLeaderProcess.
> {code}
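
The FileAlreadyExistsException in the second log can be reproduced in miniature. The sketch below assumes, based only on the log lines above, that the archive path is derived from the job ID and that the file is written without overwriting. It uses java.nio on a local temp directory rather than Flink's or Hadoop's file system API, and archive and archiveTwice are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ArchiveCollisionSketch {

    /** Writes the archive to <dir>/<jobId>; CREATE_NEW makes the write fail if the file exists. */
    static boolean archive(Path dir, String jobId, String json) {
        Path target = dir.resolve(jobId);
        try {
            Files.write(target, json.getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE_NEW);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Same job ID as an earlier archive, so the same path is already taken.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Archives the same job ID twice and returns the outcome of each attempt. */
    static boolean[] archiveTwice() {
        try {
            Path dir = Files.createTempDirectory("completed-jobs");
            String jobId = "971eb686ebd6af2f45f77ba97575443c";
            boolean first = archive(dir, jobId, "{}");
            boolean second = archive(dir, jobId, "{}");
            return new boolean[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] results = archiveTwice();
        System.out.println("first attempt archived:  " + results[0]);
        System.out.println("second attempt archived: " + results[1]);
    }
}
```

Under these assumptions the first attempt succeeds and the second fails, because the job keeps the same ID across both suspensions and the earlier archive is still in place.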



--
This message was sent by Atlassian Jira
(v8.20.1#820001)