Posted to issues@flink.apache.org by "WangMinChao (Jira)" <ji...@apache.org> on 2022/03/29 08:21:00 UTC
[jira] [Commented] (FLINK-26908) HA job cannot to restarting
[ https://issues.apache.org/jira/browse/FLINK-26908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513913#comment-17513913 ]
WangMinChao commented on FLINK-26908:
-------------------------------------
After digging deeper, I found that the org.apache.flink.runtime.dispatcher.Dispatcher#jobReachedTerminalState
method returns CleanupJobState.GLOBAL, which causes the job's ZooKeeper HA data to be cleaned up.
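To make the consequence concrete, here is a minimal, hypothetical Java model of the decision described above. The names `JobStatus`, `CleanupJobState` and `jobReachedTerminalState` mirror Flink's, but the class `CleanupModel`, its in-memory map standing in for the ZooKeeper job graph store, and the `submit`/`isRecoverable` helpers are assumptions for illustration, not Flink code. It assumes, as in Flink's `JobStatus`, that `SUSPENDED` is only locally terminal while `FAILED`, `CANCELED` and `FINISHED` are globally terminal.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the cleanup decision; NOT Flink's implementation.
// The in-memory map stands in for the job graphs kept in ZooKeeper.
public class CleanupModel {

    // Assumption mirroring Flink: SUSPENDED is terminal only locally,
    // FAILED/CANCELED/FINISHED are globally terminal.
    enum JobStatus {
        RUNNING(false, false),
        RESTARTING(false, false),
        SUSPENDED(true, false),
        FAILED(true, true),
        CANCELED(true, true),
        FINISHED(true, true);

        final boolean terminal;
        final boolean globallyTerminal;

        JobStatus(boolean terminal, boolean globallyTerminal) {
            this.terminal = terminal;
            this.globallyTerminal = globallyTerminal;
        }
    }

    enum CleanupJobState { LOCAL, GLOBAL }

    // Stand-in for the HA store: job ID -> serialized job graph (hypothetical).
    private final Map<String, String> haJobGraphs = new HashMap<>();

    void submit(String jobId) {
        haJobGraphs.put(jobId, "jobGraph:" + jobId);
    }

    // A new leader can only recover jobs whose graph is still in the HA store.
    boolean isRecoverable(String jobId) {
        return haJobGraphs.containsKey(jobId);
    }

    // Only a globally terminal state maps to GLOBAL cleanup.
    static CleanupJobState cleanupFor(JobStatus status) {
        if (!status.terminal) {
            throw new IllegalArgumentException("Job is not in a terminal state: " + status);
        }
        return status.globallyTerminal ? CleanupJobState.GLOBAL : CleanupJobState.LOCAL;
    }

    // GLOBAL cleanup deletes the HA entry; LOCAL cleanup leaves it in place.
    void jobReachedTerminalState(String jobId, JobStatus status) {
        if (cleanupFor(status) == CleanupJobState.GLOBAL) {
            haJobGraphs.remove(jobId);
        }
    }

    public static void main(String[] args) {
        CleanupModel model = new CleanupModel();
        String jobId = "971eb686ebd6af2f45f77ba97575443c";
        model.submit(jobId);

        model.jobReachedTerminalState(jobId, JobStatus.SUSPENDED);
        System.out.println("after SUSPENDED, recoverable = " + model.isRecoverable(jobId));

        model.jobReachedTerminalState(jobId, JobStatus.FAILED);
        System.out.println("after FAILED, recoverable = " + model.isRecoverable(jobId));
    }
}
```

Running `main` prints `after SUSPENDED, recoverable = true` followed by `after FAILED, recoverable = false`: once the cleanup is GLOBAL, nothing is left in the HA store for a new leader to recover.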
> HA job cannot to restarting
> ---------------------------
>
> Key: FLINK-26908
> URL: https://issues.apache.org/jira/browse/FLINK-26908
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.13.3
> Reporter: WangMinChao
> Priority: Major
> Attachments: jm.log
>
>
> We are running a Flink CDC job that writes to StarRocks.
> On the first failure, the job was able to restart, and its archive file was created successfully.
> {code:java}
> 2022-03-20 18:41:15,812 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job mysql_2_sr_sr_cluster_1_qqm (971eb686ebd6af2f45f77ba97575443c) switched from state RESTARTING to SUSPENDED.
> org.apache.flink.util.FlinkException: Scheduler is being stopped. ...
> ...
> 2022-03-20 18:41:16,139 INFO org.apache.flink.runtime.history.FsJobArchivist [] - Job 971eb686ebd6af2f45f77ba97575443c has been archived at cosn://bg-rt-flink-prod-1254213275/flink/completed-jobs/971eb686ebd6af2f45f77ba97575443c.
> ...
> 2022-03-20 18:41:15,843 INFO org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - Start JobDispatcherLeaderProcess. {code}
>
> On a subsequent failure, the job could not restart, and the archive file could not be created.
> {code:java}
> 2022-03-22 16:18:44,991 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job mysql_2_sr_sr_cluster_1_qqm (971eb686ebd6af2f45f77ba97575443c) switched from state RUNNING to SUSPENDED.
> org.apache.flink.util.FlinkException: Scheduler is being stopped.
> ...
> 2022-03-22 16:19:00,080 ERROR org.apache.flink.runtime.history.FsJobArchivist [] - Failed to archive job.
> org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: cosn://bg-rt-flink-prod-1254213275/flink/completed-jobs/971eb686ebd6af2f45f77ba97575443c
> ...
> 2022-03-22 16:19:00,919 INFO org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - Stopping JobDispatcherLeaderProcess.
> {code}
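The archive error on the second failure is consistent with the archive path being derived only from the job ID: both runs target `completed-jobs/971eb686ebd6af2f45f77ba97575443c`. A minimal sketch of that collision follows, assuming a local temporary directory instead of `cosn://` and using `java.nio.file.FileAlreadyExistsException` as a stand-in for the Hadoop exception in the log. `ArchiveCollision` and `archive` are hypothetical names, not `FsJobArchivist`'s API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not FsJobArchivist's code: the archive file is named after
// the job ID, so archiving the same job a second time hits the existing file.
public class ArchiveCollision {

    // Creates archiveDir/<jobId>; Files.createFile refuses to overwrite an existing file.
    static Path archive(Path archiveDir, String jobId, String content) throws IOException {
        Path target = archiveDir.resolve(jobId);
        Files.createFile(target); // throws FileAlreadyExistsException if target exists
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("completed-jobs");
        String jobId = "971eb686ebd6af2f45f77ba97575443c";

        archive(dir, jobId, "first run");
        System.out.println("first archive created");

        try {
            archive(dir, jobId, "second run");
        } catch (FileAlreadyExistsException e) {
            System.out.println("second archive failed: file already exists");
        }
    }
}
```

Because the job ID does not change across restarts, any second attempt to archive the same job under the same directory fails in the same way unless the old file is removed or the name is made unique.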