Posted to user-zh@flink.apache.org by 仙剑……情动人间 <15...@qq.com.INVALID> on 2021/07/13 09:31:19 UTC

Flink: triggering a savepoint fails

Hi All,


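Triggering a Flink savepoint always fails for me with the error below. It keeps reporting a timeout, but there is no further information to look at. From what I read, the checkpoint timeout can be configured, so I set it to 2 min, but the savepoint trigger fails before the 2 min have passed. Also, my state is not large.

------------------------------------------------------------
 The program finished with the following exception: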


org.apache.flink.util.FlinkException: Triggering a savepoint for the job 00000000000000000000000000000000 failed.
	at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:777)
	at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:754)
	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
	at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:751)
	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1072)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.util.concurrent.TimeoutException
	at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1255)
	at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
	at org.apache.flink.runtime.concurrent.FutureUtils.lambda$orTimeout$15(FutureUtils.java:582)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

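Re: Flink: triggering a savepoint fails

Posted by 仙剑……情动人间 <15...@qq.com.INVALID>.
Thank you.

------------------ Original message ------------------
From: "user-zh" <myasuka@live.com>
Sent: Wednesday, July 14, 2021, 10:50 AM
To: "flink mailing list" <user-zh@flink.apache.org>
Subject: Re: Flink: triggering a savepoint fails

Hi,

This looks like the client failing to trigger the savepoint, rather than the savepoint itself timing out end-to-end. I suggest checking the JobManager log to see whether, at the time of the trigger, it contains any entries about a savepoint being triggered. You can also check in the Flink web UI whether the corresponding savepoint shows up in the history on the checkpoint tab.

Best regards,
Yun Tang (唐云)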

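Re: Flink: triggering a savepoint fails

Posted by Yun Tang <my...@live.com>.
Hi,

This looks like the client failing to trigger the savepoint, rather than the savepoint itself timing out end-to-end. I suggest checking the JobManager log to see whether, at the time of the trigger, it contains any entries about a savepoint being triggered. You can also check in the Flink web UI whether the corresponding savepoint shows up in the history on the checkpoint tab.

Best regards,
Yun Tang (唐云)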

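Re: Flink: triggering a savepoint fails

Posted by Caizhi Weng <ts...@gmail.com>.
Hi!

This error means that after the client sent the request to trigger the checkpoint, the job manager did not report the checkpoint result back in time. There can be many reasons for that, for example a checkpoint timeout or a network communication problem. You can open the Flink web UI to see whether it has more information, or look at the job manager and task manager logs.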


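Re: Flink: triggering a savepoint fails

Posted by 仙剑……情动人间 <15...@qq.com.INVALID>.
Thank you very much, that was indeed the problem. After I specified the ZooKeeper-based HA configuration with -D options, the savepoint was triggered successfully.

------------------ Original message ------------------
From: "user-zh" <lycbug666@gmail.com>
Sent: Wednesday, July 28, 2021, 11:16 AM
To: "user-zh" <user-zh@flink.apache.org>
Subject: Re: Flink: triggering a savepoint fails

Hi,
I have run into this error with a job ID of 00000 before. In our case the job had ZooKeeper-based HA enabled, but the savepoint command was run with a Flink client that had no HA configured.
You might want to check whether this is the same problem.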
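
For readers hitting the same error, the fix reported above (giving the client the job's ZooKeeper HA settings through -D options) might look roughly like the sketch below. The job ID, savepoint directory, ZooKeeper quorum, HA storage directory and cluster ID are all placeholders, not values from this thread, and whether the savepoint action accepts -D in this position depends on the Flink version and deployment mode, so treat it as an assumption to verify against your own setup. The script only prints the assembled command so it can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch: assemble a `flink savepoint` command that carries the
# job's ZooKeeper HA settings, then print it for review.
# Every value below is a placeholder (assumption), not taken from the thread.
JOB_ID="<job-id>"
SAVEPOINT_DIR="hdfs:///flink/savepoints"
ZK_QUORUM="zk1:2181,zk2:2181,zk3:2181"
HA_STORAGE_DIR="hdfs:///flink/ha"
HA_CLUSTER_ID="<cluster-id>"

# The -D keys are Flink's high-availability options; the client should use
# the same values as the running job so it can locate the active JobManager.
CMD="./bin/flink savepoint \
-Dhigh-availability=zookeeper \
-Dhigh-availability.zookeeper.quorum=$ZK_QUORUM \
-Dhigh-availability.storageDir=$HA_STORAGE_DIR \
-Dhigh-availability.cluster-id=$HA_CLUSTER_ID \
$JOB_ID $SAVEPOINT_DIR"

# Print only; run the printed command by hand once the values are filled in.
echo "$CMD"
```

Putting the same keys into the client's flink-conf.yaml should have the same effect as passing them on the command line.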



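Re: Flink: triggering a savepoint fails

Posted by 龙逸尘 <ly...@gmail.com>.
Hi,
I have run into this error with a job ID of 00000 before. In our case the job had ZooKeeper-based HA enabled, but the savepoint command was run with a Flink client that had no HA configured.
You might want to check whether this is the same problem.


Michael Ran <gr...@163.com> wrote on Fri, Jul 23, 2021 at 10:43 AM:

> Could it be a file problem, such as missing write permissions?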

Re: Flink: triggering a savepoint fails

Posted by Michael Ran <gr...@163.com>.
Could it be a file problem, such as missing write permissions?