Posted to user-zh@flink.apache.org by Robin Zhang <vi...@outlook.com> on 2020/09/25 10:00:42 UTC

Re: flink 1.9.2 upgrade to 1.10.0: failed job cannot restore from checkpoint

Hi Tang,

Apologies, my earlier understanding was wrong. Thanks for the correction.

Best,
Robin Zhang
____________________________________________________________________
Yun Tang wrote
> Hi Robin,
>
> Your statement is not quite accurate. The community only explicitly
> guarantees savepoint compatibility [1], but that does not mean a job
> cannot be restored from a checkpoint across major versions. The community
> makes no promise there mainly because maintaining such a guarantee would
> take too much effort; at the code level, as long as state schema
> evolution [2] is used properly, cross-version checkpoint restores are
> basically compatible today.
>
> Also, @Peihui, please describe your exception more clearly. As I noted in
> my first reply, that exception is probably not the root cause. Please
> look in the logs for the actual root cause of the failed restore; if you
> are not sure how to find it, share the relevant logs here.
>
> [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html#compatibility-table
> [2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/schema_evolution.html
>
> Best,
> Yun Tang
>
> ________________________________
> From: Robin Zhang <vincent2015qdlg@>
> Sent: Wednesday, July 15, 2020 16:23
> To: user-zh@.apache <user-zh@.apache>
> Subject: Re: flink 1.9.2 upgrade to 1.10.0: failed job cannot restore from checkpoint
>
> As far as I know, you cannot restore directly from a checkpoint across
> major versions; the only option is to discard the state and rerun the job.
>
> Best,
> Robin Zhang
>
> ________________________________
> From: Peihui He <[hidden email]>
> Sent: Tuesday, July 14, 2020 10:42
> To: [hidden email] <[hidden email]>
> Subject: flink 1.9.2 upgrade to 1.10.0: failed job cannot restore from checkpoint
>
> Hello,
>
> After upgrading to 1.10.0, when the program fails it tries to restore
> from a checkpoint, but the restore always fails with:
>
> Caused by: java.nio.file.NoSuchFileException:
> /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1589438582606_30760/flink-io-26af2be2-2b14-4eab-90d8-9ebb32ace6e3/job_6b6cacb02824b8521808381113f57eff_op_StreamGroupedReduce_54cc3719665e6629c9000e9308537a5e__1_1__uuid_afda2b8b-0b79-449e-88b5-c34c27c1a079/db/000009.sst
> ->
> /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1589438582606_30760/flink-io-26af2be2-2b14-4eab-90d8-9ebb32ace6e3/job_6b6cacb02824b8521808381113f57eff_op_StreamGroupedReduce_54cc3719665e6629c9000e9308537a5e__1_1__uuid_afda2b8b-0b79-449e-88b5-c34c27c1a079/8f609663-4fbb-483f-83c0-de04654310f7/000009.sst
>
> The configuration is the same as under 1.9.2:
>
> state.backend: rocksdb
> state.checkpoints.dir: hdfs:///flink/checkpoints/wc/
> state.savepoints.dir: hdfs:///flink/savepoints/wc/
> state.backend.incremental: true
>
> The code also has:
>
> env.enableCheckpointing(10000);
> env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
> env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3,
>     org.apache.flink.api.common.time.Time.of(10, TimeUnit.SECONDS)));
>
> Is there anything special that needs to be configured for 1.10.0?
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
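For readers following along: the checkpoint settings quoted above correspond roughly to the following job setup. This is a minimal sketch against the Flink 1.10 DataStream API, not the poster's actual job; the class name, job name, and the placeholder pipeline comment are illustrative assumptions.

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; the thread does not name the job class.
public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 10 seconds, as in the quoted snippet.
        env.enableCheckpointing(10_000);

        // Keep externalized checkpoints when the job is cancelled, so a
        // later run can restore from them (e.g. flink run -s <checkpoint-path>).
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Restart up to 3 times with a 10-second delay between attempts.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
                3, Time.of(10, TimeUnit.SECONDS)));

        // ... sources, operators, and sinks would go here ...

        env.execute("checkpointed-job");
    }
}
```

The `state.backend: rocksdb` and `state.backend.incremental: true` lines from the quoted config would typically live in `flink-conf.yaml` rather than in code, which is consistent with how the poster describes their setup.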
