Posted to user-zh@flink.apache.org by Storm☀️ <st...@163.com> on 2020/09/27 02:16:50 UTC
Flink 1.10.1 checkpoint failure problem
Hi all, I have a checkpoint-related question.
Flink version 1.10.1: a few individual checkpoints fail with the following error:
java.lang.Exception: Could not perform checkpoint 1194 for operator Map (3/3).
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:816)
    at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:86)
    at org.apache.flink.streaming.runtime.io.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:99)
    at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:310)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:485)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:469)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1382)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:974)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:870)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:843)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:803)
    ... 12 more
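The vast majority of checkpoints complete normally, but a small fraction fail as shown above, and each such failure also causes the job to go through suspending --> restart.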
--
Sent from: http://apache-flink.147419.n8.nabble.com/
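Re: Flink 1.10.1 checkpoint failure problem
Posted by Storm☀️ <st...@163.com>.
Thanks.
I looked at that issue; the affected version there is JDK 1.8_060, whereas we are running update 074.
I will try upgrading the JDK to update 251 in our test environment.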
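
Since the failure is suspected to depend on the JDK build, it may help to confirm which JDK the JVM in question actually runs on before and after an upgrade. The following is only a minimal sketch under stated assumptions: the class `JdkUpdateCheck` and its method are hypothetical helpers, not part of Flink, and the parsing assumes Java 8 style version strings such as `1.8.0_74`.

```java
// Hypothetical helper (not part of Flink): reports the JDK of the current JVM.
final class JdkUpdateCheck {

    /**
     * Extracts the update number from a Java 8 style version string such as
     * "1.8.0_74". Returns -1 when there is no "_<digits>" part, which is the
     * case for newer schemes such as "11.0.8".
     */
    static int parseUpdate(String javaVersion) {
        int idx = javaVersion.indexOf('_');
        if (idx < 0) {
            return -1;
        }
        // Take only the digits right after '_' so "1.8.0_74-b02" still parses.
        int end = idx + 1;
        while (end < javaVersion.length() && Character.isDigit(javaVersion.charAt(end))) {
            end++;
        }
        if (end == idx + 1) {
            return -1;
        }
        return Integer.parseInt(javaVersion.substring(idx + 1, end));
    }

    public static void main(String[] args) {
        // Standard JVM system properties; values depend on the machine.
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("update       = " + parseUpdate(version));
    }
}
```

Running the same check inside the process that executes the job, rather than on the client machine, is what matters here, because the two may use different JDK installations.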
Re: Re:Re: Flink 1.10.1 checkpoint failure problem
Posted by Congxian Qiu <qc...@gmail.com>.
FYI, sharing an article that may be related [1].
[1] https://cloud.tencent.com/developer/news/564780
Best,
Congxian
Storm☀️ <st...@163.com> wrote on Thu, Oct 15, 2020 at 10:42 AM:
> Thank you very much.
> I will keep following this issue and report back here once I have a conclusion, for everyone's reference.
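Re: Re:Re: Flink 1.10.1 checkpoint failure problem
Posted by Storm☀️ <st...@163.com>.
Thank you very much.
I will keep following this issue and report back here once I have a conclusion, for everyone's reference.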
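Re:Re: Flink 1.10.1 checkpoint failure problem
Posted by hailongwang <18...@163.com>.
This problem has indeed also occurred in our production environment on version 1.10, and several issues discuss it, for example:
https://issues.apache.org/jira/browse/FLINK-18196
Two approaches are mentioned there. Our experience with them:
1. Change the JDK version. We have not tried this, because it requires updating the NodeManager's JDK, which is costly.
2. Create a new CheckpointMetaData instance. After making this change the problem has not recurred in our production environment, but the root cause is still unclear to us.
Hope this helps.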
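
The thread does not show the actual patch behind the second approach, so the following is only a sketch of the general idea, a defensive copy made before a lambda captures the object, using stand-in types. `MetaDataStandIn` and `DefensiveCopySketch` are hypothetical names for illustration; this is not Flink code, and where exactly such a copy would go in `StreamTask` is not stated here. The only link to the report is that the stack trace passes through a lambda inside `performCheckpoint`.

```java
import java.util.function.Supplier;

// Stand-in for a checkpoint metadata holder (hypothetical, illustration only).
final class MetaDataStandIn {
    private final long checkpointId;
    private final long timestamp;

    MetaDataStandIn(long checkpointId, long timestamp) {
        this.checkpointId = checkpointId;
        this.timestamp = timestamp;
    }

    long getCheckpointId() {
        return checkpointId;
    }

    long getTimestamp() {
        return timestamp;
    }
}

final class DefensiveCopySketch {

    /** Builds the deferred action around a fresh copy, not the caller's reference. */
    static Supplier<String> checkpointAction(MetaDataStandIn received) {
        // Copy field by field into a new object before the lambda captures it.
        final MetaDataStandIn copy =
                new MetaDataStandIn(received.getCheckpointId(), received.getTimestamp());
        return () -> "checkpoint " + copy.getCheckpointId() + " @ " + copy.getTimestamp();
    }

    public static void main(String[] args) {
        MetaDataStandIn received = new MetaDataStandIn(1194L, System.currentTimeMillis());
        System.out.println(checkpointAction(received).get());
    }
}
```

Whether a copy like this addresses the underlying cause or merely avoids triggering it is not something the sketch can show; it only illustrates the shape of the change described above.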
Best,
Hailong Wang
On 2020-10-13 18:04:11, "Storm☀️" <st...@163.com> wrote:
> Flink version: Flink 1.10.1
> Deployment: Flink on YARN
> Hadoop version: cdh5.15.2-2.6.0
> Current status: Checkpoint Counts Triggered: 9339, In Progress: 0, Completed: 8439, Failed: 900, Restored: 7
> Error message:
> java.lang.Exception: Could not perform checkpoint 1194 for operator Map (3/3).
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:816)
>     at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:86)
>     at org.apache.flink.streaming.runtime.io.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:99)
>     at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
>     at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133)
>     at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:310)
>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:485)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:469)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1382)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:974)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:870)
>     at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:843)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:803)
>     ... 12 more
>
> With the same program on version 11.2, checkpoints are completely normal.
Re: Flink 1.10.1 checkpoint failure problem
Posted by Storm☀️ <st...@163.com>.
Flink version: Flink 1.10.1
Deployment: Flink on YARN
Hadoop version: cdh5.15.2-2.6.0
Current status: Checkpoint Counts Triggered: 9339, In Progress: 0, Completed: 8439, Failed: 900, Restored: 7
Error message:
java.lang.Exception: Could not perform checkpoint 1194 for operator Map (3/3).
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:816)
    at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:86)
    at org.apache.flink.streaming.runtime.io.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:99)
    at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:310)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:485)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:469)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1382)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:974)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:870)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:843)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:803)
    ... 12 more
With the same program on version 11.2, checkpoints are completely normal.
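
For a sense of scale, the checkpoint counters reported in this message can be turned into a failure rate, which comes out at roughly 9.6%. This is a minimal sketch in plain Java; the class and method names are made up for illustration, and the three numbers are the ones quoted in the message.

```java
import java.util.Locale;

// Hypothetical helper for illustration; the counters come from the message.
final class CheckpointFailureRate {

    /** Returns the share of failed checkpoints among triggered ones, in percent. */
    static double failureRatePercent(long failed, long triggered) {
        if (triggered <= 0) {
            throw new IllegalArgumentException("triggered must be positive");
        }
        return 100.0 * failed / triggered;
    }

    public static void main(String[] args) {
        long triggered = 9339;
        long inProgress = 0;
        long completed = 8439;
        long failed = 900;

        // Sanity check: every triggered checkpoint is accounted for.
        System.out.println("consistent = " + (triggered == inProgress + completed + failed));
        System.out.println(String.format(Locale.ROOT, "failure rate = %.2f%%",
                failureRatePercent(failed, triggered)));
    }
}
```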
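Re: Flink 1.10.1 checkpoint failure problem
Posted by Congxian Qiu <qc...@gmail.com>.
Hi @Storm, which Flink version are you using, and what does the stack trace look like? Please reply here with the relevant information so we can look into the problem together.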
Best,
Congxian
大森林 <ap...@foxmail.com> wrote on Sat, Oct 10, 2020 at 1:05 PM:
> I am on an older JDK 8 build, so this has nothing to do with JDK 261.
>
> ------------------ Original message ------------------
> From: "user-zh" <storm_h_2020@163.com>;
> Sent: Saturday, October 10, 2020, 9:03 AM
> To: "user-zh" <user-zh@flink.apache.org>;
> Subject: Re: Flink 1.10.1 checkpoint failure problem
>
> I tried upgrading the JDK to 261, but the error still occurs.
Re: Flink 1.10.1 checkpoint failure problem
Posted by 大森林 <ap...@foxmail.com>.
I am on an older JDK 8 build, so this has nothing to do with JDK 261.
------------------ Original message ------------------
From: "user-zh" <storm_h_2020@163.com>;
Sent: Saturday, October 10, 2020, 9:03 AM
To: "user-zh" <user-zh@flink.apache.org>;
Subject: Re: Flink 1.10.1 checkpoint failure problem
I tried upgrading the JDK to 261, but the error still occurs.
Re: Flink 1.10.1 checkpoint failure problem
Posted by Storm☀️ <st...@163.com>.
I tried upgrading the JDK to 261, but the error still occurs.
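Re: Flink 1.10.1 checkpoint failure problem
Posted by Congxian Qiu <qc...@gmail.com>.
Hi,
This should be the same problem as FLINK-17479, which is hit only on specific JDK versions. Consider upgrading your Flink version or switching to a different JDK version.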
Best,
Congxian
Storm☀️ <st...@163.com> wrote on Sun, Sep 27, 2020 at 10:17 AM:
> Hi all, I have a checkpoint-related question.
>
> Flink version 1.10.1: a few individual checkpoints fail with the following error:
> java.lang.Exception: Could not perform checkpoint 1194 for operator Map (3/3).
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:816)
>     at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:86)
>     at org.apache.flink.streaming.runtime.io.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:99)
>     at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
>     at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133)
>     at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:310)
>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:485)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:469)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1382)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:974)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:870)
>     at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:843)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:803)
>     ... 12 more
>
> The vast majority of checkpoints complete normally, but a small fraction fail as shown above, and each such failure also causes the job to go through suspending --> restart.