Posted to user-zh@flink.apache.org by Congxian Qiu <qc...@gmail.com> on 2020/11/04 05:28:54 UTC
Re: Re:Re: flink-1.10 checkpoint occasionally throws NullPointerException
Hi
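This looks like a case where, on certain JDK versions, objects are reclaimed prematurely under some coding patterns; my guess is that it is related to GC. I came across a post [1] earlier that may be relevant.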
[1] https://cloud.tencent.com/developer/news/564780
Best,
Congxian
蒋佳成(Jiacheng Jiang) <92...@qq.com> wrote on Wed, Nov 4, 2020 at 11:33 AM:
> Hi, I have run into this problem too. What is its root cause?
>
>
>
> ------------------ Original Message ------------------
> From: "chenkaibit"<chenkaibit@163.com>;
> Sent: Saturday, May 9, 2020, 12:09 PM
> To: "user-zh"<user-zh@flink.apache.org>;
> Subject: Re:Re:Re: flink-1.10 checkpoint occasionally throws NullPointerException
>
>
>
> Hi,
> After adding some logging, I found that checkpointMetaData had become null:
> https://github.com/apache/flink/blob/release-1.10.0/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1421
> The test program reads from Kafka, runs a word count, and writes the results to Kafka. The checkpoint configuration is:
> | Checkpointing Mode | Exactly Once |
> | Interval | 5s |
> | Timeout | 10m 0s |
> | Minimum Pause Between Checkpoints | 0ms |
> | Maximum Concurrent Checkpoints | 1 |
>
>
> The NPE is thrown consistently at checkpoint 5377.
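
For anyone trying to reproduce this, a rough estimate of the run time needed to reach checkpoint 5377 may help. The sketch below assumes checkpoints are triggered exactly once per configured interval (5 s, as listed above) and that none are skipped or delayed; the class and method names are illustrative and not part of Flink.

```java
import java.time.Duration;

public class CheckpointTiming {

    // Estimated elapsed time until the n-th checkpoint is triggered,
    // assuming one trigger per interval with no skips or delays.
    static Duration timeUntilCheckpoint(long n, Duration interval) {
        return interval.multipliedBy(n);
    }

    public static void main(String[] args) {
        Duration estimate = timeUntilCheckpoint(5377, Duration.ofSeconds(5));
        // 5377 * 5 s = 26885 s
        System.out.println(estimate); // prints PT7H28M5S
    }
}
```

Under these assumptions a reproduction run takes roughly seven and a half hours before the failure can be expected.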
>
>
> The cause is still unclear, but after modifying part of the code (see
> https://github.com/yuchuanchen/flink/commit/e5122d9787be1fee9bce141887e0d70c9b0a4f19
> ) the NPE no longer occurs.
>
>
> On 2020-04-21 10:21:56, "chenkaibit" <chenkaibit@163.com> wrote:
> >
> >
> >
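> >This does not reproduce reliably, but it has shown up in several jobs I tested on 1.10 recently, with no other errors when it is triggered. I have added some logging and will keep watching.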
> >
> >
> >
> >
> >On 2020-04-21 01:12:48, "Yun Tang" <myasuka@live.com> wrote:
> >>Hi
> >>
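> >>This NPE is a bit odd. From the executeCheckpointing method [1] alone, it is hard to tell which variable, or which value reached through a variable, is null.
> >>
> >>One way to investigate is to enable DEBUG-level logging for org.apache.flink.streaming.runtime.tasks and use the debug output to narrow down which variable is null.
> >>
> >>When this exception occurs, is there anything unusual in the logs of the affected task? What conditions trigger the NPE, and does it reproduce reliably?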
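
Besides DEBUG logging, another way to narrow down which reference is null is to add named null checks ahead of the failing statement. A minimal sketch follows, assuming a JDK 8 runtime, where a NullPointerException carries no message of its own (JDK 14 and later can describe the null reference). The MetaData and Owner classes and both describe methods are hypothetical stand-ins, not Flink code.

```java
import java.util.Objects;

public class NullCheckSketch {

    // Hypothetical stand-ins for objects dereferenced on the failing line.
    static final class MetaData {
        private final long checkpointId;
        MetaData(long checkpointId) { this.checkpointId = checkpointId; }
        long getCheckpointId() { return checkpointId; }
    }

    static final class Owner {
        private final String name;
        Owner(String name) { this.name = name; }
        String getName() { return name; }
    }

    // One statement with two dereferences: an NPE thrown here does not say
    // which of the two references was null.
    static String describeUnchecked(MetaData metaData, Owner owner) {
        return "checkpoint " + metaData.getCheckpointId() + " for " + owner.getName();
    }

    // Same logic, but each reference is checked first with a message naming it.
    static String describeChecked(MetaData metaData, Owner owner) {
        Objects.requireNonNull(metaData, "metaData is null");
        Objects.requireNonNull(owner, "owner is null");
        return "checkpoint " + metaData.getCheckpointId() + " for " + owner.getName();
    }

    public static void main(String[] args) {
        try {
            describeChecked(null, new Owner("task"));
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // prints "metaData is null"
        }
    }
}
```

The checks cost almost nothing at runtime and can be removed again once the null reference has been identified.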
> >>
> >>[1] https://github.com/apache/flink/blob/aa4eb8f0c9ce74e6b92c3d9be5dc8e8cb536239d/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1349
> >>
> >>Best regards,
> >>Yun Tang
> >>
> >>________________________________
> >>From: chenkaibit <chenkaibit@163.com>
> >>Sent: Monday, April 20, 2020 18:39
> >>To: user-zh@flink.apache.org <user-zh@flink.apache.org>
> >>Subject: flink-1.10 checkpoint occasionally throws NullPointerException
> >>
>
> >>Has anyone run into this error? CheckpointingOperation.executeCheckpointing throws a NullPointerException:
>
> >>java.lang.Exception: Could not perform checkpoint 5505 for operator Source: KafkaTableSource(xxx) -> SourceConversion(table=[xxx, source: [KafkaTableSource(xxx)]], fields=[xxx]) -> Calc(select=[xxx) AS xxx]) -> SinkConversionToTuple2 -> Sink: Elasticsearch6UpsertTableSink(xxx) (1/1).
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:802)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$triggerCheckpointAsync$3(StreamTask.java:777)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$228/1024478318.call(Unknown Source)
> >>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:87)
> >>    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
> >>    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:261)
> >>    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:487)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
> >>    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
> >>    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
> >>    at java.lang.Thread.run(Thread.java:745)
> >>Caused by: java.lang.NullPointerException
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1411)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:991)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:887)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$229/1010499540.run(Unknown Source)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:860)
> >>    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:793)
> >>    ... 12 more