Posted to user-zh@flink.apache.org by chenkaibit <ch...@163.com> on 2020/04/20 10:39:09 UTC

flink-1.10 checkpoint occasionally throws NullPointerException

Has anyone run into this error? A NullPointerException is thrown in CheckpointingOperation.executeCheckpointing:

java.lang.Exception: Could not perform checkpoint 5505 for operator Source: KafkaTableSource(xxx) -> SourceConversion(table=[xxx, source: [KafkaTableSource(xxx)]], fields=[xxx]) -> Calc(select=[xxx) AS xxx]) -> SinkConversionToTuple2 -> Sink: Elasticsearch6UpsertTableSink(xxx) (1/1).
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:802)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$triggerCheckpointAsync$3(StreamTask.java:777)
    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$228/1024478318.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:87)
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:261)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:487)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1411)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:991)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:887)
    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$229/1010499540.run(Unknown Source)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:860)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:793)
    ... 12 more

Re: flink-1.10 checkpoint occasionally throws NullPointerException

Posted by faaron zheng <fa...@gmail.com>.
That change probably doesn't help much, does it? If checkpointMetaData is null, it will still fail.

On 2020-05-09 12:09, chenkaibit wrote:
> Hi:
> After adding some logging, I found that checkpointMetaData is null: https://github.com/apache/flink/blob/release-1.10.0/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1421
> The test program reads from Kafka, runs a word count, and writes the results to Kafka. The checkpoint configuration is as follows:
> | Checkpointing Mode | Exactly Once |
> | Interval | 5s |
> | Timeout | 10m 0s |
> | Minimum Pause Between Checkpoints | 0ms |
> | Maximum Concurrent Checkpoints | 1 |
>
> The NPE is thrown consistently at checkpoint 5377.
>
> Although the cause is still unclear, the NPE no longer occurs after I modified some of the code (see https://github.com/yuchuanchen/flink/commit/e5122d9787be1fee9bce141887e0d70c9b0a4f19).

Re:Re:Re: flink-1.10 checkpoint occasionally throws NullPointerException

Posted by chenkaibit <ch...@163.com>.
Hi:
After adding some logging, I found that checkpointMetaData is null: https://github.com/apache/flink/blob/release-1.10.0/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1421
The test program reads from Kafka, runs a word count, and writes the results to Kafka. The checkpoint configuration is as follows:
| Checkpointing Mode | Exactly Once |
| Interval | 5s |
| Timeout | 10m 0s |
| Minimum Pause Between Checkpoints | 0ms |
| Maximum Concurrent Checkpoints | 1 |

The NPE is thrown consistently at checkpoint 5377.

Although the cause is still unclear, the NPE no longer occurs after I modified some of the code (see https://github.com/yuchuanchen/flink/commit/e5122d9787be1fee9bce141887e0d70c9b0a4f19).
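
The narrowing-down step described above (finding which reference is null in a statement that dereferences several objects) can be sketched in plain Java. This is only an illustration under stated assumptions, not the logging that was actually added and not Flink code: NullProbe, Owner, MetaData and describe are hypothetical stand-ins.

```java
import java.util.Objects;

public class NullProbe {
    // Hypothetical stand-ins for objects dereferenced in a single statement;
    // these are NOT Flink's classes.
    static final class MetaData {
        private final long checkpointId;
        MetaData(long checkpointId) { this.checkpointId = checkpointId; }
        long getCheckpointId() { return checkpointId; }
    }

    static final class Owner {
        private final String name;
        Owner(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Check each reference on its own so the exception message names the null one,
    // instead of a bare NullPointerException pointing at a multi-dereference line.
    static String describe(Owner owner, MetaData metaData) {
        Objects.requireNonNull(owner, "owner is null");
        Objects.requireNonNull(metaData, "metaData is null");
        return owner.getName() + " finished checkpoint " + metaData.getCheckpointId();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Owner("task-1"), new MetaData(5377L)));
        try {
            describe(new Owner("task-1"), null);
        } catch (NullPointerException e) {
            // The message now identifies the offending reference.
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

A check like this only reports which reference is null; it does not explain why it became null, so the root cause still has to be traced separately.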


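On 2020-04-21 10:21:56, "chenkaibit" <ch...@163.com> wrote:
>
>This does not reproduce reliably, but it has shown up in several jobs I tested recently on 1.10, and there were no other errors when it was triggered. I have added some logging and will keep watching.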

Re:Re: flink-1.10 checkpoint occasionally throws NullPointerException

Posted by chenkaibit <ch...@163.com>.


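This does not reproduce reliably, but it has shown up in several jobs I tested recently on 1.10, and there were no other errors when it was triggered. I have added some logging and will keep watching.

On 2020-04-21 01:12:48, "Yun Tang" <my...@live.com> wrote:
>Hi
>
>This NPE is a bit strange: from the executeCheckpointing method [1] alone, it is hard to tell which variable, or which value obtained from a variable, is null.
>One way to investigate is to enable DEBUG-level logging for org.apache.flink.streaming.runtime.tasks and use the debug logs to narrow down which variable is null.
>
>When this exception occurs, is there anything unusual in the logs of the affected task? What conditions trigger the NPE, and does it reproduce reliably?
>
>[1] https://github.com/apache/flink/blob/aa4eb8f0c9ce74e6b92c3d9be5dc8e8cb536239d/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1349
>
>Best regards,
>Yun Tang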

Re: flink-1.10 checkpoint occasionally throws NullPointerException

Posted by Yun Tang <my...@live.com>.
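Hi

This NPE is a bit strange: from the executeCheckpointing method [1] alone, it is hard to tell which variable, or which value obtained from a variable, is null.
One way to investigate is to enable DEBUG-level logging for org.apache.flink.streaming.runtime.tasks and use the debug logs to narrow down which variable is null.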
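
For reference, a minimal sketch of that logging change, assuming the job uses the log4j 1.x configuration that Flink 1.10 ships by default (conf/log4j.properties). A deployment with a different logging backend would need the equivalent setting in its own format.

```properties
# Assumption: log4j 1.x syntax, added to conf/log4j.properties on the TaskManagers.
# Raises only the stream task package to DEBUG; the root logger level is unchanged.
log4j.logger.org.apache.flink.streaming.runtime.tasks=DEBUG
```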

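When this exception occurs, is there anything unusual in the logs of the affected task? What conditions trigger the NPE, and does it reproduce reliably?

[1] https://github.com/apache/flink/blob/aa4eb8f0c9ce74e6b92c3d9be5dc8e8cb536239d/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1349

Best regards,
Yun Tang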
