You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user-zh@flink.apache.org by yidan zhao <hi...@gmail.com> on 2021/11/26 03:18:24 UTC
FlinkSQL kafka2hive每次检查点导致任务失败
如题,注意,非检查点本身失败,而是检查点完成后导致任务失败。
目前跟进报错是PartitionTimeCommitTrigger.committablePartitions部分如下代码报的异常:
if (!watermarks.containsKey(checkpointId)) {
throw new IllegalArgumentException(
String.format(
"Checkpoint(%d) has not been snapshot. The
watermark information is: %s.",
checkpointId, watermarks));
}
请问什么情况会导致这个问题呢。 我任务情况是,中途出现过2,3个检查点失败,然后后续就这样了。
报错日志如下:
java.lang.IllegalArgumentException: Checkpoint(77) has not been snapshot.
The watermark information is: {78=1637893142284, 80=1637894822410, 81=
1637895338276}.
at org.apache.flink.table.filesystem.stream.PartitionTimeCommitTrigger
.committablePartitions(PartitionTimeCommitTrigger.java:122)
at org.apache.flink.table.filesystem.stream.PartitionCommitter
.commitPartitions(PartitionCommitter.java:151)
at org.apache.flink.table.filesystem.stream.PartitionCommitter
.processElement(PartitionCommitter.java:143)
at org.apache.flink.streaming.runtime.tasks.
OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask
.java:205)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput
.processElement(AbstractStreamTaskNetworkInput.java:134)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput
.emitNext(AbstractStreamTaskNetworkInput.java:105)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor
.processInput(StreamOneInputProcessor.java:66)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(
StreamTask.java:423)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor
.runMailboxLoop(MailboxProcessor.java:204)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(
StreamTask.java:681)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(
StreamTask.java:636)
at org.apache.flink.streaming.runtime.tasks.StreamTask
.runWithCleanUpOnFail(StreamTask.java:647)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask
.java:620)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
at java.lang.Thread.run(Thread.java:748)
Re: FlinkSQL kafka2hive每次检查点导致任务失败
Posted by yidan zhao <hi...@gmail.com>.
hi,有人清楚如上问题吗,确认下是不是bug,我感觉是某种情况下会导致的问题,这个情况大概率是flink应该兼容考虑的。
yidan zhao <hi...@gmail.com> 于2021年11月26日周五 下午2:19写道:
> 我认为这个应该是bug。
>
> yidan zhao <hi...@gmail.com> 于2021年11月26日周五 上午11:18写道:
>
>> 如题,注意,非检查点本身失败,而是检查点完成后导致任务失败。
>>
>> 目前跟进报错是PartitionTimeCommitTrigger.committablePartitions部分如下代码报的异常:
>>
>> if (!watermarks.containsKey(checkpointId)) {
>> throw new IllegalArgumentException(
>> String.format(
>> "Checkpoint(%d) has not been snapshot. The watermark information is: %s.",
>> checkpointId, watermarks));
>> }
>>
>> 请问什么情况会导致这个问题呢。 我任务情况是,中途出现过2,3个检查点失败,然后后续就这样了。
>>
>> 报错日志如下:
>> java.lang.IllegalArgumentException: Checkpoint(77) has not been
>> snapshot. The watermark information is: {78=1637893142284, 80=
>> 1637894822410, 81=1637895338276}.
>> at org.apache.flink.table.filesystem.stream.
>> PartitionTimeCommitTrigger.committablePartitions(
>> PartitionTimeCommitTrigger.java:122)
>> at org.apache.flink.table.filesystem.stream.PartitionCommitter
>> .commitPartitions(PartitionCommitter.java:151)
>> at org.apache.flink.table.filesystem.stream.PartitionCommitter
>> .processElement(PartitionCommitter.java:143)
>> at org.apache.flink.streaming.runtime.tasks.
>> OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask
>> .java:205)
>> at org.apache.flink.streaming.runtime.io.
>> AbstractStreamTaskNetworkInput.processElement(
>> AbstractStreamTaskNetworkInput.java:134)
>> at org.apache.flink.streaming.runtime.io.
>> AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput
>> .java:105)
>> at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor
>> .processInput(StreamOneInputProcessor.java:66)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(
>> StreamTask.java:423)
>> at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor
>> .runMailboxLoop(MailboxProcessor.java:204)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask
>> .runMailboxLoop(StreamTask.java:681)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(
>> StreamTask.java:636)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask
>> .runWithCleanUpOnFail(StreamTask.java:647)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
>> StreamTask.java:620)
>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
>> at java.lang.Thread.run(Thread.java:748)
>>
>>
>>
Re: FlinkSQL kafka2hive每次检查点导致任务失败
Posted by yidan zhao <hi...@gmail.com>.
我认为这个应该是bug。
yidan zhao <hi...@gmail.com> 于2021年11月26日周五 上午11:18写道:
> 如题,注意,非检查点本身失败,而是检查点完成后导致任务失败。
>
> 目前跟进报错是PartitionTimeCommitTrigger.committablePartitions部分如下代码报的异常:
>
> if (!watermarks.containsKey(checkpointId)) {
> throw new IllegalArgumentException(
> String.format(
> "Checkpoint(%d) has not been snapshot. The watermark information is: %s.",
> checkpointId, watermarks));
> }
>
> 请问什么情况会导致这个问题呢。 我任务情况是,中途出现过2,3个检查点失败,然后后续就这样了。
>
> 报错日志如下:
> java.lang.IllegalArgumentException: Checkpoint(77) has not been snapshot.
> The watermark information is: {78=1637893142284, 80=1637894822410, 81=
> 1637895338276}.
> at org.apache.flink.table.filesystem.stream.PartitionTimeCommitTrigger
> .committablePartitions(PartitionTimeCommitTrigger.java:122)
> at org.apache.flink.table.filesystem.stream.PartitionCommitter
> .commitPartitions(PartitionCommitter.java:151)
> at org.apache.flink.table.filesystem.stream.PartitionCommitter
> .processElement(PartitionCommitter.java:143)
> at org.apache.flink.streaming.runtime.tasks.
> OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask
> .java:205)
> at org.apache.flink.streaming.runtime.io.
> AbstractStreamTaskNetworkInput.processElement(
> AbstractStreamTaskNetworkInput.java:134)
> at org.apache.flink.streaming.runtime.io.
> AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput
> .java:105)
> at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor
> .processInput(StreamOneInputProcessor.java:66)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(
> StreamTask.java:423)
> at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor
> .runMailboxLoop(MailboxProcessor.java:204)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(
> StreamTask.java:681)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(
> StreamTask.java:636)
> at org.apache.flink.streaming.runtime.tasks.StreamTask
> .runWithCleanUpOnFail(StreamTask.java:647)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
> StreamTask.java:620)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> at java.lang.Thread.run(Thread.java:748)
>
>
>