Posted to user-zh@flink.apache.org by yidan zhao <hi...@gmail.com> on 2021/11/26 03:18:24 UTC

FlinkSQL kafka2hive: every checkpoint causes the job to fail

As the subject says. Note: it is not the checkpoint itself that fails; the job fails after the checkpoint completes.

Tracing the error, the exception is thrown by the following code in PartitionTimeCommitTrigger.committablePartitions:

if (!watermarks.containsKey(checkpointId)) {
    throw new IllegalArgumentException(
            String.format(
                    "Checkpoint(%d) has not been snapshot. The watermark information is: %s.",
                    checkpointId, watermarks));
}

What circumstances can cause this problem? In my job, 2 or 3 checkpoints failed along the way, and it has behaved like this ever since.

The error log is as follows:
java.lang.IllegalArgumentException: Checkpoint(77) has not been snapshot. The watermark information is: {78=1637893142284, 80=1637894822410, 81=1637895338276}.
    at org.apache.flink.table.filesystem.stream.PartitionTimeCommitTrigger.committablePartitions(PartitionTimeCommitTrigger.java:122)
    at org.apache.flink.table.filesystem.stream.PartitionCommitter.commitPartitions(PartitionCommitter.java:151)
    at org.apache.flink.table.filesystem.stream.PartitionCommitter.processElement(PartitionCommitter.java:143)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
    at java.lang.Thread.run(Thread.java:748)
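To illustrate how this check can fire, here is a minimal, self-contained sketch of the per-checkpoint watermark bookkeeping that the failing code suggests. This is NOT Flink's actual implementation; the class name WatermarkBookkeepingSketch and its methods are invented for illustration. It models how a committer that records a watermark per snapshotted checkpoint id, then later looks that id up on checkpoint completion, fails when an id was never recorded (for example, after a failed checkpoint left no entry):

```java
import java.util.TreeMap;

// Minimal sketch (not Flink's actual class) of per-checkpoint watermark
// bookkeeping. Names are illustrative assumptions, not Flink's API.
public class WatermarkBookkeepingSketch {

    // checkpointId -> watermark observed when that checkpoint was snapshotted
    private final TreeMap<Long, Long> watermarks = new TreeMap<>();

    // Called when a checkpoint is snapshotted: record the current watermark.
    void snapshotState(long checkpointId, long currentWatermark) {
        watermarks.put(checkpointId, currentWatermark);
    }

    // Called when a checkpoint-complete notification reaches the committer.
    // If the id was never recorded here, the lookup fails with the same
    // message seen in the report.
    long committableWatermark(long checkpointId) {
        if (!watermarks.containsKey(checkpointId)) {
            throw new IllegalArgumentException(
                    String.format(
                            "Checkpoint(%d) has not been snapshot. The watermark information is: %s.",
                            checkpointId, watermarks));
        }
        long watermark = watermarks.get(checkpointId);
        // Entries up to and including this checkpoint are no longer needed.
        watermarks.headMap(checkpointId, true).clear();
        return watermark;
    }

    public static void main(String[] args) {
        WatermarkBookkeepingSketch trigger = new WatermarkBookkeepingSketch();
        trigger.snapshotState(78, 1637893142284L);
        trigger.snapshotState(80, 1637894822410L);
        trigger.snapshotState(81, 1637895338276L);
        // No entry was ever recorded for checkpoint 77, so a completion
        // message referring to it reproduces the reported failure:
        try {
            trigger.committableWatermark(77);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, the reported state ({78=..., 80=..., 81=...} with a lookup for 77) matches a completion notification for a checkpoint whose entry was never stored, or was dropped, after earlier checkpoint failures.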

Re: FlinkSQL kafka2hive: every checkpoint causes the job to fail

Posted by yidan zhao <hi...@gmail.com>.
Hi, does anyone understand the problem above? Could someone confirm whether it is a bug? It looks to me like a problem triggered under certain conditions, and most likely one that Flink ought to handle.

On Fri, Nov 26, 2021 at 2:19 PM, yidan zhao <hi...@gmail.com> wrote:

> I believe this should be a bug.

Re: FlinkSQL kafka2hive: every checkpoint causes the job to fail

Posted by yidan zhao <hi...@gmail.com>.
I believe this should be a bug.

On Fri, Nov 26, 2021 at 11:18 AM, yidan zhao <hi...@gmail.com> wrote:
