Posted to user@flink.apache.org by Marvin777 <xy...@gmail.com> on 2018/07/26 02:12:11 UTC

checkpoint always fails

Hi, all:

The Flink job runs normally, but the checkpoint always fails, like this:
[image: image.png]

[image: image.png]
checkpoint configuration:

[image: image.png]

thanks.

Re: checkpoint always fails

Posted by vino yang <ya...@gmail.com>.
Hi Marvin,

Since you have configured exactly-once semantics, a task with multiple input
channels waits during a checkpoint until it has received the barrier on
every one of them.
The 'Buffered During Alignment' metric reflects how unevenly the upstream
tasks progress through the checkpoint: if some of them are slow, alignment
has to wait a long time.
There are many possible reasons why some tasks are slow, for example data
skew introduced by keyBy.
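
To make the alignment behaviour concrete, here is a toy model in plain Python. It is only a sketch under simplifying assumptions, not Flink's implementation: the names `align` and `BARRIER` are made up for illustration, each channel is a fixed list of items, and the task reads one item per channel in round-robin order. Records that arrive on a channel after its barrier are held back until the barrier has arrived on every channel, which is roughly what 'Buffered During Alignment' counts.

```python
from collections import deque

BARRIER = "BARRIER"  # hypothetical marker standing in for a checkpoint barrier


def align(channels):
    """Toy model of exactly-once barrier alignment for a single task.

    `channels` is a list of lists; each inner list is the sequence of items
    arriving on one input channel and must contain a BARRIER marker.
    Items are read round-robin, one per channel per round.

    Returns (processed, buffered):
      processed - records handled before the snapshot could be taken
      buffered  - records held back because their channel had already
                  delivered its barrier while other channels had not
    """
    queues = [deque(ch) for ch in channels]
    if not queues:
        return [], []
    blocked = [False] * len(queues)  # True once a channel delivered its barrier
    processed, buffered = [], []
    while True:
        for i, q in enumerate(queues):
            if not q:
                if not blocked[i]:
                    raise ValueError(f"channel {i} ended without a barrier")
                continue
            item = q.popleft()
            if item == BARRIER:
                blocked[i] = True
                if all(blocked):
                    # All barriers received: alignment is complete.
                    return processed, buffered
            elif blocked[i]:
                buffered.append(item)  # must wait for the slower channels
            else:
                processed.append(item)


if __name__ == "__main__":
    fast = ["a1", BARRIER, "a2", "a3", "a4"]
    slow = ["b1", "b2", "b3", BARRIER]  # barrier arrives late on this channel
    print(align([fast, slow]))
```

In this model, the later the barrier arrives on the slow channel, the more records from the fast channel end up in `buffered`. When both channels deliver their barriers in the same round, nothing is buffered.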

Thanks, vino.

2018-07-30 12:42 GMT+08:00 Marvin777 <xy...@gmail.com>:

> Hi vino,
>
> I found that the 'Buffered During Alignment' value is very large. What
> causes this in general?
> [image: image.png]
>
>
> On Mon, Jul 30, 2018 at 10:36 AM, Marvin777 <xy...@gmail.com> wrote:
>
>> Hi vino,
>>
>> the issue is FLINK-9945
>> <https://issues.apache.org/jira/browse/FLINK-9945>
>>
>> thanks.
>>
>>
>> On Fri, Jul 27, 2018 at 4:22 PM, vino yang <ya...@gmail.com> wrote:
>>
>>> Hi Marvin,
>>>
>>> This looks like a checkpoint bug that triggered your checkpoint timeout.
>>> Can you create an issue in JIRA, describe the details (such as the Flink
>>> version), and attach a complete log?
>>>
>>> Thanks, vino.
>>>
>>> 2018-07-26 19:37 GMT+08:00 Marvin777 <xy...@gmail.com>:
>>>
>>>> Hi vino,
>>>>
>>>> Can you give me a hint as to why the checkpoint expires?
>>>>
>>>> What causes this in general?
>>>>
>>>> [image: image.png]
>>>>
>>>>
>>>> thanks.
>>>>
>>>> On Thu, Jul 26, 2018 at 12:22 PM, Marvin777 <xy...@gmail.com> wrote:
>>>>
>>>>> Log:
>>>>> https://issues.apache.org/jira/browse/FLINK-9945 (the exception cannot
>>>>> be reproduced every time, but the checkpoint fails every time.)
>>>>>
>>>>> State backend:
>>>>> [image: image.png]
>>>>>
>>>>>
>>>>> HA configuration
>>>>> [image: image.png]
>>>>>
>>>>>
>>>>> On Thu, Jul 26, 2018 at 10:22 AM, vino yang <ya...@gmail.com> wrote:
>>>>>
>>>>>> Hi Marvin,
>>>>>>
>>>>>> Thanks for reporting this issue.
>>>>>>
>>>>>> Can you share more details about the failed checkpoint, such as the
>>>>>> log, the exception stack trace, the state backend in use, and the HA
>>>>>> configuration?
>>>>>>
>>>>>> This information can help trace the issue.
>>>>>>
>>>>>> Thanks, vino.
>>>>>>
>>>>>> 2018-07-26 10:12 GMT+08:00 Marvin777 <xy...@gmail.com>:
>>>>>>
>>>>>>> Hi, all:
>>>>>>>
>>>>>>> The Flink job runs normally, but the checkpoint always fails, like this:
>>>>>>> [image: image.png]
>>>>>>>
>>>>>>> [image: image.png]
>>>>>>> checkpoint configuration:
>>>>>>>
>>>>>>> [image: image.png]
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>>>
>>>>>>
>>>

Re: checkpoint always fails

Posted by Marvin777 <xy...@gmail.com>.
Hi vino,

The issue is FLINK-9945 <https://issues.apache.org/jira/browse/FLINK-9945>

thanks.



Re: checkpoint always fails

Posted by vino yang <ya...@gmail.com>.
Hi Marvin,

This looks like a checkpoint bug that triggered your checkpoint timeout. Can
you create an issue in JIRA, describe the details (such as the Flink version),
and attach a complete log?

Thanks, vino.


Re: checkpoint always fails

Posted by Marvin777 <xy...@gmail.com>.
Hi vino,

Can you give me a hint as to why the checkpoint expires?

What causes this in general?

[image: image.png]


thanks.


Re: checkpoint always fails

Posted by Marvin777 <xy...@gmail.com>.
Log:
https://issues.apache.org/jira/browse/FLINK-9945 (the exception cannot be
reproduced every time, but the checkpoint fails every time.)

State backend:
[image: image.png]


HA configuration
[image: image.png]



Re: checkpoint always fails

Posted by vino yang <ya...@gmail.com>.
Hi Marvin,

Thanks for reporting this issue.

Can you share more details about the failed checkpoint, such as the log,
the exception stack trace, the state backend in use, and the HA
configuration?

This information can help trace the issue.

Thanks, vino.
