Posted to user@flink.apache.org by 徐涛 <ha...@gmail.com> on 2018/10/11 11:22:06 UTC

Re: Small checkpoint data takes too much time

Hi Zhijiang,
	Thanks for your response.
	I added the checkpointAlignmentTime metric. The data shows that the checkpoint duration is about 150s, while checkpointAlignmentTime is only about 4s, so alignment does not explain the large gap between them.
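
	To put numbers on that gap, here is a minimal Python sketch. It is only arithmetic on the two figures reported above and does not call any Flink API. The function name split_checkpoint_duration is hypothetical, and the sketch assumes the alignment time and the checkpoint duration can be compared directly, which may not hold exactly because alignment is measured per task.

```python
def split_checkpoint_duration(duration_s, alignment_s):
    """Split a checkpoint duration into the part spent on barrier
    alignment and the remainder (snapshot and anything else).

    Hypothetical helper: both inputs are assumed to be in seconds and
    to describe the same checkpoint.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if alignment_s < 0 or alignment_s > duration_s:
        raise ValueError("alignment time must be between 0 and the duration")
    remainder_s = duration_s - alignment_s
    alignment_share = alignment_s / duration_s
    return remainder_s, alignment_share


# Figures from this thread: ~150s duration, ~4s alignment.
remainder_s, alignment_share = split_checkpoint_duration(150.0, 4.0)
print(f"not explained by alignment: {remainder_s:.0f}s")
print(f"alignment share of duration: {alignment_share:.1%}")
```

	With these inputs, roughly 146s of the 150s is left unexplained by alignment, so the time must be going into another phase of the checkpoint.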

Best
Henry

> On Oct 10, 2018, at 1:26 PM, Zhijiang(wangzhijiang999) <wa...@aliyun.com> wrote:
> 
> The checkpoint duration includes both barrier alignment and the state snapshot. Every task has to receive the barriers from all of its input channels before it triggers the state snapshot.
> I guess barrier alignment may be taking a long time in your case; it is especially critical under backpressure. You can check the "checkpointAlignmentTime" metric to confirm.
> 
> Best,
> Zhijiang
> ------------------------------------------------------------------
> From: 徐涛 <ha...@gmail.com>
> Sent: Wednesday, October 10, 2018, 13:13
> To: user <us...@flink.apache.org>
> Subject: Small checkpoint data takes too much time
> 
> Hi 
>  I recently encountered a problem in production. Checkpoints take too much time, although this doesn't affect job execution.
>  I am using FsStateBackend, writing the data to an HDFS checkpointDataUri, with asynchronousSnapshots enabled. I printed the metrics “lastCheckpointDuration” and “lastCheckpointSize”. They show that “lastCheckpointSize” is about 80KB, but “lastCheckpointDuration” is about 160s! Because the checkpoint data is small, I think it should not take that long. I do not know why this happens or which conditions may influence the checkpoint time. Has anyone encountered such a problem?
>  Thanks a lot.
> 
> Best
> Henry
>