Posted to user@flink.apache.org by Kien Truong <du...@gmail.com> on 2018/10/24 12:05:22 UTC

Re: Checkpoint acknowledge takes too long

Hi,

In my experience, this is most likely because one sub-task is blocked
in some long-running operation.

Try running the task manager under a profiler (such as VisualVM) and
check for hot spots.
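
If attaching a profiler is not convenient, a few thread dumps taken some
seconds apart may show the same thing: a thread that sits in the same
stack frame every time is a likely suspect. Below is a minimal sketch
using only the JDK's ThreadMXBean. The class name ThreadStackDump and the
thread name "slow-subtask-demo" are made up for illustration and are not
part of Flink, and the sketch assumes you can run the code inside the JVM
you want to inspect. From outside the process, jstack <pid> gives a
similar dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Flink: collects the stack of every live
// thread in this JVM whose name contains a given fragment.
class ThreadStackDump {

    // Returns name, state and stack frames of all matching threads.
    static String dump(String nameFragment) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip lock/monitor details, only stacks are needed here
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (!info.getThreadName().contains(nameFragment)) {
                continue;
            }
            out.append(info.getThreadName())
               .append(" [").append(info.getThreadState()).append("]\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a sub-task stuck in a long-running call.
        Thread slow = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // interrupted on purpose at the end of the demo
            }
        }, "slow-subtask-demo");
        slow.setDaemon(true);
        slow.start();
        Thread.sleep(200); // give the thread time to reach sleep()

        System.out.print(dump("slow-subtask"));
        slow.interrupt();
    }
}
```

In a real job you would pass a fragment of the operator's thread name
instead of the demo name, and compare several dumps rather than rely on
a single one.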


Regards,

Kien

On 10/24/2018 4:02 PM, 徐涛 wrote:
> Hi
> 	I am running a Flink application with parallelism 64. I left the checkpoint timeout at its default value of 10 minutes, the state size is less than 1 MB, and I am using the FsStateBackend.
> 	The application triggers some checkpoints, but all of them fail with "Checkpoint expired before completing". In the checkpoint history I found that 63 subtasks acknowledged but one is left as n/a, and the alignment duration is also quite long, about 5m27s.
> 	I want to know why one subtask does not acknowledge. And since the alignment duration is long, what influences the alignment duration?
> 	Thanks a lot.
>
> Best
> Henry