Posted to user@flink.apache.org by "Zhijiang(wangzhijiang999)" <wa...@aliyun.com> on 2018/04/08 03:08:31 UTC

Re: Checkpoints very slow with high backpressure

Backpressure does indeed delay the checkpoints, because in-flight network buffers gradually accumulate ahead of the barriers before alignment. As Piotr explained, 1.5 can improve this to some extent. After 1.5 we plan to speed up the checkpoint further by controlling the channel reader to improve barrier alignment; this has already been verified to greatly decrease the alignment time in backpressure scenarios.

zhijiang
------------------------------------------------------------------
From: Piotr Nowojski <pi...@data-artisans.com>
Sent: Friday, April 6, 2018 00:06
To: Edward <eg...@hotmail.com>
Cc: user <us...@flink.apache.org>
Subject: Re: Checkpoints very slow with high backpressure
Thanks for the explanation.

I hope that 1.5 will solve your issue (please let us know if it doesn’t!), or, if you can’t wait, that decreasing the network memory buffers can mitigate the problem.
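
A minimal sketch of what decreasing the memory buffers could look like in flink-conf.yaml (the keys below exist in 1.4/1.5, but the values are illustrative assumptions; how far you can shrink them depends on the job's parallelism and throughput):

    # Shrink the memory reserved for network buffers (defaults: 0.1 / 64 MB / 1 GB)
    taskmanager.network.memory.fraction: 0.05
    taskmanager.network.memory.min: 67108864
    taskmanager.network.memory.max: 268435456

    # In 1.5+ (credit-based flow control) per-channel buffering can also be tuned via:
    # taskmanager.network.memory.buffers-per-channel
    # taskmanager.network.memory.floating-buffers-per-gate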

Piotrek

> On 5 Apr 2018, at 08:13, Edward <eg...@hotmail.com> wrote:
> 
> Thanks for the update Piotr.
> 
> The reason it prevents us from using checkpoints is this:
> We are relying on checkpoints to trigger the commit of Kafka offsets for our
> sources (Kafka consumers).
> When there is no backpressure this works fine. When there is backpressure,
> checkpoints fail because they take too long, and our Kafka offsets are never
> committed to Kafka brokers (as we just learned the hard way).
> 
> Normally there is no backpressure in our jobs, but when there is some
> outage, the jobs do experience backpressure while catching up. And when
> you're already trying to recover from an incident, that is not the ideal
> time for Kafka offset commits to stop working.
> 
> 
> 
> 
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
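
For reference, a minimal sketch of the setup Edward describes above, where offsets reach the Kafka brokers only when a checkpoint completes (topic, group id and the interval/timeout values are placeholder assumptions; the connector class matches the 1.4/1.5-era Kafka 0.11 connector):

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

    public class KafkaOffsetsViaCheckpoints {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Offsets are committed on completed checkpoints, so the checkpoint
            // interval and timeout bound how often offsets can reach Kafka.
            env.enableCheckpointing(60_000);                          // checkpoint every 60 s
            env.getCheckpointConfig().setCheckpointTimeout(600_000);  // checkpoints slower than 10 min fail

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "broker:9092");    // placeholder
            props.setProperty("group.id", "my-consumer-group");       // placeholder

            FlinkKafkaConsumer011<String> consumer =
                new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), props);
            // Default when checkpointing is enabled: commit offsets only on checkpoint
            // completion. If checkpoints never complete (e.g. under backpressure),
            // no offsets are committed back to Kafka.
            consumer.setCommitOffsetsOnCheckpoints(true);

            env.addSource(consumer)
               .print();   // stand-in for the rest of the pipeline

            env.execute("kafka-offsets-via-checkpoints");
        }
    }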