Posted to user@flink.apache.org by Sergei Poganshev <s....@slice.com> on 2018/12/24 16:16:22 UTC

Iterations and back pressure problem

We've tried using the iterations feature, and under significant load the
job sometimes stalls and stops processing events because of high back
pressure, both in the task that produces records for the iteration and in
all the other inputs to this task. It looks like a "back pressure loop":
the task can't handle all the incoming records, so the iteration sink,
which loops back into this task, also gets back pressured. The result is a
complete job stoppage.

Is there a way to mitigate this, or to guarantee that this issue does not
occur?
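
To make the loop concrete, here is a toy model in Java. It is a sketch under stated assumptions, not a description of Flink's actual network stack: one operator with a bounded input buffer, a feedback edge with a buffer of the same capacity, a source that offers one record per step, and an operator that only processes a record when all of its feedback outputs fit. The class name, method name, and parameters are hypothetical.

```java
final class FeedbackLoopModel {

    /**
     * Simulates the loop and returns the step at which it stalls,
     * or -1 if it is still making progress after maxSteps steps.
     *
     * capacity: size of both the input buffer and the feedback buffer (assumed equal)
     * fanOut:   records sent back into the loop per processed record
     */
    static int stepsUntilStall(int capacity, int fanOut, int maxSteps) {
        int input = 0;    // records waiting in the operator's input buffer
        int feedback = 0; // records waiting on the feedback edge

        for (int step = 0; step < maxSteps; step++) {
            // 1. The feedback edge delivers records while the input has room.
            int moved = Math.min(feedback, capacity - input);
            feedback -= moved;
            input += moved;

            // 2. The external source adds one record if there is still room;
            //    otherwise the source itself is back pressured.
            if (input < capacity) {
                input++;
            }

            // 3. The operator processes one record only if its outputs fit.
            boolean canEmit = capacity - feedback >= fanOut;
            if (input > 0 && canEmit) {
                input--;
                feedback += fanOut;
            } else if (input == capacity && !canEmit) {
                // Input is full, so the feedback edge cannot drain; the feedback
                // edge is too full for the operator to emit. Nothing can move.
                return step;
            }
        }
        return -1;
    }
}
```

In this model, `stepsUntilStall(4, 2, 100)` returns 5, while a fan-out of 1 with the same capacity never stalls because only the external source gets back pressured.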

Re: Iterations and back pressure problem

Posted by Ken Krugler <kk...@transpac.com>.
Hi Sergey,

As Andrey noted, it’s a known issue with (currently) no good solution.

I talk a bit about how we worked around it on slide 26 of my Flink Forward talk <https://www.slideshare.net/FlinkForward/flink-forward-san-francisco-2018-ken-krugler-building-a-scalable-focused-web-crawler-with-flink> on a Flink-based web crawler.

Basically we do some cheesy approximate monitoring of in-flight data, and throttle the key producer so that (hopefully) network buffers don’t fill up to the point of deadlock.
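
The throttling idea can be sketched roughly as follows. This is not the code from the talk; it is a minimal single-JVM illustration, and the class `InFlightThrottle` and its methods are hypothetical. It assumes the producer can learn when a record leaves the loop. In a distributed job the producer and the loop exit may not share a process, so such a count would only ever be approximate.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Caps the number of records that may be inside the iteration at once. */
final class InFlightThrottle {
    private final int maxInFlight;
    private final AtomicInteger inFlight = new AtomicInteger();

    InFlightThrottle(int maxInFlight) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.maxInFlight = maxInFlight;
    }

    /**
     * Called by the producer before emitting a record into the loop.
     * Returns false when the cap is reached; the caller should then hold
     * the record back and retry later instead of emitting it.
     */
    boolean tryEmit() {
        while (true) {
            int current = inFlight.get();
            if (current >= maxInFlight) {
                return false;
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Called when a record leaves the loop; never drops below zero. */
    void recordLeftLoop() {
        inFlight.updateAndGet(n -> Math.max(0, n - 1));
    }

    /** Current estimate of records inside the loop. */
    int inFlight() {
        return inFlight.get();
    }
}
```

The cap would be chosen well below what the network buffers can hold, so the producer slows down before the loop's buffers fill up.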

— Ken


> On Dec 24, 2018, at 8:46 AM, Andrey Zagrebin <an...@da-platform.com> wrote:
> 
> Hi Sergey,
> 
> It seems to be a known issue. The community will hopefully work on this, but I have not seen any updates since the last answer to a similar question [1]; see also [2] and [3].
> 
> Best,
> Andrey
> 
> [1] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
> [2] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
> [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132


Re: Iterations and back pressure problem

Posted by Andrey Zagrebin <an...@da-platform.com>.
Hi Sergey,

It seems to be a known issue. The community will hopefully work on this,
but I have not seen any updates since the last answer to a similar
question [1]; see also [2] and [3].

Best,
Andrey

[1] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
[2] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
[3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132
