
Posted to user@flink.apache.org by Pawel Bartoszek <pa...@gmail.com> on 2018/03/06 23:09:04 UTC

How back pressure works in Flink?

Can you explain how back pressure affects the source in Flink? I read the
great article
https://data-artisans.com/blog/how-flink-handles-backpressure and got the
idea, but I would like to know more details. Let's consider the

org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext


interface and its

void collect(T element);

method.


Is the back pressure mechanism going to block the thread that calls the
collect method for some time?

How does that relate to what is written in the article? I don't quite
understand how '*The output side never puts too much data on the
wire by a simple watermark mechanism*' is supposed to work.

   - *Remote exchange*: If task 1 and task 2 run on different worker nodes,
   the buffer can be recycled as soon as it is on the wire (TCP channel). On
   the receiving side, the data is copied from the wire to a buffer from the
   input buffer pool. If no buffer is available, reading from the TCP
   connection is interrupted. The output side never puts too much data on the
   wire by a simple watermark mechanism. If enough data is in-flight, we wait
   before we copy more data to the wire until it is below a threshold. This
   guarantees that there is never too much data in-flight. If new data is not
   consumed on the receiving side (because there is no buffer available), this
   slows down the sender.

Thanks,
Pawel

Re: How back pressure works in Flink?

Posted by "Zhijiang(wangzhijiang999)" <wa...@aliyun.com>.

Hi Pawel,
On the sender side, data is transferred as follows: operator collects record --> serialize to Flink buffer --> copy to Netty buffer --> flush to socket
On the receiver side: socket --> Netty --> Flink buffer --> deserialize to record --> operator processes
On the receiver side, if the operator processes records slowly, the limited Flink buffers are eventually exhausted. The Netty thread then cannot request a Flink buffer, so it temporarily switches off reading from the channel on the Netty side. This causes data to accumulate in the socket on the receiver side, which back-pressures the sender through the TCP mechanism.
On the sender side, because of TCP back pressure the socket no longer sends data to the receiver, and data gradually accumulates there. We configure the min and max watermarks on the Netty side to limit the in-flight data and the Netty buffers consumed. For example, if we define 2 Flink buffers as the max watermark in Netty, the Netty thread can copy at most 2 Flink buffers until they have been flushed to the socket. If the socket space is full because of TCP back pressure from the receiver, the Netty thread stops consuming Flink buffers once the max watermark is reached. After all the limited Flink buffers have been exhausted by collecting records, no Flink buffer is available any more, and the collect(T element) method you mentioned blocks while requesting a Flink buffer.
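
The sender-side chain above can be played through in a small single-threaded toy model. This is only a sketch, not Flink or Netty code: the class, its fields, and tryCollect are hypothetical names, each record is assumed to fill exactly one Flink buffer, and a Flink buffer is assumed to be recycled as soon as Netty has copied it. Where the real collect(T element) would block, tryCollect simply returns false.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Single-threaded toy model of the sender side; not Flink or Netty code.
class WatermarkBackPressureSketch {
    private int freeFlinkBuffers;      // buffers left in the (hypothetical) local pool
    private final int maxWatermark;    // max buffers Netty may hold unflushed
    private final Deque<Integer> filled = new ArrayDeque<>(); // serialized, not yet copied to Netty
    private int unflushedInNetty = 0;  // copied to Netty, not yet flushed to the socket
    boolean socketWritable = true;     // false models TCP back pressure from the receiver

    WatermarkBackPressureSketch(int poolSize, int maxWatermark) {
        this.freeFlinkBuffers = poolSize;
        this.maxWatermark = maxWatermark;
    }

    // Stand-in for collect(T element): returns false where the real call would block.
    boolean tryCollect(int record) {
        if (freeFlinkBuffers == 0) {
            return false;
        }
        freeFlinkBuffers--;
        filled.add(record);
        return true;
    }

    // One turn of the Netty thread: flush if possible, then copy up to the watermark.
    void nettyStep() {
        if (socketWritable) {
            unflushedInNetty = 0;
        }
        while (unflushedInNetty < maxWatermark && !filled.isEmpty()) {
            filled.poll();
            unflushedInNetty++;
            freeFlinkBuffers++; // assumption: the Flink buffer is recycled once copied
        }
    }

    // Counts how many records are accepted while the socket stays blocked.
    static int acceptedBeforeBlocking(int poolSize, int maxWatermark) {
        WatermarkBackPressureSketch sender =
                new WatermarkBackPressureSketch(poolSize, maxWatermark);
        sender.socketWritable = false;
        int accepted = 0;
        while (sender.tryCollect(accepted)) {
            accepted++;
            sender.nettyStep();
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(acceptedBeforeBlocking(4, 2)); // prints 6
    }
}
```

With a pool of 4 buffers and a max watermark of 2, the model accepts 6 records: 2 end up parked in Netty and 4 fill the pool, after which the next collect would have to wait.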
The whole process is a bit complicated, but I hope this helps.
BTW, starting with the Flink 1.5 release, the network flow control changes to a classic credit-based mechanism. That means the sender transfers buffers only according to the available buffers announced by the receiver and does not send any extra data, so no in-flight data accumulates on the wire.
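
The credit-based idea can be sketched in the same spirit. Again this is only an illustration under stated assumptions, not the Flink 1.5 implementation: the class and method names are made up, and one credit is taken to mean one free buffer announced by the receiver.

```java
// Toy model of credit-based flow control; not the actual Flink implementation.
class CreditBasedSketch {
    private int credit = 0;   // free buffers the receiver has announced
    private int backlog = 0;  // buffers waiting on the sender side

    // The receiver announces that it has more free buffers.
    void announceCredit(int buffers) {
        credit += buffers;
    }

    // The sender queues buffers that are ready to be transferred.
    void enqueue(int buffers) {
        backlog += buffers;
    }

    // The sender transfers only as many buffers as it holds credit for.
    int send() {
        int n = Math.min(credit, backlog);
        credit -= n;
        backlog -= n;
        return n;
    }

    // Buffers still waiting on the sender side.
    int pending() {
        return backlog;
    }

    public static void main(String[] args) {
        CreditBasedSketch sender = new CreditBasedSketch();
        sender.enqueue(5);
        sender.announceCredit(2);
        System.out.println(sender.send()); // prints 2
        System.out.println(sender.send()); // prints 0, no credit left
        sender.announceCredit(4);
        System.out.println(sender.send()); // prints 3, the rest of the backlog
    }
}
```

Because nothing leaves the sender without a matching credit, every transferred buffer has a free buffer waiting for it on the receiver, so the receiver never has to stop reading from the socket.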

Zhijiang