Posted to issues@flink.apache.org by "fanrui (Jira)" <ji...@apache.org> on 2022/04/18 04:02:00 UTC
[jira] [Comment Edited] (FLINK-26762) Add the overdraft buffer in BufferPool to reduce unaligned checkpoint being blocked

    [ https://issues.apache.org/jira/browse/FLINK-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523505#comment-17523505 ] 

fanrui edited comment on FLINK-26762 at 4/18/22 4:01 AM:
---------------------------------------------------------

Hi [~akalashnikov], I have submitted the PR for the overdraft buffer, but there are several questions I would like to confirm with you.
h2. About the configuration

You mentioned earlier that it is better to configure overdraft-memory-size rather than overdraft-buffers, but I don't know the reason. Could you give some details?
Currently, I think overdraft-buffers is better than overdraft-memory-size, because:
 * LocalBufferPool counts buffers with numberOfRequiredMemorySegments, so if Flink users configure overdraft-memory-size, the code would still have to convert it to a number of overdraft buffers.
 * overdraft-buffers is also clearer for the user. If the Flink user knows that the flatmap operator in the job has:
 ** one input and 5 outputs, the user can configure overdraft-buffers = 5.
 ** one input and 10 outputs, the user can configure overdraft-buffers = 10.
 * If a memory size is configured, would users still need to do this conversion themselves?
 * When overdraft-buffers = numberOfSubpartitions, the buffers required for a Watermark broadcast are also covered. So in the future we may add an option that sets overdraft-buffers = numberOfSubpartitions.
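
The conversion mentioned in the list above can be sketched as follows. This is only an illustration of the arithmetic, not code from the PR: the class OverdraftSizing and both method names are hypothetical, and the 32 KiB segment size is an assumed value for the example.

```java
// Hypothetical helper for illustration only; it is not part of Flink or of the PR.
final class OverdraftSizing {
    private OverdraftSizing() {}

    /** Converts a memory budget in bytes into a whole number of buffers, rounding down. */
    static int buffersFromMemory(long overdraftMemoryBytes, int segmentSizeBytes) {
        if (overdraftMemoryBytes < 0 || segmentSizeBytes <= 0) {
            throw new IllegalArgumentException("memory must be >= 0 and segment size > 0");
        }
        // A partially filled segment cannot be handed out, so the remainder is dropped.
        return (int) Math.min(Integer.MAX_VALUE, overdraftMemoryBytes / segmentSizeBytes);
    }

    /** Converts a buffer count back into the memory it would occupy, in bytes. */
    static long memoryFromBuffers(int overdraftBuffers, int segmentSizeBytes) {
        if (overdraftBuffers < 0 || segmentSizeBytes <= 0) {
            throw new IllegalArgumentException("buffers must be >= 0 and segment size > 0");
        }
        return (long) overdraftBuffers * segmentSizeBytes;
    }

    public static void main(String[] args) {
        int segmentSize = 32 * 1024; // assumed segment size for this example
        // A 160 KiB budget maps to exactly 5 buffers; a 100 KiB budget is rounded down.
        System.out.println(buffersFromMemory(160 * 1024, segmentSize));
        System.out.println(buffersFromMemory(100 * 1024, segmentSize));
        System.out.println(memoryFromBuffers(5, segmentSize));
    }
}
```

With a memory-based option, a user who wants 5 buffers for a flatmap with 5 outputs has to know the segment size first, and a budget that is not a multiple of the segment size is silently rounded down.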

h2. About the benchmark

I tested with a Flink job that contains a flatmap.

code link: [https://github.com/1996fanrui/fanrui-learning/commit/a1dbd850f878b64bfeb162b05f5c4750f9d629cc]

 
{code:java}
job parallelism = 100
sink sleep = 10ms
flatmap : one input 5 outputs
{code}
 

 

Without the overdraft buffer, the checkpoint duration is between 3 and 7 minutes, mostly around 5 minutes. If the flatmap outputs more data, the backpressure is more severe, or the job parallelism is higher, the UC duration of jobs without overdraft may exceed 10 minutes.

With the overdraft buffer, the checkpoint duration is between 0.3 and 2.5 s, so the benefit is obvious.

!image-2022-04-18-11-45-14-700.png|width=1642,height=622!

!image-2022-04-18-11-46-03-895.png|width=1834,height=679!

I am currently comparing two Flink jobs. How should I write a standard benchmark for this comparison? And what is a reasonable way to measure the UC duration?

 

Also, could you help review the PR in your free time? Thanks a lot.


> Add the overdraft buffer in BufferPool to reduce unaligned checkpoint being blocked
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-26762
>                 URL: https://issues.apache.org/jira/browse/FLINK-26762
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / Network
>    Affects Versions: 1.13.0, 1.14.0, 1.15.0
>            Reporter: fanrui
>            Assignee: fanrui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>         Attachments: image-2022-04-18-11-45-14-700.png, image-2022-04-18-11-46-03-895.png
>
>
> In some past JIRAs on Unaligned Checkpoints, the community added recordWriter.isAvailable() to reduce blocking when writing a single record. But a large record, a flatmap, or a broadcast watermark may need more than one buffer.
> Can we add an overdraft buffer to BufferPool so that unaligned checkpoints are blocked less often? 
> h2. Overdraft Buffer mechanism
> Add the configuration 'taskmanager.network.memory.overdraft-buffers-per-gate=5'. 
> When requestMemory is called and the bufferPool has no buffers left, the bufferPool allows the Task to overdraw up to 5 MemorySegments. The bufferPool then stays unavailable until all overdrawn buffers have been consumed by downstream tasks, and the Task waits for the bufferPool to become available again.
> This gives the following benefits:
>  * For scenarios that require multiple buffers, the Task releases the Checkpoint lock, so the Unaligned Checkpoint can be completed quickly.
>  * We can cap the memory usage to prevent memory leaks.
>  * It needs only a little memory and can improve the stability of the Task under back pressure.
>  * Users can increase overdraft-buffers to adapt to scenarios that require more buffers.
>  
> Masters, please correct me if I'm wrong, thanks a lot.
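
The overdraft mechanism described in the issue can be modelled with a small counter-based sketch. This is a simplified, single-threaded model written for illustration only: the class OverdraftPoolSketch and its methods are hypothetical and do not reproduce LocalBufferPool or the code in the PR.

```java
// Hypothetical, single-threaded model of the proposal; not Flink's LocalBufferPool.
final class OverdraftPoolSketch {
    private final int poolSize;      // ordinary buffers owned by the pool
    private final int maxOverdraft;  // extra buffers a task may overdraw
    private int inUse;               // buffers currently handed out, overdrawn ones included

    OverdraftPoolSketch(int poolSize, int maxOverdraft) {
        if (poolSize < 0 || maxOverdraft < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        this.poolSize = poolSize;
        this.maxOverdraft = maxOverdraft;
    }

    /** Grants a buffer while the ordinary pool or the overdraft allowance has room. */
    boolean requestBuffer() {
        if (inUse < poolSize + maxOverdraft) {
            inUse++;
            return true;
        }
        return false; // ordinary pool and overdraft are both exhausted
    }

    /** Returns one buffer, as happens once a downstream task has consumed it. */
    void recycleBuffer() {
        if (inUse == 0) {
            throw new IllegalStateException("no buffer to recycle");
        }
        inUse--;
    }

    /** Number of buffers currently taken beyond the ordinary pool size. */
    int overdrawn() {
        return Math.max(0, inUse - poolSize);
    }

    /**
     * The pool reports itself available only when nothing is overdrawn and at
     * least one ordinary buffer is free, so a task that overdrew must wait.
     */
    boolean isAvailable() {
        return inUse < poolSize;
    }

    public static void main(String[] args) {
        OverdraftPoolSketch pool = new OverdraftPoolSketch(2, 5);
        for (int i = 0; i < 7; i++) {
            pool.requestBuffer(); // 2 ordinary + 5 overdraft requests all succeed
        }
        System.out.println(pool.requestBuffer()); // the 8th request is refused
        System.out.println(pool.isAvailable());   // unavailable while overdrawn
    }
}
```

In this model, a task that needs several buffers for one record (a flatmap with 5 outputs, for example) can finish writing without blocking in the middle, after which it sees the pool as unavailable and waits until the overdrawn buffers have been returned.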


