Posted to issues@flink.apache.org by "Piotr Nowojski (Jira)" <ji...@apache.org> on 2022/04/13 08:31:00 UTC

[jira] [Commented] (FLINK-24578) Unexpected erratic load shape for channel skew load profile and ~10% performance loss with enabled debloating

    [ https://issues.apache.org/jira/browse/FLINK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521540#comment-17521540 ] 

Piotr Nowojski commented on FLINK-24578:
----------------------------------------

As a next step in this ticket, it might be a good idea to double-check whether the same performance regression seen with debloating enabled is also visible after manually decreasing the buffer size to a value similar to the debloated one for the given job.
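
To make the comparison concrete, here is a minimal sketch of the configurations one could run side by side. It assumes the standard options `taskmanager.network.memory.buffer-debloat.enabled` and `taskmanager.memory.segment-size`. The helper names and the observed buffer size passed in are hypothetical and would have to come from the metrics of the actual job. As far as I recall, the segment size has to be a power of two, so the sketch rounds the observed value down; please verify that constraint (and any minimum size) against the Flink version in use.

```python
def round_down_to_power_of_two(n: int) -> int:
    """Largest power of two that is <= n (assumed constraint on segment size)."""
    if n < 1:
        raise ValueError("buffer size must be positive")
    return 1 << (n.bit_length() - 1)


def experiment_configs(observed_debloated_bytes: int) -> dict:
    """Build the configuration variants to compare.

    observed_debloated_bytes is a hypothetical input: the buffer size that
    debloating settled on for the job, read from the job's metrics.
    """
    fixed = round_down_to_power_of_two(observed_debloated_bytes)
    return {
        # Baseline: debloating off, default buffer size.
        "baseline": {
            "taskmanager.network.memory.buffer-debloat.enabled": "false",
        },
        # The run that showed the ~10% regression.
        "debloat-enabled": {
            "taskmanager.network.memory.buffer-debloat.enabled": "true",
        },
        # Debloating off, but buffers manually shrunk to a similar size.
        "fixed-small-buffers": {
            "taskmanager.network.memory.buffer-debloat.enabled": "false",
            "taskmanager.memory.segment-size": f"{fixed}b",
        },
    }


def render(conf: dict) -> str:
    """Render one variant as flink-conf.yaml style lines."""
    return "\n".join(f"{key}: {value}" for key, value in conf.items())


if __name__ == "__main__":
    # 10000 bytes is a made-up value, used only for illustration.
    for name, conf in experiment_configs(10000).items():
        print(f"# {name}")
        print(render(conf))
        print()
```

If the "fixed-small-buffers" run shows a similar regression to the "debloat-enabled" run, the loss is more likely caused by the small buffers themselves than by the debloating mechanism.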

> Unexpected erratic load shape for channel skew load profile and ~10% performance loss with enabled debloating
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24578
>                 URL: https://issues.apache.org/jira/browse/FLINK-24578
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.0
>            Reporter: Anton Kalashnikov
>            Priority: Major
>         Attachments: antiphaseBufferSize.png, erraticBufferSize1.png, erraticBufferSize2.png
>
>
> given:
> A job with 5 maps (with keyBy).
> All channels are remote. Parallelism is 80.
> The first task produces only two keys: `indexOfThisSubtask` and `indexOfThisSubtask + 1`. So every subtask has a constant number of active channels (the exact number depends on the hash rebalance).
> Every record has the same size and takes the same time to process.
>  
> when: 
> The buffer debloat is enabled with the default configuration.
>  
> then:
> The buffer size synchronizes across all subtasks of the first map for some reason. The synchronization can be strong, as shown in the erraticBufferSize1 picture, but usually it is less pronounced, as in erraticBufferSize2.
> !erraticBufferSize1.png!
> !erraticBufferSize2.png!  
>  
> Expected:
> After the stabilization period, the buffer size should be mostly constant with small fluctuations, or the different tasks should be in antiphase to each other (when one subtask has a small buffer size, another should have a big buffer size). For example, see the antiphaseBufferSize picture.
> !antiphaseBufferSize.png!
>  
> Unfortunately, it is not reproduced every time, which means that this problem may be connected to the environment. But at least it makes sense to try to understand why we see such a strange load shape when only several input channels are active.
>  
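
To illustrate the "given" section of the description, here is a rough sketch of how two keys per sending subtask turn into a small, fixed set of active channels. It assumes key-group style routing (hash the key into a key group, then map the key group to a subtask via `keyGroup * parallelism / maxParallelism`) and a max parallelism of 128. The hash function below is a simple stand-in and not Flink's actual hash, so the concrete channel numbers will differ from a real job; only the shape of the skew is meant to carry over.

```python
from collections import defaultdict

PARALLELISM = 80
MAX_PARALLELISM = 128  # assumption: default-style max parallelism for this job


def stand_in_hash(key: int) -> int:
    """Stand-in integer hash. NOT Flink's real hash, for illustration only."""
    x = (key * 0x9E3779B1) & 0xFFFFFFFF
    x ^= x >> 16
    return x


def target_subtask(key: int,
                   parallelism: int = PARALLELISM,
                   max_parallelism: int = MAX_PARALLELISM) -> int:
    """Route a key to a downstream subtask via its key group."""
    key_group = stand_in_hash(key) % max_parallelism
    return key_group * parallelism // max_parallelism


def active_channels(parallelism: int = PARALLELISM,
                    max_parallelism: int = MAX_PARALLELISM):
    """Return (outputs, inputs).

    outputs[s] is the set of receivers that sender s actually writes to.
    inputs[r] is the set of senders that receiver r actually reads from.
    """
    outputs = {}
    inputs = defaultdict(set)
    for sender in range(parallelism):
        # Each sender emits only two keys, as in the issue description.
        keys = (sender, sender + 1)
        targets = {target_subtask(k, parallelism, max_parallelism) for k in keys}
        outputs[sender] = targets
        for receiver in targets:
            inputs[receiver].add(sender)
    return outputs, inputs


if __name__ == "__main__":
    outputs, inputs = active_channels()
    idle_receivers = PARALLELISM - len(inputs)
    busiest = max(len(senders) for senders in inputs.values())
    print(f"receivers with no active input channel: {idle_receivers}")
    print(f"most active input channels on one receiver: {busiest}")
```

Each sender uses at most 2 of its 80 output channels, while the number of active input channels per receiver depends entirely on how the keys hash. Some receivers may see several active channels and others none.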



--
This message was sent by Atlassian Jira
(v8.20.1#820001)