Posted to dev@flink.apache.org by "Anton Kalashnikov (Jira)" <ji...@apache.org> on 2021/10/18 16:13:00 UTC

[jira] [Created] (FLINK-24578) Unexpected erratic load shape for channel skew load profile

Anton Kalashnikov created FLINK-24578:
-----------------------------------------

             Summary: Unexpected erratic load shape for channel skew load profile
                 Key: FLINK-24578
                 URL: https://issues.apache.org/jira/browse/FLINK-24578
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.14.0
            Reporter: Anton Kalashnikov
         Attachments: antiphaseBufferSize.png, erraticBufferSize1.png, erraticBufferSize2.png

given:

A job with 5 map operators (with keyBy).

All channels are remote. Parallelism is 80.

The first task produces only two keys: `indexOfThisSubtask` and `indexOfThisSubtask + 1`. So every subtask has a constant number of active channels (the exact number depends on how the keys are hashed to subtasks).

Every record has the same size and takes the same time to process.
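
To illustrate why only a few input channels are active per subtask in this setup, here is a minimal Python sketch. It is not Flink code: `target_subtask` is a hypothetical stand-in for keyBy partitioning (Flink's real key-group assignment uses a different hash), and only the parallelism of 80 and the two keys per subtask come from the description above.

```python
from collections import defaultdict

PARALLELISM = 80  # parallelism from the report


def target_subtask(key, parallelism=PARALLELISM):
    """Hypothetical stand-in for keyBy partitioning (NOT Flink's real hash)."""
    # Multiplicative hash, used only to scatter keys over subtasks.
    scattered = (key * 2654435761) & 0xFFFFFFFF
    return scattered % parallelism


def active_channels(parallelism=PARALLELISM):
    """Map each downstream subtask to the upstream subtasks that send it data."""
    active = defaultdict(set)
    for upstream in range(parallelism):
        # Each upstream subtask emits exactly two keys, as in the report.
        for key in (upstream, upstream + 1):
            active[target_subtask(key, parallelism)].add(upstream)
    # Include downstream subtasks that receive nothing at all.
    return {d: active.get(d, set()) for d in range(parallelism)}


if __name__ == "__main__":
    for downstream, senders in active_channels().items():
        print(downstream, len(senders), sorted(senders))
```

Under these assumptions each upstream subtask feeds at most two downstream channels, so out of 80 x 80 possible channels at most 160 carry data, and the set of active channels never changes while the job runs.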

 

when: 

Buffer debloating is enabled (`taskmanager.network.memory.buffer-debloat.enabled: true`) with the otherwise default configuration.

 

then:

For some reason, the buffer size becomes synchronized across all subtasks of the first map. The synchronization can be strong, as shown in erraticBufferSize1.png, but usually it is less pronounced, as in erraticBufferSize2.png.

!erraticBufferSize1.png!

 

Expected:

After the stabilization period, the buffer size should be mostly constant with small fluctuations, or the different subtasks should be in antiphase to each other (when one subtask has a small buffer size, another should have a large one), as in antiphaseBufferSize.png, for example.

!antiphaseBufferSize.png!
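
One possible way to tell "synchronized" from "antiphase" without relying on the plots is to correlate the buffer size time series of two subtasks. The sketch below is only an illustration in Python: the `classify` helper and its threshold of 0.8 are assumptions, not anything Flink provides, and the series would have to be exported from the buffer size metrics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long series; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no phase to compare
    return cov / math.sqrt(var_x * var_y)


def classify(xs, ys, threshold=0.8):
    """Label two buffer size series; the threshold is an arbitrary assumption."""
    r = pearson(xs, ys)
    if r >= threshold:
        return "synchronized"
    if r <= -threshold:
        return "antiphase"
    return "uncorrelated"
```

With this definition, two identical oscillating series are labelled "synchronized", a series and its mirror image around the mean are labelled "antiphase", and a constant buffer size compared with anything is labelled "uncorrelated".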

 

Unfortunately, this does not reproduce every time, which means the problem may be related to the environment. Still, it makes sense to try to understand why the load shape is so strange when only a few input channels are active.

 


