
Posted to issues@flink.apache.org by "Yingjie Cao (Jira)" <ji...@apache.org> on 2023/03/10 14:39:00 UTC

[jira] [Closed] (FLINK-31386) Fix the potential deadlock issue of blocking shuffle

     [ https://issues.apache.org/jira/browse/FLINK-31386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingjie Cao closed FLINK-31386.
-------------------------------
    Resolution: Fixed

> Fix the potential deadlock issue of blocking shuffle
> ----------------------------------------------------
>
>                 Key: FLINK-31386
>                 URL: https://issues.apache.org/jira/browse/FLINK-31386
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Assignee: Yingjie Cao
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.17.0
>
>
> Currently, the SortMergeResultPartition may allocate more network buffers than the guaranteed size of its LocalBufferPool. As a result, some result partitions may have to wait for other result partitions to release the over-allocated network buffers before they can continue. However, the result partitions that have allocated more than their guaranteed buffers rely on the processing of input data to trigger data spilling and buffer recycling. The input data, in turn, relies on the batch reading buffers used by the SortMergeResultPartitionReadScheduler, which may already have been taken by the blocked result partitions that are waiting for buffers. The result is a circular wait, that is, a deadlock. We can easily fix this deadlock by reserving the guaranteed buffers on initialization.
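
A minimal, hypothetical Java model of the buffer-starvation half of this cycle follows. GlobalPool, PartitionPool, tryRequest and buffersLeftForB are illustrative names, not Flink's LocalBufferPool API, and buffers are modeled as java.util.concurrent.Semaphore permits. The sketch does not model the dependency on batch reading buffers. It only shows how lazy allocation lets one partition drain the shared pool below another partition's guarantee, and how reserving the guarantee at initialization prevents that.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model only; not Flink's actual buffer pool classes.
public class BufferReservationSketch {

    /** Shared pool of network buffers, modeled as semaphore permits. */
    static final class GlobalPool {
        private final Semaphore permits;

        GlobalPool(int totalBuffers) {
            this.permits = new Semaphore(totalBuffers);
        }

        boolean tryTake(int n) {
            return permits.tryAcquire(n);
        }
    }

    /** Per-partition pool with a guaranteed buffer count. */
    static final class PartitionPool {
        private final GlobalPool global;
        private int reservedUnused; // guaranteed buffers set aside but not yet handed out

        PartitionPool(GlobalPool global, int guaranteed, boolean reserveOnInit) {
            this.global = global;
            if (reserveOnInit) {
                // The fix: claim the guaranteed buffers up front, so later
                // over-allocation by other partitions cannot consume them.
                if (!global.tryTake(guaranteed)) {
                    throw new IllegalStateException("cannot satisfy guarantee at init");
                }
                this.reservedUnused = guaranteed;
            }
        }

        /** Non-blocking request: serve from the reservation first, then the shared pool. */
        boolean tryRequest() {
            if (reservedUnused > 0) {
                reservedUnused--;
                return true;
            }
            return global.tryTake(1);
        }
    }

    /** Returns how many of its 2 guaranteed buffers partition B obtains after A over-allocates. */
    static int buffersLeftForB(boolean reserveOnInit) {
        GlobalPool global = new GlobalPool(4);
        PartitionPool a = new PartitionPool(global, 2, reserveOnInit);
        PartitionPool b = new PartitionPool(global, 2, reserveOnInit);
        // A grabs every buffer it can, exceeding its guarantee of 2.
        while (a.tryRequest()) {
            // keep taking
        }
        int got = 0;
        while (got < 2 && b.tryRequest()) {
            got++;
        }
        return got;
    }

    public static void main(String[] args) {
        System.out.println("lazy allocation, B got: " + buffersLeftForB(false));
        System.out.println("reserved on init, B got: " + buffersLeftForB(true));
    }
}
```

Running main prints that B gets 0 buffers under lazy allocation and 2 when guarantees are reserved on initialization. Requests are non-blocking here so the sketch terminates; in the scenario described above, B would instead block at that point while A waits on input that B's held resources are needed to produce.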



--
This message was sent by Atlassian Jira
(v8.20.10#820010)