Posted to dev@flink.apache.org by "Weijie Guo (Jira)" <ji...@apache.org> on 2022/08/11 10:08:00 UTC

[jira] [Created] (FLINK-28925) Fix the concurrency problem in hybrid shuffle

Weijie Guo created FLINK-28925:
----------------------------------

             Summary: Fix the concurrency problem in hybrid shuffle
                 Key: FLINK-28925
                 URL: https://issues.apache.org/jira/browse/FLINK-28925
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Network
    Affects Versions: 1.16.0
            Reporter: Weijie Guo
             Fix For: 1.16.0


Through TPC-DS testing and code analysis, I found some thread-safety problems in hybrid shuffle:
 # HsSubpartitionMemoryDataManager#consumeBuffer should return a readOnlySlice of the buffer to the downstream task instead of the original buffer. If the spilling thread is processing a buffer while the downstream task is consuming the same buffer, the amount of data written to disk will be smaller than the actual value. To solve this, the consuming thread and the spilling thread should share the same data but not the same index.
 # HsSubpartitionMemoryDataManager#releaseSubpartitionBuffers should ignore the release decision if the buffer has already been removed from bufferIndexToContexts, instead of throwing an exception.
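
The two points above can be illustrated with a minimal, self-contained sketch. It is not Flink code: java.nio.ByteBuffer stands in for Flink's network buffer, asReadOnlyBuffer() stands in for readOnlySlice, and the class HybridShuffleSketch and its methods are hypothetical names chosen for illustration. It only shows the general idea of sharing bytes without sharing the read index, and of treating a release of an already-removed buffer as a no-op.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; none of these names exist in Flink.
public class HybridShuffleSketch {

    /**
     * Simulates a consumer reading 4 bytes of an 8-byte buffer and returns how
     * many bytes the spilling side still sees in the original buffer afterwards.
     */
    static int remainingAfterConsumerRead(boolean useReadOnlyView) {
        ByteBuffer original = ByteBuffer.allocate(8); // position 0, limit 8

        // A read-only view shares the bytes but has its own position and limit.
        ByteBuffer consumerView = useReadOnlyView ? original.asReadOnlyBuffer() : original;

        // The consumer advances the position of whatever buffer it was handed.
        consumerView.get(new byte[4]);

        // The spiller writes original.remaining() bytes to disk.
        return original.remaining();
    }

    /**
     * Releases the buffer if it is still tracked. Returns false, rather than
     * throwing, when the buffer was already removed by someone else.
     */
    static boolean releaseIfPresent(Map<Integer, ByteBuffer> bufferIndexToContexts, int bufferIndex) {
        return bufferIndexToContexts.remove(bufferIndex) != null;
    }

    public static void main(String[] args) {
        // Sharing the original buffer: the spiller sees only part of the data.
        System.out.println("shared index:   " + remainingAfterConsumerRead(false));
        // Handing out a read-only view: the spiller still sees all of the data.
        System.out.println("separate index: " + remainingAfterConsumerRead(true));

        Map<Integer, ByteBuffer> bufferIndexToContexts = new HashMap<>();
        bufferIndexToContexts.put(0, ByteBuffer.allocate(8));
        System.out.println("first release:  " + releaseIfPresent(bufferIndexToContexts, 0));
        System.out.println("second release: " + releaseIfPresent(bufferIndexToContexts, 0));
    }
}
```

In this sketch the spiller sees 4 remaining bytes when the consumer was given the original buffer and 8 when it was given a read-only view, which mirrors the "smaller than the actual value" symptom described in point 1. The second release returns false instead of failing, which mirrors the behavior proposed in point 2.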



--
This message was sent by Atlassian Jira
(v8.20.10#820010)