Posted to issues@flink.apache.org by "Weijie Guo (Jira)" <ji...@apache.org> on 2024/01/02 06:39:00 UTC

[jira] [Comment Edited] (FLINK-33961) Hybrid Shuffle may hang when exclusive buffers per channel is set to 0

    [ https://issues.apache.org/jira/browse/FLINK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801669#comment-17801669 ] 

Weijie Guo edited comment on FLINK-33961 at 1/2/24 6:38 AM:
------------------------------------------------------------

[~Jiang Xin] Thanks for reporting this. Unfortunately, this behavior is largely by design. To avoid additional overhead, we allow the upstream to skip calculating the exact backlog, which means {{exclusive-buffers-per-channel}} must not be set to 0. Setting it to 0 can severely degrade performance and may even block the job.
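To illustrate why an inexact backlog and zero exclusive buffers interact badly, here is a toy Java model of credit-based flow control on a single channel. It is a simplified sketch, not Flink's implementation. It assumes that the receiver grants one credit per exclusive buffer plus one per floating buffer, that it requests floating buffers only to cover the backlog the sender announces, and that the floating pool is unbounded. The class {{CreditModel}}, the method {{deliver}} and the "always report 0" backlog function are hypothetical.

```java
import java.util.function.IntUnaryOperator;

/**
 * Toy model of credit-based flow control on one channel (hypothetical; not Flink's code).
 * Credit = exclusive buffers + floating buffers requested to cover the announced backlog.
 * The floating buffer pool is assumed to be unbounded.
 */
public final class CreditModel {

    /**
     * Returns how many buffers the sender delivers before it either finishes or stalls.
     *
     * @param pending          buffers the sender actually has queued
     * @param exclusiveBuffers exclusive buffers per channel on the receiver side
     * @param backlogReporter  maps the true backlog to the backlog the sender announces
     */
    public static int deliver(int pending, int exclusiveBuffers, IntUnaryOperator backlogReporter) {
        int delivered = 0;
        while (pending > 0) {
            // The receiver only learns about queued data through the announced backlog.
            int announced = backlogReporter.applyAsInt(pending);
            int credit = exclusiveBuffers + Math.max(0, announced);
            if (credit == 0) {
                // No credit, and no announced backlog to request floating buffers for: stall.
                break;
            }
            // The sender ships as many buffers as it has credit for; the receiver
            // consumes and recycles them before the next round.
            int sent = Math.min(credit, pending);
            pending -= sent;
            delivered += sent;
        }
        return delivered;
    }

    public static void main(String[] args) {
        IntUnaryOperator exact = backlog -> backlog;   // sender reports its true backlog
        IntUnaryOperator underReported = backlog -> 0; // hypothetical worst case of an inexact backlog
        System.out.println(deliver(100, 0, exact));         // 100
        System.out.println(deliver(100, 0, underReported)); // 0: the channel stalls
        System.out.println(deliver(100, 2, underReported)); // 100: exclusive credit keeps data flowing
    }
}
```

In this model, at least one exclusive buffer guarantees some credit in every round, so an underestimated backlog only slows delivery instead of stalling it. That matches the advice not to set the value to 0.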

For this reason, the documentation contains the following guidance: when the legacy Hybrid Shuffle mode is used, decreasing the number of exclusive buffers per channel will seriously affect performance, so this value should not be set to 0.
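As an illustration of that guidance, a deployment could reject the risky combination before submitting a job. The check below is a hypothetical helper, not part of Flink. It assumes the configuration is available as plain key/value strings. It also assumes the keys {{taskmanager.network.hybrid-shuffle.enable-new-mode}} and {{taskmanager.network.memory.buffers-per-channel}} with defaults of {{true}} and {{2}}. Verify these against the Flink version in use.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical pre-flight check (not part of Flink) for the documented advice that exclusive
 * buffers per channel should not be 0 when the legacy Hybrid Shuffle mode is used.
 */
public final class HybridShuffleConfigCheck {

    // Assumed configuration keys; verify them against your Flink version.
    static final String NEW_MODE_KEY = "taskmanager.network.hybrid-shuffle.enable-new-mode";
    static final String BUFFERS_PER_CHANNEL_KEY = "taskmanager.network.memory.buffers-per-channel";

    // Assumed defaults: new mode enabled, 2 exclusive buffers per channel.
    static final boolean DEFAULT_NEW_MODE = true;
    static final int DEFAULT_BUFFERS_PER_CHANNEL = 2;

    /** Returns a warning if the legacy mode is combined with 0 exclusive buffers, otherwise empty. */
    public static Optional<String> check(Map<String, String> conf) {
        boolean newMode = Optional.ofNullable(conf.get(NEW_MODE_KEY))
                .map(v -> Boolean.parseBoolean(v.trim()))
                .orElse(DEFAULT_NEW_MODE);
        int buffersPerChannel = Optional.ofNullable(conf.get(BUFFERS_PER_CHANNEL_KEY))
                .map(v -> Integer.parseInt(v.trim()))
                .orElse(DEFAULT_BUFFERS_PER_CHANNEL);
        if (!newMode && buffersPerChannel == 0) {
            return Optional.of("Legacy Hybrid Shuffle with 0 exclusive buffers per channel may hang; set "
                    + BUFFERS_PER_CHANNEL_KEY + " > 0 or enable " + NEW_MODE_KEY + ".");
        }
        return Optional.empty();
    }
}
```

Checking the resolved values rather than only the explicitly set keys means an unset option falls back to its assumed default. Setting only the buffer count to 0 is therefore not flagged, because the new mode is assumed to be on.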

That is one of the reasons we introduced the new hybrid shuffle mode (i.e. TieredStorage Shuffle). If there are no further questions, I will close the issue.


> Hybrid Shuffle may hang when exclusive buffers per channel is set to 0
> ----------------------------------------------------------------------
>
>                 Key: FLINK-33961
>                 URL: https://issues.apache.org/jira/browse/FLINK-33961
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>            Reporter: Jiang Xin
>            Priority: Major
>
> I found that Hybrid Shuffle with the new mode disabled may hang when exclusive-buffers-per-channel is set to 0. It can be reproduced by adding the following test to `HybridShuffleITCase.java` and running it.
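> {code:java}
> // Add to HybridShuffleITCase; relies on helpers available in that test class
> // (configureHybridOptions, getConfiguration, createJobGraph, executeJob).
> @RepeatedTest(10)
> void testHybridFullExchangesWithNonBuffersPerChannel() throws Exception {
>     final int numRecordsToSend = 10000;
>     Configuration configuration = configureHybridOptions(getConfiguration(), false);
>     // Use the legacy Hybrid Shuffle mode instead of the new (TieredStorage) mode.
>     configuration.set(
>             NettyShuffleEnvironmentOptions.NETWORK_HYBRID_SHUFFLE_ENABLE_NEW_MODE, false);
>     // Zero exclusive buffers per channel: the setting under which the job may hang.
>     configuration.set(NettyShuffleEnvironmentOptions.NETWORK_BUFFERS_PER_CHANNEL, 0);
>     JobGraph jobGraph = createJobGraph(numRecordsToSend, false, configuration);
>     executeJob(jobGraph, configuration, numRecordsToSend);
> }
> {code}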

