Posted to issues@spark.apache.org by "gaoyajun02 (Jira)" <ji...@apache.org> on 2022/01/25 07:29:00 UTC

[jira] [Commented] (SPARK-38010) Push-based shuffle disabled due to insufficient mergeLocations

    [ https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481614#comment-17481614 ] 

gaoyajun02 commented on SPARK-38010:
------------------------------------

Can https://issues.apache.org/jira/browse/SPARK-34826 solve this? [~vsowrirajan]

> Push-based shuffle disabled due to insufficient mergeLocations
> --------------------------------------------------------------
>
>                 Key: SPARK-38010
>                 URL: https://issues.apache.org/jira/browse/SPARK-38010
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: gaoyajun02
>            Priority: Major
>
> Currently, shuffle merger locations are chosen from the hosts of active or dead executors.
> When dynamic executor allocation is enabled, there are often not enough merger locations at the time an application submits its first few stages, so most shuffles cannot benefit from push-based shuffle.
> The first few shuffle-write stages of a Spark application are generally the stages that read tables or other data sources, and they account for a large share of the shuffled data. Because push-based merging is disabled for them, the end-to-end improvement for the application is very limited.
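> For context, a minimal sketch of the settings that gate push-based shuffle on merger availability (these are the upstream Spark 3.2.0+ config names; roughly, a stage needs at least max(mergersMinStaticThreshold, mergersMinThresholdRatio * number of reducer partitions) merger locations before push-based shuffle is enabled for it):
>
>     import org.apache.spark.SparkConf
>
>     val conf = new SparkConf()
>       .set("spark.shuffle.push.enabled", "true")
>       // A stage needs at least this many merger locations...
>       .set("spark.shuffle.push.mergersMinStaticThreshold", "5")
>       // ...and at least ratio * (number of reducer partitions) of them.
>       .set("spark.shuffle.push.mergersMinThresholdRatio", "0.05")
>
> With default values, a reducer stage with 1000 partitions needs 50 merger locations, which the first few stages of a dynamically allocated application rarely have.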
> I have thought of one possible approach, but I am not sure whether it is feasible:
>  *  Lazily initialize the shuffle merger locations: after a mapper writes its local shuffle data, the push thread obtains the merger locations (see the sketch below).
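> A minimal sketch of this idea in Scala (everything here is hypothetical illustration, not an existing Spark API; fetchMergerLocations stands in for an RPC to the driver):
>
>     import java.util.concurrent.Executors
>
>     object LazyMergerPushSketch {
>       case class MergerLocation(host: String, port: Int)
>       private val pushThreadPool = Executors.newFixedThreadPool(4)
>
>       // Hypothetical RPC to the driver. Because it runs only after the
>       // map task has written its shuffle data, more executors (and hence
>       // hosts) may have been allocated than existed at stage submission.
>       def fetchMergerLocations(shuffleId: Int): Seq[MergerLocation] = Seq.empty
>
>       // Hypothetical stand-in for pushing the written blocks to mergers.
>       def pushBlocks(shuffleId: Int, mapIndex: Int,
>                      mergers: Seq[MergerLocation]): Unit = ()
>
>       // Called on the executor after the map task finishes writing; the
>       // merger locations are resolved lazily, in the push thread.
>       def pushAfterMapWrite(shuffleId: Int, mapIndex: Int): Unit =
>         pushThreadPool.execute { () =>
>           val mergers = fetchMergerLocations(shuffleId)
>           if (mergers.nonEmpty) pushBlocks(shuffleId, mapIndex, mergers)
>           // else: no mergers available yet; the reduce side falls back to
>           // fetching the original (unmerged) shuffle blocks.
>         }
>     }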
> Looking for advice and possible solutions to this issue.


