Posted to issues@spark.apache.org by "Mars (Jira)" <ji...@apache.org> on 2022/11/17 11:50:00 UTC

[jira] [Commented] (SPARK-37313) Child stage using merged output or not should be based on the availability of merged output from parent stage

    [ https://issues.apache.org/jira/browse/SPARK-37313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635314#comment-17635314 ] 

Mars commented on SPARK-37313:
------------------------------

As noted in this comment [https://github.com/apache/spark/pull/34461#issuecomment-964557253],
I'm working on this issue and trying to implement this functionality. [~minyang] [~mridul]

> Child stage using merged output or not should be based on the availability of merged output from parent stage
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37313
>                 URL: https://issues.apache.org/jira/browse/SPARK-37313
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Minchu Yang
>            Priority: Minor
>
> As discussed in the [thread|https://github.com/apache/spark/pull/34461#pullrequestreview-799701494] in SPARK-37023, during a stage retry, if the parent stage has already generated merged output in the previous attempt, then with the current behavior the child stage would not be able to fetch the merged output, because this is controlled by dependency.shuffleMergeEnabled (see the current implementation [here|https://github.com/apache/spark/blob/31b6f614d3173c8a5852243bf7d0b6200788432d/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L134-L136]) during the stage retry.
> Instead of using a single variable to control behavior on both the mapper side (push side) and the reducer side (using merged output), whether the child stage uses merged output or not must be based only on whether merged output is available for it to use (as discussed [here|https://github.com/apache/spark/pull/34461#issuecomment-964557253]).
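
The distinction described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's implementation: ShuffleDep, MergeTracker, and both fetch_merged_* functions are hypothetical names invented here, and the only detail taken from the issue is that the reducer-side decision currently follows dependency.shuffleMergeEnabled. The sketch also assumes that this flag is off for the retry attempt.

```python
from dataclasses import dataclass, field


@dataclass
class ShuffleDep:
    """Toy stand-in for a shuffle dependency (hypothetical, not Spark's class)."""
    shuffle_id: int
    # Push-side switch; assumed here to be False during a stage retry.
    shuffle_merge_enabled: bool


@dataclass
class MergeTracker:
    """Hypothetical registry: shuffle id -> reduce partitions with merged output."""
    merged: dict = field(default_factory=dict)

    def has_merged_output(self, shuffle_id: int) -> bool:
        # True only if at least one merged partition was recorded.
        return bool(self.merged.get(shuffle_id))


def fetch_merged_current(dep: ShuffleDep, tracker: MergeTracker) -> bool:
    # Behavior described in the issue: the reducer side reuses the push-side flag.
    return dep.shuffle_merge_enabled


def fetch_merged_proposed(dep: ShuffleDep, tracker: MergeTracker) -> bool:
    # Proposed behavior: decide only on whether merged output is available.
    return tracker.has_merged_output(dep.shuffle_id)


# A previous attempt of the parent stage produced merged output for shuffle 0,
# but the flag is off for the retry attempt.
tracker = MergeTracker({0: {0, 1}})
retry_dep = ShuffleDep(shuffle_id=0, shuffle_merge_enabled=False)

print("current:", fetch_merged_current(retry_dep, tracker))
print("proposed:", fetch_merged_proposed(retry_dep, tracker))
```

In this toy scenario the flag-based check ignores the merged output left by the previous attempt, while the availability-based check would let the child stage use it.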



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
