Posted to issues@spark.apache.org by "Mridul Muralidharan (Jira)" <ji...@apache.org> on 2021/10/04 17:26:00 UTC

[jira] [Commented] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

    [ https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424052#comment-17424052 ] 

Mridul Muralidharan commented on SPARK-36892:
---------------------------------------------

[~Gengliang.Wang] Thanks for holding the release!
The team worked on testing the RC and fixed two issues:
* SPARK-36705 (https://github.com/apache/spark/pull/34158) and
* SPARK-36892 (https://github.com/apache/spark/pull/34156).

With these two fixes applied on top of RC6, and in addition to our internal tests, all TPCDS queries (scale 100) completed successfully and returned correct results.
We did identify a mismatch for TPCH query foo; [~apatnam] has filed a Jira (SPARK-36926) for it.


> Disable batch fetch for a shuffle when push based shuffle is enabled
> --------------------------------------------------------------------
>
>                 Key: SPARK-36892
>                 URL: https://issues.apache.org/jira/browse/SPARK-36892
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 3.2.0
>            Reporter: Mridul Muralidharan
>            Priority: Blocker
>
> When push based shuffle is enabled, merged mapper shuffle output is fetched efficiently.
> Unfortunately, this currently interacts badly with spark.sql.adaptive.fetchShuffleBlocksInBatch: the shuffle fetch can hang and/or duplicate data can be fetched, which causes correctness issues.
> Since batch fetch does not benefit Spark stages that read merged blocks when push based shuffle is enabled, ShuffleBlockFetcherIterator.doBatchFetch can be disabled in that case.
> Thanks to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]
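
The rule proposed in the description can be sketched roughly as follows. This is only an illustration in plain Python, not Spark's actual code: the helper names (as_bool, effective_batch_fetch) are hypothetical, spark.shuffle.push.enabled is assumed to be the switch for push based shuffle, and the other conditions Spark checks before batch fetching are ignored.

```python
# Illustrative sketch only; this is not the code from the Spark pull request.
# Assumption: push based shuffle is toggled by spark.shuffle.push.enabled.
PUSH_KEY = "spark.shuffle.push.enabled"
BATCH_KEY = "spark.sql.adaptive.fetchShuffleBlocksInBatch"


def as_bool(value, default):
    """Interpret a Spark-style string setting as a boolean (hypothetical helper)."""
    if value is None:
        return default
    return str(value).strip().lower() == "true"


def effective_batch_fetch(conf):
    """Return True only if batch fetch is requested and push based shuffle is off.

    `conf` is a plain dict standing in for a Spark configuration. Any other
    preconditions for batch fetch are deliberately left out of this sketch.
    """
    push_enabled = as_bool(conf.get(PUSH_KEY), False)
    batch_requested = as_bool(conf.get(BATCH_KEY), True)
    return batch_requested and not push_enabled
```

With this rule, a configuration that enables both settings ends up with batch fetch turned off, while a configuration without push based shuffle keeps whatever the batch fetch setting requests.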


