Posted to issues@spark.apache.org by "Min Shen (Jira)" <ji...@apache.org> on 2021/08/16 17:08:00 UTC

[jira] [Commented] (SPARK-35036) Improve push based shuffle to work with AQE by fetching partial map indexes for a reduce partition

    [ https://issues.apache.org/jira/browse/SPARK-35036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399843#comment-17399843 ] 

Min Shen commented on SPARK-35036:
----------------------------------

Not sure if this is fixable, given the reasons you already described.

The partial set of map indexes is used in AQE only to handle skewed partitions.

Since this only applies to partitions that are skewed to begin with, in practice it would affect very few shuffle partitions.

Alternatively, we could handle skewed partitions under push-based shuffle differently from how AQE handles them: instead of subdividing a shuffle partition using contiguous map index sub-ranges, we could subdivide a skewed merged shuffle partition along the boundaries of its MB-sized chunks.

This should be relatively easy to achieve and would still handle skewed partitions.
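
To illustrate the idea, here is a minimal sketch of splitting a merged shuffle partition along chunk boundaries. It is not Spark code: the function name, the plain list of chunk sizes, and the target size are all hypothetical, and it only assumes that the byte size of each chunk in the merged partition is known.

```python
from typing import List, Tuple


def split_by_chunk_boundaries(
    chunk_sizes: List[int], target_bytes: int
) -> List[Tuple[int, int]]:
    """Group consecutive chunks into sub-partitions of roughly target_bytes.

    Hypothetical helper, not a Spark API. Returns half-open
    (start_chunk, end_chunk) index ranges covering all chunks in order.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")

    ranges: List[Tuple[int, int]] = []
    start = 0  # index of the first chunk in the current sub-partition
    acc = 0    # bytes accumulated in the current sub-partition

    for i, size in enumerate(chunk_sizes):
        # Close the current range before adding a chunk that would overshoot
        # the target, unless the range is still empty (a single oversized
        # chunk is never split, since chunks are the smallest unit here).
        if acc > 0 and acc + size > target_bytes:
            ranges.append((start, i))
            start = i
            acc = 0
        acc += size

    # Emit whatever is left as the final sub-partition.
    if start < len(chunk_sizes):
        ranges.append((start, len(chunk_sizes)))
    return ranges


MB = 1024 * 1024
# Five 2 MB chunks split with a 4 MB target per sub-partition.
print(split_by_chunk_boundaries([2 * MB] * 5, 4 * MB))
```

Each resulting range could then be read by a separate task, without needing to know which map indexes ended up in which chunk.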

Furthermore, just to clarify, push-based shuffle can already work with AQE for shuffle partition coalescing.

> Improve push based shuffle to work with AQE by fetching partial map indexes for a reduce partition
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35036
>                 URL: https://issues.apache.org/jira/browse/SPARK-35036
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.1
>            Reporter: Venkata krishnan Sowrirajan
>            Priority: Major
>
> Currently, when both push-based shuffle and AQE are enabled and a partial set of map indexes is requested from MapOutputTracker, the read is delegated to the regular shuffle, which reads map blocks, instead of push-based shuffle. This is because blocks from mappers in push-based shuffle are merged out of order, which makes it hard to get only the blocks of the reduce partition that match the requested start and end map indexes.


