Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2018/11/13 16:28:00 UTC

[jira] [Resolved] (SPARK-25947) Reduce memory usage in ShuffleExchangeExec by selecting only the sort columns

     [ https://issues.apache.org/jira/browse/SPARK-25947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-25947.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 22961
[https://github.com/apache/spark/pull/22961]

> Reduce memory usage in ShuffleExchangeExec by selecting only the sort columns
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-25947
>                 URL: https://issues.apache.org/jira/browse/SPARK-25947
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Shuheng Dai
>            Priority: Major
>             Fix For: 3.0.0
>
>
> When sorting rows, ShuffleExchangeExec uses the entire row instead of just the columns referenced in SortOrder to create the RangePartitioner. As a result, the RangePartitioner samples entire rows to compute rangeBounds, which can cause OOM issues on the driver when rows contain large fields.
> Create a projection and use only the columns involved in the SortOrder for the RangePartitioner.
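
The idea in the description can be illustrated with a minimal, Spark-free sketch in plain Python. It is not the code from the pull request: the helper names (sample_sort_keys, range_bounds, partition_for), the dict-based rows, and the "payload" field are all hypothetical and chosen only for illustration. The point it shows is that projecting each row down to its sort columns before sampling means the collected sample holds small key tuples rather than whole rows.

```python
import random
from bisect import bisect_left


def sample_sort_keys(rows, sort_columns, sample_size, seed=0):
    """Reservoir-sample sort keys, projecting each row BEFORE it is kept.

    Hypothetical helper: only the projected key tuple is ever stored,
    so large non-sort fields never reach the sample.
    """
    rng = random.Random(seed)
    reservoir = []
    for n, row in enumerate(rows):
        key = tuple(row[c] for c in sort_columns)  # projection step
        if n < sample_size:
            reservoir.append(key)
        else:
            # Algorithm R: keep the new key with probability sample_size / (n + 1).
            j = rng.randint(0, n)
            if j < sample_size:
                reservoir[j] = key
    return reservoir


def range_bounds(keys, num_partitions):
    """Pick up to num_partitions - 1 increasing upper bounds from sampled keys."""
    ordered = sorted(keys)
    if not ordered or num_partitions <= 1:
        return []
    step = len(ordered) / num_partitions
    bounds = []
    for i in range(1, num_partitions):
        candidate = ordered[min(int(i * step), len(ordered) - 1)]
        if not bounds or candidate > bounds[-1]:  # skip duplicate bounds
            bounds.append(candidate)
    return bounds


def partition_for(key, bounds):
    """Index of the first bound >= key; keys above every bound go last."""
    return bisect_left(bounds, key)


# Rows with a small sort column and a large field that sorting does not need.
rows = [{"id": i, "payload": "x" * 10_000} for i in range(1000)]
keys = sample_sort_keys(rows, ["id"], sample_size=100)
bounds = range_bounds(keys, num_partitions=4)
```

Each row is still routed by its own projected key, for example partition_for((row["id"],), bounds), so the full row is shuffled as before; only the sample used to compute the bounds shrinks.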



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
