You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/05/06 22:06:00 UTC

[jira] [Assigned] (SPARK-7375) Avoid defensive copying in SQL exchange operator when sort-based shuffle buffers data in serialized form

     [ https://issues.apache.org/jira/browse/SPARK-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-7375:
-----------------------------------

    Assignee:     (was: Apache Spark)

> Avoid defensive copying in SQL exchange operator when sort-based shuffle buffers data in serialized form
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-7375
>                 URL: https://issues.apache.org/jira/browse/SPARK-7375
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Josh Rosen
>
> The original sort-based shuffle buffers shuffle input records in memory while sorting them. This causes problems when mutable records are presented to the shuffle, which happens in Spark SQL's Exchange operator. To work around this issue, SPARK-2967 and SPARK-4479 added defensive copying of shuffle inputs in the Exchange operator when sort-based shuffle is enabled.
> I think that [~sandyr]'s recent patch for enabling serialization of records in sort-based shuffle (SPARK-4550) and my proposed {{unsafe}}-based shuffle path (SPARK-7081) may allow us to avoid this defensive copying in certain cases (since our patches cause records to be serialized one-at-a-time and remove the buffering of deserialized records).
> As mentioned in SPARK-4479, a long-term fix for this issue might be to add hooks for informing the shuffle about object (im)mutability in order to allow the shuffle layer to decide whether to copy. In the meantime, though, I think that we should just extend the checks added in SPARK-4479 to avoid copies when these new serialized sort paths are used.
> /cc [~rxin] [~marmbrus] [~yhuai]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org