Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/08/06 22:27:04 UTC

[jira] [Created] (SPARK-9703) EnsureRequirements should not add unnecessary shuffles when only ordering requirements are unsatisfied

Josh Rosen created SPARK-9703:
---------------------------------

             Summary: EnsureRequirements should not add unnecessary shuffles when only ordering requirements are unsatisfied
                 Key: SPARK-9703
                 URL: https://issues.apache.org/jira/browse/SPARK-9703
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.3.0, 1.4.0, 1.5.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen


Consider SortMergeJoin (SMJ), which requires its input rows to be clustered on the join keys and sorted within each partition. Suppose both of SMJ's children produce unsorted output, but each has only a single partition. In this case we need to inject sort operators but should not need to inject exchanges, since a single partition already satisfies the clustering requirement. Unfortunately, it looks like Exchange still repartitions the children unnecessarily using a hash partitioning.

We should update Exchange so that it does not unnecessarily repartition children when only the ordering requirements are unsatisfied.
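
To make the intended behavior concrete, here is a minimal toy sketch in Python. It is not Spark's code: Node, satisfies_clustering, satisfies_ordering, ensure_requirements and the shuffle_partitions default are all hypothetical names made up for illustration. It treats each child independently and ignores the check that both join children end up with compatible partitionings. The point is only that the distribution check and the ordering check are made separately, so a single-partition child gets a Sort but no Exchange.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Toy physical operator (hypothetical; not Spark's SparkPlan)."""
    op: str                               # "Scan", "Exchange" or "Sort"
    num_partitions: int                   # number of output partitions
    hash_keys: Optional[Tuple[str, ...]]  # keys the output is hash-partitioned on, if any
    ordering: Tuple[str, ...]             # sort order within each output partition
    child: Optional["Node"] = None


def satisfies_clustering(node, keys):
    # A single partition trivially co-locates all rows that share a key.
    return node.num_partitions == 1 or node.hash_keys == keys


def satisfies_ordering(node, keys):
    # The required ordering must be a prefix of the existing ordering.
    return node.ordering[:len(keys)] == keys


def ensure_requirements(child, keys, shuffle_partitions=200):
    """Add an Exchange only if clustering is unmet, and a Sort only if ordering is unmet."""
    if not satisfies_clustering(child, keys):
        # A shuffle discards whatever ordering the child had.
        child = Node("Exchange", shuffle_partitions, keys, (), child)
    if not satisfies_ordering(child, keys):
        child = Node("Sort", child.num_partitions, child.hash_keys, keys, child)
    return child


def ops(node):
    """List operator names from the top of the plan down to the leaf."""
    out = []
    while node is not None:
        out.append(node.op)
        node = node.child
    return out


single = Node("Scan", 1, None, ())  # unsorted, single partition
multi = Node("Scan", 4, None, ())   # unsorted, four partitions, no known partitioning

print(ops(ensure_requirements(single, ("k",))))  # ['Sort', 'Scan']
print(ops(ensure_requirements(multi, ("k",))))   # ['Sort', 'Exchange', 'Scan']
```

In this sketch the single-partition child is only sorted, while the multi-partition child is shuffled and then sorted, which is the distinction the issue asks for.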

I'd like to fix this for Spark 1.5 since it makes certain types of unit tests easier to write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
