You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/12/09 21:33:00 UTC

[jira] [Assigned] (SPARK-25401) Reorder the required ordering to match the table's output ordering for bucket join

     [ https://issues.apache.org/jira/browse/SPARK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25401:
------------------------------------

    Assignee:     (was: Apache Spark)

> Reorder the required ordering to match the table's output ordering for bucket join
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-25401
>                 URL: https://issues.apache.org/jira/browse/SPARK-25401
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wang, Gang
>            Priority: Major
>
> Currently, we check if SortExec is needed between a operator and its child operator in method orderingSatisfies, and method orderingSatisfies require the order in the SortOrders are all the same.
> While, take the following case into consideration.
>  * Table a is bucketed by (a1, a2), sorted by (a2, a1), and buckets number is 200.
>  * Table b is bucketed by (b1, b2), sorted by (b2, b1), and buckets number is 200.
>  * Table a join table b on (a1=b1, a2=b2)
> In this case, if the join is sort merge join, the query planner won't add exchange on both sides, while, sort will be added on both sides. Actually, sort is also unnecessary, since in the same bucket, like bucket 1 of table a, and bucket 1 of table b, (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org