Posted to issues@spark.apache.org by "Wang, Gang (JIRA)" <ji...@apache.org> on 2018/09/11 06:16:00 UTC

[jira] [Created] (SPARK-25401) Reorder the required ordering to match the table's output ordering for bucket join

Wang, Gang created SPARK-25401:
----------------------------------

             Summary: Reorder the required ordering to match the table's output ordering for bucket join
                 Key: SPARK-25401
                 URL: https://issues.apache.org/jira/browse/SPARK-25401
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Wang, Gang


Currently, we check whether a SortExec is needed between an operator and its child in the method orderingSatisfies, and orderingSatisfies requires the SortOrders to appear in exactly the same order.
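To make the check concrete, here is a simplified model of a prefix-style ordering comparison. It assumes orderings are lists of plain column names with a direction; `SortOrder` and `ordering_satisfies` below are hypothetical stand-ins for illustration, not Spark's Catalyst classes, which compare expressions rather than column names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SortOrder:
    """Hypothetical, simplified sort key: a column name and a direction."""
    column: str
    ascending: bool = True


def ordering_satisfies(provided: Sequence[SortOrder],
                       required: Sequence[SortOrder]) -> bool:
    """Return True if the child's `provided` ordering satisfies `required`.

    Simplified model: the required keys must appear, position by position,
    as a prefix of the provided ordering. The same set of keys in a
    different order does NOT count as a match.
    """
    if len(required) > len(provided):
        return False
    return all(p == r for p, r in zip(provided, required))


# Table a is sorted by (a2, a1), but the join requires an ordering on (a1, a2).
provided = [SortOrder("a2"), SortOrder("a1")]
required = [SortOrder("a1"), SortOrder("a2")]
print(ordering_satisfies(provided, required))           # False: a sort would be added
print(ordering_satisfies(provided, [SortOrder("a2")]))  # True: prefix match
```

Because the comparison is positional, an ordering that contains exactly the right columns in a different order is rejected.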

However, consider the following case:
 * Table a is bucketed by (a1, a2) and sorted by (a2, a1), with 200 buckets.
 * Table b is bucketed by (b1, b2) and sorted by (b2, b1), with 200 buckets.
 * Table a is joined with table b on (a1=b1, a2=b2).

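In this case, if the join is a sort merge join, the query planner will not add an exchange on either side, but it will add a sort on both sides. The sort is also unnecessary: within matching buckets (for example, bucket 1 of table a and bucket 1 of table b), the condition (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1). The required ordering could therefore be reordered to (a2, a1) on table a and (b2, b1) on table b, which each table's existing sort order already satisfies.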
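The proposed reordering can be sketched as follows, assuming join keys are plain column names and each side's output ordering is a list of ascending columns. `reorder_join_keys` is a hypothetical helper for illustration only, not an existing Spark API.

```python
from typing import List, Optional, Sequence, Tuple


def reorder_join_keys(
    left_keys: Sequence[str],
    right_keys: Sequence[str],
    left_ordering: Sequence[str],
) -> Optional[Tuple[List[str], List[str]]]:
    """Hypothetical helper: permute equi-join key pairs to follow the left ordering.

    Each pair (left_keys[i], right_keys[i]) stays together, so the join
    condition is unchanged; only the order of the pairs moves. Returns None
    when the left keys are not a permutation of a prefix of `left_ordering`.
    """
    n = len(left_keys)
    if len(right_keys) != n or len(set(left_keys)) != n:
        return None
    prefix = list(left_ordering[:n])
    if sorted(prefix) != sorted(left_keys):
        return None
    pair = dict(zip(left_keys, right_keys))  # left key -> matching right key
    return prefix, [pair[k] for k in prefix]


# Join on (a1=b1, a2=b2); table a is sorted by (a2, a1), table b by (b2, b1).
result = reorder_join_keys(["a1", "a2"], ["b1", "b2"], left_ordering=["a2", "a1"])
print(result)  # (['a2', 'a1'], ['b2', 'b1'])

left_required, right_required = result
right_ordering = ["b2", "b1"]
# Both sides' existing orderings now match the required orderings, so no sort is needed.
print(right_ordering[: len(right_required)] == right_required)  # True
```

Keeping each (left, right) key pair together is what makes this safe: the join condition is a conjunction of equalities, so permuting the conjuncts does not change which rows match.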



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
