Posted to issues@spark.apache.org by "Wang, Gang (JIRA)" <ji...@apache.org> on 2018/09/11 06:16:00 UTC
[jira] [Created] (SPARK-25401) Reorder the required ordering to match the table's output ordering for bucket join
Wang, Gang created SPARK-25401:
----------------------------------
Summary: Reorder the required ordering to match the table's output ordering for bucket join
Key: SPARK-25401
URL: https://issues.apache.org/jira/browse/SPARK-25401
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.3.0
Reporter: Wang, Gang
Currently, the method orderingSatisfies decides whether a SortExec is needed between an operator and its child operator. It requires the SortOrders in the required ordering to appear in exactly the same sequence as in the child's output ordering.
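As a rough illustration of that check, the sketch below models an ordering as a list of column names and treats the requirement as satisfied only when it is a positional prefix of the child's output ordering. This is a simplified stand-in written for illustration, not Spark's code: the function name `ordering_satisfies` and the list-of-strings representation are assumptions, and sort direction and null ordering are ignored.

```python
from typing import Sequence


def ordering_satisfies(provided: Sequence[str], required: Sequence[str]) -> bool:
    """Return True if `required` is a positional prefix of `provided`.

    Hypothetical simplified model: each ordering is a list of column names.
    """
    if len(required) > len(provided):
        return False
    # Compare position by position; the same columns in another order fail.
    return all(r == p for r, p in zip(required, provided))


# A child sorted by (c2, c1) does not satisfy a requirement of (c1, c2),
# even though both orderings contain the same columns.
print(ordering_satisfies(["c2", "c1"], ["c1", "c2"]))  # False
print(ordering_satisfies(["c2", "c1"], ["c2", "c1"]))  # True
```

Under this model a sort is added whenever the check returns False, even if the two orderings differ only in the sequence of their columns.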
However, consider the following case.
* Table a is bucketed by (a1, a2) and sorted by (a2, a1), with 200 buckets.
* Table b is bucketed by (b1, b2) and sorted by (b2, b1), with 200 buckets.
* Table a is joined with table b on (a1=b1, a2=b2).
In this case, if the join is a sort merge join, the query planner does not add an exchange on either side, but it still adds a sort on both sides. The sort is also unnecessary: within matching buckets, such as bucket 1 of table a and bucket 1 of table b, the condition (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1), so the required ordering could be reordered to match each table's output ordering.
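The reordering idea can be sketched as follows. This is only a minimal model of the proposal, not Spark's planner code or an actual patch: the function name `reorder_join_keys` and the representation of join keys as (left, right) column-name pairs are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def reorder_join_keys(
    key_pairs: Sequence[Tuple[str, str]],
    left_sort: Sequence[str],
    right_sort: Sequence[str],
) -> List[Tuple[str, str]]:
    """Reorder (left, right) join-key pairs to follow the tables' sort columns.

    Hypothetical helper: returns the original order when the sort columns of
    the two sides do not line up with the join keys pair by pair.
    """
    pairs = list(key_pairs)
    n = len(pairs)
    by_left = {left: right for left, right in pairs}
    left_prefix = list(left_sort[:n])
    # The leading sort columns of the left table must be exactly the left keys.
    if len(by_left) != n or set(left_prefix) != set(by_left):
        return pairs
    reordered = [(left, by_left[left]) for left in left_prefix]
    # The right table must be sorted by the matching right keys, in that order.
    if list(right_sort[:n]) != [right for _, right in reordered]:
        return pairs
    return reordered


# The case from this issue: keys written as (a1=b1, a2=b2),
# tables sorted by (a2, a1) and (b2, b1).
keys = [("a1", "b1"), ("a2", "b2")]
print(reorder_join_keys(keys, ["a2", "a1"], ["b2", "b1"]))
# [('a2', 'b2'), ('a1', 'b1')]
```

With the keys reordered to (a2=b2, a1=b1), the ordering required on each side matches that table's existing sort order, so in this model no extra sort would be needed.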
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)