You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2017/03/21 02:44:41 UTC

[jira] [Resolved] (SPARK-20010) Sort information is lost after sort merge join

     [ https://issues.apache.org/jira/browse/SPARK-20010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-20010.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

Issue resolved by pull request 17339
[https://github.com/apache/spark/pull/17339]

> Sort information is lost after sort merge join
> ----------------------------------------------
>
>                 Key: SPARK-20010
>                 URL: https://issues.apache.org/jira/browse/SPARK-20010
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Zhenhua Wang
>             Fix For: 2.2.0
>
>
> After sort merge join for inner join, now we only keep left key ordering. However, after inner join, right key has the same value and order as left key. So if we need another smj on right key, we will unnecessarily add a sort which causes additional cost.
> As a more complicated example, A join B on A.key = B.key join C on B.key = C.key join D on A.key = D.key. We will unnecessarily add a sort on B.key when join \{A, B\} and C, and add a sort on A.key when join \{A, B, C\} and D.
> To fix this, we need to propagate all sorted information (equivalent expressions) from bottom up through `outputOrdering` and `SortOrder`.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org