Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2021/11/17 02:45:00 UTC

[jira] [Resolved] (SPARK-37341) Avoid unnecessary buffer and copy in full outer sort merge join

     [ https://issues.apache.org/jira/browse/SPARK-37341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-37341.
---------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34612
[https://github.com/apache/spark/pull/34612]

> Avoid unnecessary buffer and copy in full outer sort merge join
> ---------------------------------------------------------------
>
>                 Key: SPARK-37341
>                 URL: https://issues.apache.org/jira/browse/SPARK-37341
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Cheng Su
>            Assignee: Cheng Su
>            Priority: Minor
>             Fix For: 3.3.0
>
>
> FULL OUTER sort merge join (non-code-gen path) copies join keys and buffers input rows even when rows from both sides do not have matched keys ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L1637-L1641] ). This is unnecessary: we can output the row with the smaller join key directly, and buffer only when both sides have matched keys. This avoids unnecessary copying and buffering when both join sides have many rows that do not match each other.
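
The idea in the description (emit the row with the smaller join key right away, and buffer only when the keys on both sides are equal) can be illustrated with a minimal sketch. This is not the code in SortMergeJoinExec or in the pull request. The function name `full_outer_merge_join`, the `(key, payload)` row shape, and the use of in-memory lists are all assumptions made for illustration, and the sketch ignores null join keys and extra join conditions, which a real implementation would have to handle.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical row shape for illustration: (join key, payload).
Row = Tuple[int, str]
Joined = Tuple[Optional[Row], Optional[Row]]


def full_outer_merge_join(left: List[Row], right: List[Row]) -> Iterator[Joined]:
    """Full outer join of two inputs that are already sorted by join key."""
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            # Left key is smaller, so it cannot match anything on the right:
            # emit it immediately, with no key copy and no buffering.
            yield (left[i], None)
            i += 1
        elif lkey > rkey:
            # Symmetric case for an unmatched right row.
            yield (None, right[j])
            j += 1
        else:
            # Keys match: only now collect the run of equal-key rows per side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    yield (l_row, r_row)
            i, j = i_end, j_end
    # One side is exhausted; the remaining rows on the other side are unmatched.
    for l_row in left[i:]:
        yield (l_row, None)
    for r_row in right[j:]:
        yield (None, r_row)


if __name__ == "__main__":
    left_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right_rows = [(2, "x"), (3, "y"), (5, "z")]
    for pair in full_outer_merge_join(left_rows, right_rows):
        print(pair)
```

In this sketch, only the branch for equal keys scans ahead and holds a group of rows. When most keys on the two sides do not match, nearly all rows go through the two cheap branches, which is the saving the issue describes.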



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
