You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/11/16 00:13:00 UTC

[jira] [Assigned] (SPARK-37341) Avoid unnecessary buffer and copy in full outer sort merge join

     [ https://issues.apache.org/jira/browse/SPARK-37341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37341:
------------------------------------

    Assignee:     (was: Apache Spark)

> Avoid unnecessary buffer and copy in full outer sort merge join
> ---------------------------------------------------------------
>
>                 Key: SPARK-37341
>                 URL: https://issues.apache.org/jira/browse/SPARK-37341
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Cheng Su
>            Priority: Minor
>
> FULL OUTER sort merge join (non-code-gen path) copies join keys and buffers input rows, even when rows from both sides do have matched keys ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L1637-L1641] ). This is unnecessary, as we can just output the row with smaller join keys, and only buffer when both sides have matched keys. This would save us from unnecessary copy and buffer, when both join sides have a lot of rows not matched with each other.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org