Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:03:16 UTC

[jira] [Updated] (SPARK-22671) SortMergeJoin read more data when wholeStageCodegen is off compared with when it is on

     [ https://issues.apache.org/jira/browse/SPARK-22671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-22671:
---------------------------------
    Labels: bulk-closed  (was: )

> SortMergeJoin read more data when wholeStageCodegen is off compared with when it is on
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-22671
>                 URL: https://issues.apache.org/jira/browse/SPARK-22671
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Chenzhao Guo
>            Priority: Major
>              Labels: bulk-closed
>
> In SortMergeJoin (with wholeStageCodegen enabled), an optimization already exists: if the left table of a partition is empty, there is no need to read the right table of the corresponding partition. This benefits the case in which many partitions of the left table are empty and the right table is big.
> In the code path without wholeStageCodegen, however, this optimization is not applied, so the right table is read even when the left side of the partition is empty.
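
To illustrate the behavior described above, here is a minimal sketch in plain Python, not Spark's actual implementation. The names `CountingIterator`, `merge_join`, and `skip_right_if_left_empty` are hypothetical and exist only for this example. It assumes an inner join over (key, value) rows, a left side already sorted by key, and a right side whose sort has to consume every row before producing output. The flag models the difference between the two code paths: with it set, an empty left partition returns before the right side is touched.

```python
class CountingIterator:
    """Wraps an iterable and counts how many rows were actually pulled from it."""

    def __init__(self, rows):
        self._it = iter(rows)
        self.rows_read = 0

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self._it)  # raises StopIteration when exhausted
        self.rows_read += 1
        return row


def merge_join(left, right, skip_right_if_left_empty=True):
    """Inner join of two (key, value) streams for one partition.

    `left` must already be sorted by key. `right` is sorted here, which
    stands in for a sort step that consumes its whole input (an assumption
    of this toy model).
    """
    left_it = iter(left)
    left_row = next(left_it, None)

    # The optimization: an empty left side cannot produce any output,
    # so return before reading a single row from the right side.
    if left_row is None and skip_right_if_left_empty:
        return []

    # Without the early return, the right side is fully read and sorted
    # even though the join result will be empty.
    right_rows = sorted(right, key=lambda r: r[0])

    out = []
    j = 0  # index of the first right row whose key is >= the current left key
    while left_row is not None and j < len(right_rows):
        left_key = left_row[0]
        while j < len(right_rows) and right_rows[j][0] < left_key:
            j += 1
        k = j
        while k < len(right_rows) and right_rows[k][0] == left_key:
            out.append((left_key, left_row[1], right_rows[k][1]))
            k += 1
        left_row = next(left_it, None)
    return out
```

Wrapping the right side in `CountingIterator` and calling `merge_join` with an empty left side makes the difference visible through `rows_read`: the count stays at zero when the flag is set and equals the size of the right side when it is not.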



