Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/03/13 01:40:00 UTC

[jira] [Assigned] (SPARK-34729) Faster execution for broadcast nested loop join (left semi/anti with no condition)

     [ https://issues.apache.org/jira/browse/SPARK-34729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34729:
------------------------------------

    Assignee:     (was: Apache Spark)

> Faster execution for broadcast nested loop join (left semi/anti with no condition)
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-34729
>                 URL: https://issues.apache.org/jira/browse/SPARK-34729
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Cheng Su
>            Priority: Minor
>
> For `BroadcastNestedLoopJoinExec` left semi and left anti joins without a condition, when the left side is broadcast, we currently check whether each row from the broadcast side has a match by iterating over the broadcast side many times - [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala#L256-L275] . This is unnecessary: because there is no condition, every pair of rows matches, so we only need to check whether the stream side is empty. This Jira was created to add that optimization, which can substantially improve the performance of affected queries.
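
The reasoning in the description can be illustrated with a small pure-Python model. This is only a sketch of the join semantics, not Spark's implementation: the function names `naive_semi_or_anti` and `fast_semi_or_anti` are hypothetical, rows are modeled as plain list elements, and the "naive" version is a simplified stand-in for the repeated scan of the broadcast side that the issue describes.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")

_SENTINEL = object()  # distinguishes "no rows" from a row whose value is None


def naive_semi_or_anti(broadcast_left: Sequence[T],
                       stream_right: Iterable[object],
                       anti: bool) -> List[T]:
    """Simplified model of the slow path (hypothetical, not Spark code).

    For every streamed row, scan the whole broadcast side and mark matches.
    With no join condition, every (left, right) pair counts as a match.
    """
    matched = [False] * len(broadcast_left)
    for _ in stream_right:
        for i in range(len(broadcast_left)):
            matched[i] = True
    # Left semi keeps matched rows; left anti keeps unmatched rows.
    return [row for row, m in zip(broadcast_left, matched) if m != anti]


def fast_semi_or_anti(broadcast_left: Sequence[T],
                      stream_right: Iterable[object],
                      anti: bool) -> List[T]:
    """Model of the proposed shortcut: only test whether the stream side is empty."""
    stream_is_empty = next(iter(stream_right), _SENTINEL) is _SENTINEL
    keep_all = stream_is_empty if anti else not stream_is_empty
    return list(broadcast_left) if keep_all else []


if __name__ == "__main__":
    left = ["a", "b", "c"]
    for right in ([], [1], [1, 2, 3]):
        for anti in (False, True):
            assert naive_semi_or_anti(left, right, anti) == fast_semi_or_anti(left, right, anti)
    print("naive and fast versions agree")
```

In this model the naive version does work proportional to the product of the two side sizes, while the shortcut inspects at most one streamed row, which is the source of the speedup the issue anticipates.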



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
