Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/09/20 17:03:00 UTC

[jira] [Assigned] (SPARK-36809) Remove broadcast for InSubqueryExec used in DPP

     [ https://issues.apache.org/jira/browse/SPARK-36809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36809:
------------------------------------

    Assignee: Apache Spark

> Remove broadcast for InSubqueryExec used in DPP
> -----------------------------------------------
>
>                 Key: SPARK-36809
>                 URL: https://issues.apache.org/jira/browse/SPARK-36809
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: L. C. Hsieh
>            Assignee: Apache Spark
>            Priority: Major
>
> Currently, InSubqueryExec includes a broadcast variable that holds the result of the filtering-side query for DPP (dynamic partition pruning). This is odd, because the result is never used on the executors; it is only needed on the driver during query planning. Since we already hold the original result, we currently keep two copies of the query result.
> A related issue: in pruningHasBenefit, we estimate whether DPP has a benefit when the join type does not support broadcast. Because of the broadcast variable above, we also check the filtering side against the config autoBroadcastJoinThreshold. That config is not meant for this purpose, and this is not a broadcast join. Since the broadcast variable is unnecessary, we can remove this check and base the benefit estimate on overhead and pruning size alone.
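
To make the proposed change to the benefit check concrete, here is a toy model in Python. It is not Spark's code: the class, the function names, and the exact comparison (pruned bytes greater than overhead) are assumptions made for illustration. Only the idea comes from the description above, namely that the autoBroadcastJoinThreshold check is dropped and the decision rests on overhead and pruning size.

```python
from dataclasses import dataclass


@dataclass
class PruningEstimate:
    """Hypothetical inputs to a DPP benefit check (all sizes in bytes)."""
    filtering_side_bytes: int  # estimated size of the filtering-side plan
    pruned_bytes: int          # estimated data skipped on the pruned side
    overhead_bytes: int        # estimated cost of running the filtering query


def has_benefit_before(est: PruningEstimate, auto_broadcast_join_threshold: int) -> bool:
    # Assumed shape of the old logic: the filtering side must also fit under
    # the broadcast threshold, even though no broadcast join is involved.
    fits_threshold = est.filtering_side_bytes <= auto_broadcast_join_threshold
    return fits_threshold and est.pruned_bytes > est.overhead_bytes


def has_benefit_after(est: PruningEstimate) -> bool:
    # Assumed shape of the proposed logic: only overhead and pruning size matter.
    return est.pruned_bytes > est.overhead_bytes


MB = 1024 * 1024
estimate = PruningEstimate(
    filtering_side_bytes=50 * MB,
    pruned_bytes=1024 * MB,
    overhead_bytes=100 * MB,
)

# With a 10 MB threshold, the old check rejects pruning that would pay off.
before = has_benefit_before(estimate, auto_broadcast_join_threshold=10 * MB)
after = has_benefit_after(estimate)
print(before, after)
```

In this made-up scenario the filtering side exceeds the threshold, so the old check declines to prune even though the estimated pruning gain is well above the overhead; the simplified check accepts it.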



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
