Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2021/09/20 17:01:00 UTC

[jira] [Created] (SPARK-36809) Remove broadcast for InSubqueryExec used in DPP

L. C. Hsieh created SPARK-36809:
-----------------------------------

             Summary: Remove broadcast for InSubqueryExec used in DPP
                 Key: SPARK-36809
                 URL: https://issues.apache.org/jira/browse/SPARK-36809
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: L. C. Hsieh


Currently, InSubqueryExec includes a broadcast variable, which we use to hold the result of the filtering-side query for DPP. This looks odd, because executors never use the result; it is only needed on the driver during query planning. Since we already hold the original result, we currently keep two copies of the query result.
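To make the "single driver-side copy" idea concrete, here is a minimal, Spark-free sketch of the proposed shape. It assumes the filtering-side result is collected once on the driver and read directly when pruning partitions at planning time. The names FilteringSubquery, update_result, values and prune_partitions are hypothetical and do not mirror Spark's actual InSubqueryExec code.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional

# Hypothetical, Spark-free model of a DPP filtering subquery.
# Class and method names are illustrative only, not Spark's real API.


@dataclass
class FilteringSubquery:
    rows: List[object]  # stand-in for the filtering-side query output
    _result: Optional[FrozenSet[object]] = field(default=None, init=False)

    def update_result(self) -> None:
        # Evaluate the filtering side once, on the driver, and keep one copy.
        if self._result is None:
            self._result = frozenset(self.rows)

    def values(self) -> FrozenSet[object]:
        # Planning-time consumers read the driver-side copy directly;
        # no broadcast variable (i.e. no second copy) is created.
        if self._result is None:
            raise RuntimeError("update_result() must be called first")
        return self._result


def prune_partitions(partitions: Dict[object, List[str]],
                     subquery: FilteringSubquery) -> Dict[object, List[str]]:
    # Keep only partitions whose key appears in the filtering-side result.
    keep = subquery.values()
    return {key: files for key, files in partitions.items() if key in keep}


if __name__ == "__main__":
    parts = {"2021-09-18": ["a.parquet"],
             "2021-09-19": ["b.parquet"],
             "2021-09-20": ["c.parquet"]}
    sq = FilteringSubquery(rows=["2021-09-20", "2021-09-20", "2021-09-19"])
    sq.update_result()
    print(sorted(prune_partitions(parts, sq)))  # ['2021-09-19', '2021-09-20']
```

The design point being illustrated is only that the pruning decision needs nothing beyond the driver-held set; how Spark actually stores and exposes that result is outside this sketch.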

A related issue: when the join type does not support broadcast, pruningHasBenefit estimates whether DPP pruning is beneficial. Because of the broadcast variable above, it also checks the filtering side against the autoBroadcastJoinThreshold config. That config is not meant for this purpose, and this is not a broadcast join. Since the broadcast variable is unnecessary, we can remove this check and base the benefit estimation solely on the overhead and the pruning size.
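A simplified model of the before/after benefit check may help. This sketch assumes the estimated pruned size is filter_ratio times the size of the pruning side, and that the old threshold check simply required the filtering side to fit under autoBroadcastJoinThreshold. The function names and parameters are hypothetical; Spark's real pruningHasBenefit uses its own statistics and formulas.

```python
# Hypothetical, simplified model of the DPP benefit check discussed above.
# Names and formulas are illustrative assumptions, not Spark's actual code.


def pruning_has_benefit_before(filter_ratio: float,
                               pruning_side_bytes: int,
                               overhead_bytes: int,
                               filtering_side_bytes: int,
                               auto_broadcast_join_threshold: int) -> bool:
    # Old behaviour (as described in the issue): in addition to comparing the
    # estimated pruned size with the overhead, the filtering side also had to
    # fit under autoBroadcastJoinThreshold, because its result was broadcast.
    estimated_pruned_bytes = filter_ratio * pruning_side_bytes
    fits_threshold = filtering_side_bytes <= auto_broadcast_join_threshold
    return fits_threshold and estimated_pruned_bytes > overhead_bytes


def pruning_has_benefit_after(filter_ratio: float,
                              pruning_side_bytes: int,
                              overhead_bytes: int) -> bool:
    # Proposed behaviour: with no broadcast variable, the decision depends
    # only on the overhead and the estimated pruning size.
    estimated_pruned_bytes = filter_ratio * pruning_side_bytes
    return estimated_pruned_bytes > overhead_bytes


if __name__ == "__main__":
    MiB = 1024 ** 2
    GiB = 1024 ** 3
    # A 100 MiB filtering side exceeds a 10 MiB threshold, so the old check
    # rejects DPP even though pruning half of 10 GiB outweighs 1 GiB overhead.
    print(pruning_has_benefit_before(0.5, 10 * GiB, GiB, 100 * MiB, 10 * MiB))  # False
    print(pruning_has_benefit_after(0.5, 10 * GiB, GiB))  # True
```

The example shows the effect the issue describes: a size limit intended for broadcast joins can veto pruning that the overhead-versus-pruning-size estimate alone would accept.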



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
