Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2020/08/11 06:25:00 UTC
[jira] [Assigned] (SPARK-32573) Anti Join Improvement with EmptyHashedRelation and EmptyHashedRelationWithAllNullKeys
[ https://issues.apache.org/jira/browse/SPARK-32573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-32573:
-----------------------------------
Assignee: Leanken.Lin
> Anti Join Improvement with EmptyHashedRelation and EmptyHashedRelationWithAllNullKeys
> -------------------------------------------------------------------------------------
>
> Key: SPARK-32573
> URL: https://issues.apache.org/jira/browse/SPARK-32573
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Leanken.Lin
> Assignee: Leanken.Lin
> Priority: Minor
>
> In SPARK-32290, we introduced several new types of HashedRelation:
> * EmptyHashedRelation
> * EmptyHashedRelationWithAllNullKeys
> They are currently used only in the NAAJ (null-aware anti join) scenario. These new HashedRelation types could be applied to other scenarios for performance improvements.
> * EmptyHashedRelation could also be used in a normal anti join for a fast stop.
> * When AQE is on and buildSide is an EmptyHashedRelationWithAllNullKeys, the NAAJ can be converted to an empty LocalRelation to skip meaningless data iteration, since in a single-key NAAJ, a null key on buildSide causes all records on streamedSide to be dropped.
> This patch includes two changes:
> * Use EmptyHashedRelation to do a fast stop for common anti joins as well.
> * In AQE, eliminate BroadcastHashJoin (NAAJ) if buildSide is an EmptyHashedRelationWithAllNullKeys.
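
The two shortcuts described above can be illustrated with a small, self-contained model of the join semantics. This is only a sketch in plain Python, not Spark code: the names `anti_join`, `null_aware_anti_join`, and `Row` are hypothetical, `None` stands in for SQL NULL, and an empty or NULL-containing key list plays the role of EmptyHashedRelation and EmptyHashedRelationWithAllNullKeys respectively.

```python
from typing import Hashable, Iterable, List, Optional, Tuple

# Hypothetical row shape for illustration: (join key, payload).
# None models SQL NULL.
Row = Tuple[Optional[Hashable], str]


def anti_join(streamed: Iterable[Row],
              build_keys: Iterable[Optional[Hashable]]) -> List[Row]:
    """Plain left anti join: keep streamed rows with no matching build key."""
    # NULL never equals anything, so NULL build keys can never match.
    lookup = {k for k in build_keys if k is not None}
    if not lookup:
        # Fast stop: nothing can match, so every streamed row is returned
        # without probing (the role EmptyHashedRelation plays in this model).
        return list(streamed)
    return [row for row in streamed if row[0] is None or row[0] not in lookup]


def null_aware_anti_join(streamed: Iterable[Row],
                         build_keys: Iterable[Optional[Hashable]]) -> List[Row]:
    """Single-key null-aware anti join, i.e. `key NOT IN (build keys)`."""
    keys = list(build_keys)
    if not keys:
        # Empty build side: NOT IN over an empty set holds for every row.
        return list(streamed)
    if any(k is None for k in keys):
        # A NULL key on the build side drops every streamed row, so the
        # whole join can be replaced by an empty result without iterating.
        return []
    lookup = set(keys)
    # A NULL streamed key can never satisfy NOT IN against a non-empty set.
    return [row for row in streamed
            if row[0] is not None and row[0] not in lookup]
```

In this model, the second branch of `null_aware_anti_join` corresponds to the proposed AQE rewrite: once the build side is known to contain a null key, the result is known to be empty before any streamed row is read.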
--
This message was sent by Atlassian Jira
(v8.3.4#803005)