Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:18:19 UTC

[jira] [Resolved] (SPARK-20006) Separate threshold for broadcast and shuffled hash join

     [ https://issues.apache.org/jira/browse/SPARK-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-20006.
----------------------------------
    Resolution: Incomplete

> Separate threshold for broadcast and shuffled hash join
> -------------------------------------------------------
>
>                 Key: SPARK-20006
>                 URL: https://issues.apache.org/jira/browse/SPARK-20006
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Zhan Zhang
>            Priority: Minor
>              Labels: bulk-closed
>
> Currently, both canBroadcast and canBuildLocalHashMap use the same configuration: AUTO_BROADCASTJOIN_THRESHOLD.
> However, the memory model may differ between the two. For broadcast, the hash map is currently always built on heap. For shuffledHashJoin, the hash map may be built on heap (longHash) or off heap (the other map, if off heap is enabled). Sharing one setting makes it hard to tune, because the same value has to account for both on-heap and off-heap memory allocation. I propose using a separate configuration for each. Please comment on whether this is reasonable.
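
For illustration only, here is a minimal Python sketch of what separate thresholds could look like. It is not Spark's implementation. The names JoinThresholds, broadcast_threshold_bytes and shuffled_hash_threshold_bytes are hypothetical, and scaling the shuffled hash check by the number of shuffle partitions is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinThresholds:
    """Hypothetical pair of settings; the issue only proposes that they be separate."""
    broadcast_threshold_bytes: int      # bounds the broadcast hash map (on heap)
    shuffled_hash_threshold_bytes: int  # bounds each per-partition hash map (on or off heap)


def can_broadcast(size_in_bytes: int, t: JoinThresholds) -> bool:
    # A negative size is treated as "unknown" and never qualifies.
    return 0 <= size_in_bytes <= t.broadcast_threshold_bytes


def can_build_local_hash_map(size_in_bytes: int, num_shuffle_partitions: int,
                             t: JoinThresholds) -> bool:
    # Assumption: each partition holds roughly size / num_shuffle_partitions bytes,
    # so the whole relation is compared against threshold * partitions.
    return size_in_bytes < t.shuffled_hash_threshold_bytes * num_shuffle_partitions


MB = 1024 * 1024
shared = JoinThresholds(10 * MB, 10 * MB)    # one value used for both checks
separate = JoinThresholds(10 * MB, 64 * MB)  # tuned independently
```

With a shared value, raising the limit for one join type also raises it for the other. With separate values, a 20 MB relation on a single partition can be rejected for broadcast yet still qualify for a shuffled hash join.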



