Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:37:44 UTC

[jira] [Resolved] (SPARK-12358) Spark SQL query with lots of small tables under broadcast threshold leading to java.lang.OutOfMemoryError

     [ https://issues.apache.org/jira/browse/SPARK-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-12358.
----------------------------------
    Resolution: Incomplete

> Spark SQL query with lots of small tables under broadcast threshold leading to java.lang.OutOfMemoryError
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12358
>                 URL: https://issues.apache.org/jira/browse/SPARK-12358
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Deenar Toraskar
>            Priority: Major
>              Labels: bulk-closed
>
> Hi,
> I have a Spark SQL query involving many small tables (5 or more), each below the broadcast threshold. Looking at the query plan, Spark broadcasts all of these tables without checking whether sufficient memory is available. This leads to
> Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space 
> errors, which kill the executors and cause the query to fail.
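>
> To make the failure mode concrete, here is a minimal sketch (table and column names are hypothetical, and it assumes a spark-shell session where sqlContext is in scope) of a query where every small table individually passes the per-table check, so all of them end up broadcast for the same query:
>
>     // Each dimension table is below spark.sql.autoBroadcastJoinThreshold
>     // (10 MB by default), so every join is planned as a BroadcastHashJoin.
>     val fact = sqlContext.table("fact_events")           // large table, never broadcast
>     val dims = Seq("dim_a", "dim_b", "dim_c", "dim_d", "dim_e", "dim_f")
>     val joined = dims.foldLeft(fact) { (df, name) =>
>       df.join(sqlContext.table(name), "join_key")        // hypothetical join column
>     }
>     joined.explain()   // shows one BroadcastHashJoin per small table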
> I worked around this by reducing spark.sql.autoBroadcastJoinThreshold so that the bigger tables in the query are no longer broadcast, as shown below.
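>
> For reference, the workaround can be applied like this (the value is in bytes; 1 MB here is only an example, and -1 disables automatic broadcast joins entirely):
>
>     // Lower the per-table broadcast threshold so fewer tables qualify.
>     sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (1024 * 1024).toString)
>     // or turn auto-broadcast off entirely via SQL:  SET spark.sql.autoBroadcastJoinThreshold=-1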
> A fix would be to:
> a) ensure that, in addition to the per-table threshold (spark.sql.autoBroadcastJoinThreshold), there is a cumulative broadcast threshold per query (say spark.sql.autoBroadcastJoinThresholdCumulative), so that only data up to that limit is broadcast, preventing executors from running out of memory; a rough sketch of the idea follows below.
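>
> The sketch below is only an illustration of the proposed behaviour (the cumulative setting named above does not exist in Spark, and this is not how the planner is actually implemented): while planning a single query, keep a running total of estimated broadcast sizes and stop choosing broadcast joins once the cumulative limit would be exceeded.
>
>     // Hypothetical planner-side check for the proposed cumulative limit.
>     var broadcastBytesSoFar = 0L
>     def chooseBroadcast(estimatedSize: Long,
>                         perTableLimit: Long,
>                         cumulativeLimit: Long): Boolean = {
>       val fits = estimatedSize <= perTableLimit &&
>         broadcastBytesSoFar + estimatedSize <= cumulativeLimit
>       if (fits) broadcastBytesSoFar += estimatedSize   // reserve the budget
>       fits
>     }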



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org