Posted to issues@spark.apache.org by "Patrick Woody (JIRA)" <ji...@apache.org> on 2016/04/30 17:26:12 UTC

[jira] [Created] (SPARK-15038) Add ability to do broadcasts in SQL at execution time

Patrick Woody created SPARK-15038:
-------------------------------------

             Summary: Add ability to do broadcasts in SQL at execution time
                 Key: SPARK-15038
                 URL: https://issues.apache.org/jira/browse/SPARK-15038
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.6.1
            Reporter: Patrick Woody


Currently, the automatic broadcasting done in Spark SQL is asynchronous and happens at query planning time. For a large query with many broadcasts, this can trigger all of the broadcasts at once, creating a large amount of memory pressure and possibly OOMs, even though the broadcasts do not all need to be materialized at the same time.

The current workaround for these types of queries is to disable broadcast joins, which can be prohibitive performance-wise. This ticket proposes a configuration option that toggles between doing these broadcasts eagerly and asynchronously (the current behavior) and doing them lazily at execution time.
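
To make the eager/lazy distinction concrete, here is a toy sketch in plain Python. It is not Spark code: the Broadcast class, the run_query function, and the eager flag are hypothetical names invented for illustration, and everything runs synchronously, whereas the real eager path is asynchronous. It only shows how a single toggle changes when each broadcast is built relative to planning and to the join that consumes it.

```python
# Toy sketch in plain Python; none of these names are Spark APIs.

class Broadcast:
    """Wraps a build function; materializes it eagerly or on first use."""

    def __init__(self, name, build, eager, log):
        self.name = name
        self._build = build
        self._value = None
        self._built = False
        self._log = log
        if eager:
            # Eager mode: build as soon as the broadcast is planned.
            self._materialize()

    def _materialize(self):
        if not self._built:
            self._value = self._build()
            self._built = True
            self._log.append(f"build {self.name}")

    def value(self):
        # Lazy mode: the first access triggers the build.
        self._materialize()
        return self._value


def run_query(eager):
    """Plan three hypothetical broadcast joins, then execute them in order."""
    log = []
    tables = {name: {"k": name.upper()} for name in ("a", "b", "c")}
    # "Planning": create one broadcast per small table.
    broadcasts = [
        Broadcast(name, lambda t=table: dict(t), eager, log)
        for name, table in tables.items()
    ]
    log.append("planning done")
    # "Execution": each join reads its broadcast when it runs.
    for b in broadcasts:
        b.value()
        log.append(f"join {b.name}")
    return log


print(run_query(eager=True))
print(run_query(eager=False))
```

With eager=True every build is logged before "planning done", so all three tables exist at once before any join runs. With eager=False each build is logged immediately before its own join, which is the behavior the proposed config would enable.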



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
