You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/06/09 23:37:00 UTC

[jira] [Updated] (SPARK-26305) Breakthrough the memory limitation of broadcast join

     [ https://issues.apache.org/jira/browse/SPARK-26305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-26305:
----------------------------------
    Affects Version/s:     (was: 2.4.0)
                       3.0.0

> Breakthrough the memory limitation of broadcast join
> ----------------------------------------------------
>
>                 Key: SPARK-26305
>                 URL: https://issues.apache.org/jira/browse/SPARK-26305
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> If the join between a big table and a small one faces data skewing issue, we usually use a broadcast hint in SQL to resolve it. However, current broadcast join has many limitations. The primary restriction is memory. The small table which is broadcasted must be fulfilled to memory in driver/executors side. Although it will spill to disk when the memory is insufficient, it still causes OOM if the small table actually is not absolutely small, it's relatively small. In our company, we have many real big data SQL analysis jobs which handle dozens of hundreds terabytes join and shuffle. For example, the size of large table is 100TB, and the small one is 10000 times less, still 10GB. In this case, broadcast join couldn't be finished since the small one is still larger than expected. If the join is data skewing, the sortmerge join always failed.
> Hive has a skew join hint which could trigger two-stage task to handle the skew key and normal key separately. I guess Databricks Runtime has the similar implementation. However, the skew join hint needs SQL users know the data in table like their children. They must know which key is skewing in a join. It's very hard to know since the data is changing day by day and the join key isn't fixed in different queries. The users have to set a huge partition number to try their luck.
> So, do we have a simple, rude and efficient way to resolve it? Back to the limitation, if the broadcasted table no needs to fill to memory, in other words, driver/executor stores the broadcasted table to disk only. The problem mentioned above could be resolved.
> A new hint like BROADCAST_DISK or an additional parameter in original BROADCAST hint will be introduced to cover this case. The original broadcast behavior won’t be changed.
> I will offer a design doc if you have same feeling about it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org