Posted to issues@spark.apache.org by "Ke Jia (JIRA)" <ji...@apache.org> on 2019/06/14 02:35:00 UTC

[jira] [Created] (SPARK-28046) OOM caused by building hash table when the compressed ratio of small table is normal

Ke Jia created SPARK-28046:
------------------------------

             Summary: OOM caused by building hash table when the compressed ratio of small table is normal
                 Key: SPARK-28046
                 URL: https://issues.apache.org/jira/browse/SPARK-28046
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.1
            Reporter: Ke Jia


Currently, Spark converts a sort merge join (SMJ) to a broadcast hash join (BHJ) when the compressed size of the small table is <= the broadcast threshold. Like Spark, Adaptive Execution (AE) also converts an SMJ to a BHJ based on the compressed size at runtime. In our test, with AE enabled and a 32 MB broadcast threshold, one SMJ whose small side is 16 MB compressed was converted to a BHJ. However, when building the hash table, that 16 MB table decompressed to about 2 GB and 134,485,048 rows, because it contains a large amount of continuous, repeated values. As a result, the following OOM exception occurred while building the hash table:

(See the attached screenshot image-2019-06-14-10-29-00-499.png: the OOM stack trace thrown while building the hash table.)
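
To make the failure mode concrete, here is a minimal, standalone Scala sketch of the size-only decision described above. It is not Spark source code; SmallTableStats and shouldBroadcastBySize are illustrative names, and only the 16 MB / 32 MB / 2 GB / 134,485,048-row figures come from this report.

{code:scala}
// Sketch of the current behaviour: the planner only sees the compressed size,
// so a highly compressible small table passes the broadcast check.
object BroadcastBySizeSketch {
  // Statistics as the planner sees them: only the compressed on-disk size is known.
  final case class SmallTableStats(compressedSizeInBytes: Long)

  // Current behaviour: broadcast whenever the compressed size fits under the threshold.
  def shouldBroadcastBySize(stats: SmallTableStats, thresholdBytes: Long): Boolean =
    stats.compressedSizeInBytes >= 0 && stats.compressedSizeInBytes <= thresholdBytes

  def main(args: Array[String]): Unit = {
    val threshold = 32L * 1024 * 1024                   // 32 MB broadcast threshold from the test
    val smallSide = SmallTableStats(16L * 1024 * 1024)  // 16 MB compressed
    // The check passes, so the SMJ is rewritten to a BHJ ...
    println(shouldBroadcastBySize(smallSide, threshold)) // prints: true
    // ... but the decompressed data is ~2 GB / 134,485,048 rows, which must be
    // materialized in a single in-memory hash table and can trigger the OOM above.
  }
}
{code}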

Based on this finding, it may not be reasonable to decide whether an SMJ is converted to a BHJ based only on the compressed size. In AE, we added a condition on the estimated decompressed size derived from the row count. Spark may likewise need to take the decompressed size or the row count into account, not just the compressed size, when converting an SMJ to a BHJ, as in the sketch below.
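
Below is a hedged sketch of the extra guard being proposed: in addition to the compressed size, bound the estimated decompressed (in-memory) size derived from the row count. estimatedRowCount, avgRowSizeBytes, and maxDecompressedBytes are illustrative names, not existing Spark or AE configuration keys or APIs.

{code:scala}
// Sketch of the proposed decision: compressed-size check plus a row-count-based
// estimate of the decompressed footprint.
object BroadcastWithRowCountSketch {
  final case class SmallTableStats(
      compressedSizeInBytes: Long,
      estimatedRowCount: Option[Long],   // from table/column statistics, if available
      avgRowSizeBytes: Long)             // rough per-row in-memory size estimate

  def shouldBroadcast(
      stats: SmallTableStats,
      compressedThreshold: Long,
      maxDecompressedBytes: Long): Boolean = {
    val fitsCompressed =
      stats.compressedSizeInBytes >= 0 && stats.compressedSizeInBytes <= compressedThreshold
    // Estimate the in-memory (decompressed) size from the row count; if no row count
    // is available, fall back to the compressed-size-only decision.
    val fitsDecompressed = stats.estimatedRowCount.forall { rows =>
      rows * stats.avgRowSizeBytes <= maxDecompressedBytes
    }
    fitsCompressed && fitsDecompressed
  }

  def main(args: Array[String]): Unit = {
    val stats = SmallTableStats(
      compressedSizeInBytes = 16L * 1024 * 1024,
      estimatedRowCount = Some(134485048L),
      avgRowSizeBytes = 16L)
    // With the row-count guard, the highly compressible 16 MB table is no longer
    // broadcast, because its estimated in-memory size (~2 GB) exceeds the cap.
    println(shouldBroadcast(stats, 32L * 1024 * 1024, 256L * 1024 * 1024)) // prints: false
  }
}
{code}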



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org