Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2014/08/04 09:32:14 UTC

[jira] [Commented] (SPARK-2650) Wrong initial sizes for in-memory column buffers

    [ https://issues.apache.org/jira/browse/SPARK-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084397#comment-14084397 ] 

Cheng Lian commented on SPARK-2650:
-----------------------------------

Did some experiments and reached the following conclusions:

# Needless to say, the {{10 * 1024 * 104}} is definitely a typo, but it's not related to the OOMs: more reasonable initial buffer sizes don't help solve them (see the arithmetic sketch below the list).
# The OOMs are also unrelated to whether the table is larger than available memory. The cause is that building the in-memory columnar buffers is itself memory consuming, and multiple tasks building buffers in parallel consume too much memory altogether.
# Given 2, reducing parallelism or increasing executor memory can work around the issue (a configuration sketch also follows the list). For example, a {{HiveThriftServer2}} started with default executor memory (512MB) and {{--total-executor-cores=1}} could cache a 1.7GB table.
# Shark performs better than Spark SQL in this case, but still OOMs when the table gets larger: caching a 1.8GB table with default Shark configurations makes Shark OOM too.
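
For reference, the arithmetic behind point 1 (a standalone sketch, not the actual Spark SQL source; the value names are made up):

{code:scala}
// The suspicious constant versus the presumably intended one.
val suspicious = 10 * 1024 * 104   // = 1,064,960 bytes, roughly 1MB
val intended   = 10 * 1024 * 1024  // = 10,485,760 bytes, i.e. 10MB
{code}

And a rough sketch of the workaround in point 3, expressed as Spark configuration properties rather than command line flags ({{spark.cores.max}} is what {{--total-executor-cores}} sets in standalone mode; the values here are only an example):

{code:scala}
// Fewer concurrent tasks plus enough executor memory lowers the peak memory
// needed while the column buffers are being built.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("cache-large-table")
  .set("spark.executor.memory", "512m") // per-executor heap (the default mentioned above)
  .set("spark.cores.max", "1")          // standalone mode: cap total executor cores
val sc = new SparkContext(conf)
{code}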

I'm investigating why Spark SQL consumes more memory than Shark when building in-memory columnar buffers.
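
To make point 2 concrete, here is a rough, simplified sketch (not the actual Spark SQL or Shark code) of why buffer building is memory hungry: each column builder keeps a growing {{ByteBuffer}}, and growing means allocating a bigger buffer and copying, so the old and new buffers are briefly live at the same time, multiplied by the number of tasks building buffers concurrently.

{code:scala}
import java.nio.ByteBuffer

// Simplified growth strategy: when the current buffer cannot hold `needed`
// more bytes, allocate a larger one and copy the contents over. During the
// copy both buffers are reachable, so peak usage per task can approach twice
// the buffer size, and N parallel tasks multiply that again.
def ensureFreeSpace(buf: ByteBuffer, needed: Int): ByteBuffer = {
  if (buf.remaining() >= needed) {
    buf
  } else {
    val newSize = math.max(buf.capacity() * 2, buf.capacity() + needed)
    val newBuf = ByteBuffer.allocate(newSize)
    buf.flip()
    newBuf.put(buf)
  }
}
{code}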

> Wrong initial sizes for in-memory column buffers
> ------------------------------------------------
>
>                 Key: SPARK-2650
>                 URL: https://issues.apache.org/jira/browse/SPARK-2650
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.0.0, 1.0.1
>            Reporter: Michael Armbrust
>            Assignee: Cheng Lian
>            Priority: Critical
>
> The logic for setting up the initial column buffers is different for Spark SQL compared to Shark, and I'm seeing OOMs when caching tables that are larger than available memory (where Shark was okay).
> Two suspicious things: the initialSize is always set to 0, so we always go with the default. The default looks like it was copied from code like 10 * 1024 * 1024... but in Spark SQL it's 10 * 102 * 1024.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
