Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2014/08/04 21:14:12 UTC

[jira] [Comment Edited] (SPARK-2650) Wrong initial sizes for in-memory column buffers

    [ https://issues.apache.org/jira/browse/SPARK-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085104#comment-14085104 ] 

Cheng Lian edited comment on SPARK-2650 at 8/4/14 7:12 PM:
-----------------------------------------------------------

Some additional comments after more experiments and some improvements:

# How exactly the OOMs occur when caching a large table (assume N cores and M bytes of memory are available within a single executor):
#- Say the table is so large that the underlying RDD is divided into X partitions (usually X >> N; let's assume so here)
#- When caching the table, N tasks are executed in parallel to build column buffers, each of which is memory consuming. Say each task consumes Y bytes on average.
#- At some point the combined memory consumption of the N parallel tasks, namely N * Y, exceeds M, and an OOM is thrown
#- All tasks fail and are retried, but fail again, until the driver stops retrying and the job fails
#- I guess the reason this issue hasn't been reported before is that N * Y < M usually holds in production (a back-of-envelope check of this condition is sketched after the list)
# Initial buffer sizes do affect the OOMs in a subtle way:
#- Too large an initial buffer size obviously implies larger memory consumption
#- Too small an initial buffer size causes the {{ColumnBuilder}} to keep allocating larger buffers to ensure enough free space for more elements (12.5% larger each time). While a buffer is being grown, both the old and the new buffer are alive, so space equal to 212.5% of the current buffer size is required (112.5% for the new buffer + 100% for the original one); see the growth-cost sketch after the list.
#- A well-estimated initial buffer size should be 1) large enough to avoid repeated buffer growth, and 2) small enough to avoid wasting memory. For example, by hand tuning, 5MB turns out to be a good initial size for an executor with 512MB of memory and 1 core.
# One obvious approach to help fix this issue is to reduce memory consumption during the column-building process.
#- [PR #1769|https://github.com/apache/spark/pull/1769] has been submitted to reduce memory consumption of the column-building process.
# Another approach is to estimate the initial buffer size better. To do this, Shark estimates the table partition size by leveraging the HDFS block size and the default sizes of column elements. We can use a similar approach in Spark SQL for Hive tables, plus a configurable initial size for non-Hive tables.
#- Currently {{InMemoryRelation}} resides in package {{org.apache.spark.sql.columnar}} and doesn't know anything about Hive tables. We can add an {{estimatedPartitionSize}} method and override it in a new {{InMemoryMetastoreRelation}} to estimate RDD partition sizes of a Hive table (a standalone sketch of the idea follows the list). This will be done in a separate PR.
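
To make the OOM condition in the first item concrete, here is a tiny, self-contained Scala sketch (not Spark code) that checks N * Y against M. The concrete numbers are made up purely for illustration.

{code:scala}
// Back-of-envelope check of the OOM condition described above (N * Y vs. M).
// All concrete numbers are hypothetical, for illustration only.
object CachingMemoryCheck {
  // Rough peak memory while building column buffers: N parallel tasks,
  // each holding roughly Y bytes of partially built buffers.
  def peakBuildMemory(cores: Int, perTaskBytes: Long): Long =
    cores.toLong * perTaskBytes

  def main(args: Array[String]): Unit = {
    val executorMemory = 512L * 1024 * 1024  // M: 512MB executor (hypothetical)
    val cores          = 4                   // N: 4 tasks building buffers in parallel
    val perTask        = 160L * 1024 * 1024  // Y: ~160MB of column buffers per task

    val peak = peakBuildMemory(cores, perTask)
    println(s"peak = ${peak >> 20}MB, budget = ${executorMemory >> 20}MB, " +
            s"OOM risk = ${peak > executorMemory}")
  }
}
{code}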
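
The growth cost of a too-small initial buffer can be illustrated with another small standalone sketch (not the actual {{ColumnBuilder}} code): it counts how many reallocations are needed and the transient footprint of each growth step. The 1MB/64MB figures are made up.

{code:scala}
// Growth arithmetic for a too-small initial buffer: each step allocates a new
// buffer 12.5% larger than the current one, and while the contents are copied
// both buffers are live, so the transient footprint is 100% + 112.5% = 212.5%
// of the current buffer size. Sizes below are illustrative only.
object BufferGrowthCost {
  def main(args: Array[String]): Unit = {
    var capacity     = 1L * 1024 * 1024   // too-small 1MB initial buffer
    val target       = 64L * 1024 * 1024  // bytes the partition actually needs
    var growths      = 0
    var maxTransient = 0L
    while (capacity < target) {
      val newCapacity = capacity + capacity / 8  // grow by 12.5%
      maxTransient    = math.max(maxTransient, capacity + newCapacity)
      capacity        = newCapacity
      growths        += 1
    }
    println(s"growth steps: $growths, " +
            s"peak transient footprint: ${maxTransient >> 20}MB")
  }
}
{code}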
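
Finally, a minimal standalone sketch of the estimation idea from the last item. It borrows the {{estimatedPartitionSize}} name from above but uses none of the real Spark SQL types, and the proportional split across columns is only an assumption about how the Shark-style heuristic could look, not a description of Shark's actual logic.

{code:scala}
// A standalone sketch of the Shark-style estimation idea, NOT the actual
// Spark SQL classes: derive per-column initial buffer sizes from an estimated
// partition size (e.g. the HDFS block size for Hive tables, or a configurable
// default otherwise). All names, signatures and the proportional split below
// are illustrative assumptions.
object InitialBufferSizeEstimation {
  // Hypothetical per-column metadata: column name and default element size in bytes.
  case class ColumnInfo(name: String, defaultElemSize: Int)

  // For a Hive table, use the HDFS block size as the partition-size estimate;
  // otherwise fall back to a configurable default.
  def estimatedPartitionSize(hdfsBlockSize: Option[Long], configuredDefault: Long): Long =
    hdfsBlockSize.getOrElse(configuredDefault)

  // Split the estimated partition size across columns in proportion to their
  // default element sizes, giving each column an initial buffer size.
  def initialBufferSizes(partitionSize: Long, columns: Seq[ColumnInfo]): Map[String, Long] = {
    val totalElemSize = math.max(columns.map(_.defaultElemSize.toLong).sum, 1L)
    columns.map(c => c.name -> partitionSize * c.defaultElemSize / totalElemSize).toMap
  }

  def main(args: Array[String]): Unit = {
    val columns  = Seq(ColumnInfo("id", 8), ColumnInfo("name", 32), ColumnInfo("score", 8))
    val partSize = estimatedPartitionSize(Some(128L * 1024 * 1024), 10L * 1024 * 1024)
    initialBufferSizes(partSize, columns).foreach { case (col, bytes) =>
      println(s"$col -> ${bytes / 1024} KB initial buffer")
    }
  }
}
{code}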


> Wrong initial sizes for in-memory column buffers
> ------------------------------------------------
>
>                 Key: SPARK-2650
>                 URL: https://issues.apache.org/jira/browse/SPARK-2650
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.0.0, 1.0.1
>            Reporter: Michael Armbrust
>            Assignee: Cheng Lian
>            Priority: Critical
>
> The logic for setting up the initial column buffers is different for Spark SQL compared to Shark, and I'm seeing OOMs when caching tables that are larger than available memory (where Shark was okay).
> Two suspicious things: the initialSize is always set to 0, so we always go with the default. The default looks like it was copied from code like 10 * 1024 * 1024... but in Spark SQL it's 10 * 102 * 1024.


