Posted to issues@spark.apache.org by "liupengcheng (Jira)" <ji...@apache.org> on 2020/03/20 09:55:00 UTC
[jira] [Created] (SPARK-31202) Improve SizeEstimator for AppendOnlyMap
liupengcheng created SPARK-31202:
------------------------------------
Summary: Improve SizeEstimator for AppendOnlyMap
Key: SPARK-31202
URL: https://issues.apache.org/jira/browse/SPARK-31202
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.2, 3.0.0
Reporter: liupengcheng
Currently, Spark's memory management depends on size estimation for execution and storage.
In our production cluster, users frequently hit OOM errors caused by inaccurate size estimation for `AppendOnlyMap`. Spark stores keys and values together in a single Array[AnyRef] inside `AppendOnlyMap` for memory locality, and for transformations like cogroup/join/groupBy the value can be a CompactBuffer[_] or an Array[CompactBuffer[_]]. The current `SizeEstimator` still treats this special array as a normal array, so in many cases we observed a large gap between the estimated size and the actual memory consumption.
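To make the kind of gap described above concrete, here is a toy model in Python. It is not Spark's `SizeEstimator`: the sampling threshold, the sample size, and the per-slot cost model are all assumptions chosen for illustration, and plain lists stand in for CompactBuffer. It only sketches how an estimator that samples slots of a flat key/value array as if they were ordinary elements can drift far from the true size when buffer lengths are skewed.

```python
import random

# Toy model only: sizes are abstract units, not real JVM object sizes.
SAMPLE_THRESHOLD = 400  # assumed: arrays longer than this are sampled
SAMPLE_SIZE = 100       # assumed: number of slots inspected when sampling


def build_flat_kv_array(buffer_lengths):
    """Lay out pairs as [k0, v0, k1, v1, ...] in one flat array.

    Keys are ints; each value is a list standing in for a CompactBuffer.
    """
    data = []
    for key, n in enumerate(buffer_lengths):
        data.append(key)
        data.append([0] * n)
    return data


def slot_size(slot):
    """Hypothetical cost model: 16 units per key, 16 + 8 per element for a buffer."""
    if isinstance(slot, list):
        return 16 + 8 * len(slot)
    return 16


def exact_size(data):
    """Walk every slot and sum its cost."""
    return sum(slot_size(s) for s in data)


def sampled_size(data, rng):
    """Sample slots uniformly and extrapolate, ignoring the key/value structure."""
    if len(data) <= SAMPLE_THRESHOLD:
        return exact_size(data)
    picked = rng.sample(range(len(data)), SAMPLE_SIZE)
    mean = sum(slot_size(data[i]) for i in picked) / SAMPLE_SIZE
    return mean * len(data)


# Skewed groupBy-like data: 999 tiny groups and one very large group.
lengths = [1] * 999 + [100_000]
data = build_flat_kv_array(lengths)
print("exact size:    ", exact_size(data))  # 839992 under this cost model
print("sampled estimate:", sampled_size(data, random.Random(0)))
```

In this toy setup the sampled estimate is either far too low (the one large buffer is not sampled) or far too high (it is sampled and then extrapolated across every slot). An estimator that knows which slots hold buffers could account for them separately, which is the direction this issue proposes.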
So we improved this at Xiaomi:
1. Improve the estimation for `AppendOnlyMap` when the value type is CompactBuffer
2. Respect JVM GC stats when deciding whether to spill during sort/agg
In this Jira, I propose to address the first part: improving the estimation for `AppendOnlyMap` when the value type is CompactBuffer.