Posted to issues@spark.apache.org by "liupengcheng (Jira)" <ji...@apache.org> on 2020/03/20 09:55:00 UTC

[jira] [Created] (SPARK-31202) Improve SizeEstimator for AppendOnlyMap

liupengcheng created SPARK-31202:
------------------------------------

             Summary: Improve SizeEstimator for AppendOnlyMap
                 Key: SPARK-31202
                 URL: https://issues.apache.org/jira/browse/SPARK-31202
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.3.2, 3.0.0
            Reporter: liupengcheng


Currently, Spark's memory management depends on size estimation for execution and storage.
In our production clusters, users often hit OOM errors caused by inaccurate size estimation for `AppendOnlyMap`. Spark stores keys and values together in a single Array[AnyRef] in `AppendOnlyMap` for memory locality, and for transformations such as cogroup/join/groupBy the value can be a CompactBuffer[_] or an Array[CompactBuffer[_]]. The current `SizeEstimator` still treats this special array as a normal array, so in many cases we observed a large gap between the estimated size and the actual memory consumption.
So we improved this at Xiaomi in two ways:
1. Improve the estimation for `AppendOnlyMap` when the value type is CompactBuffer
2. Respect JVM GC stats when deciding whether to spill during sort/agg

In this JIRA, I propose to address the first part: improving the estimation for `AppendOnlyMap` when the value type is CompactBuffer.
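
To illustrate the kind of bias described above, here is a toy model in Python. It is only a sketch, not Spark code: the byte sizes are made up, `toy_size` and `sampled_size` are hypothetical helpers, and the fixed-stride sampling is an assumption chosen to keep the example deterministic. It shows how an estimator that treats a flat [k0, v0, k1, v1, ...] array as an ordinary array and extrapolates from a few slots can miss one very large value buffer.

```python
# Toy model only: the "bytes" below are invented, not real JVM object sizes.

def build_flat_array(groups):
    """Lay out (key, values) pairs as [k0, v0, k1, v1, ...] in one flat list."""
    data = []
    for key, values in groups:
        data.append(key)
        data.append(list(values))  # stands in for a CompactBuffer
    return data

def toy_size(obj):
    """Hypothetical size model: 16 bytes per object, plus 8 bytes per buffer element."""
    if isinstance(obj, list):
        return 16 + 8 * len(obj)
    return 16

def exact_size(data):
    """Walk every slot and add up its toy size."""
    return sum(toy_size(x) for x in data)

def sampled_size(data, stride):
    """Measure every `stride`-th slot and extrapolate to the whole array."""
    sample = data[::stride]
    return sum(toy_size(x) for x in sample) * len(data) // len(sample)

# One hot key with 100,000 values, 999 keys with a single value each.
groups = [(0, range(100_000))] + [(k, [k]) for k in range(1, 1000)]
data = build_flat_array(groups)

exact = exact_size(data)
estimate = sampled_size(data, stride=7)  # the hot buffer sits in slot 1, which is not sampled
print(exact, estimate)
```

In this toy setup the sampled estimate is far below the exact size because the skewed buffer is never measured; an estimator that knows the values are buffers could account for their lengths directly.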



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
