Posted to issues@spark.apache.org by "Lars Francke (Jira)" <ji...@apache.org> on 2020/10/21 13:11:00 UTC

[jira] [Created] (SPARK-33206) Spark Shuffle Index Cache calculates memory usage wrong

Lars Francke created SPARK-33206:
------------------------------------

             Summary: Spark Shuffle Index Cache calculates memory usage wrong
                 Key: SPARK-33206
                 URL: https://issues.apache.org/jira/browse/SPARK-33206
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 3.0.1, 2.4.0
            Reporter: Lars Francke
         Attachments: image001(1).png

SPARK-21501 changed the Spark shuffle service's index cache to be limited by memory usage instead of by the number of cached files.

Unfortunately, there's a problem with that calculation: the size information provided by `ShuffleIndexInformation` is based purely on the size of the cached index file on disk.

We're running into OOMs with very small index files (~16 bytes on disk) because the overhead of the `ShuffleIndexInformation` object around each file is much larger (e.g. 184 bytes, see screenshot). We need to take this into account and should probably add a fixed overhead of somewhere between 152 and 180 bytes according to my tests. I'm not 100% sure what the correct number is, and it will also depend on the architecture etc., so we can't be exact anyway.
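Below is a minimal sketch of what such a fix could look like, assuming a Guava-style LoadingCache with a weigher similar to the one backing the index cache. This is not Spark's actual code: the ENTRY_OVERHEAD_BYTES constant, the byte[] value type and the buildCache helper are placeholders for illustration.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.Weigher;

import java.io.File;
import java.nio.file.Files;

public class IndexCacheWeighingSketch {

  // Assumed constant: the measured per-entry overhead was roughly 152-184 bytes
  // in my tests and will vary with JVM and architecture.
  private static final int ENTRY_OVERHEAD_BYTES = 176;

  public static LoadingCache<File, byte[]> buildCache(long maxWeightBytes) {
    return CacheBuilder.newBuilder()
        .maximumWeight(maxWeightBytes)
        // Charge the on-disk size plus a fixed overhead per cached entry,
        // instead of the raw file size alone.
        .weigher(new Weigher<File, byte[]>() {
          @Override
          public int weigh(File indexFile, byte[] offsets) {
            return offsets.length + ENTRY_OVERHEAD_BYTES;
          }
        })
        .build(new CacheLoader<File, byte[]>() {
          @Override
          public byte[] load(File indexFile) throws Exception {
            // Stand-in for reading a shuffle index file into memory.
            return Files.readAllBytes(indexFile.toPath());
          }
        });
  }
}
```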

If we do that, we can maybe get rid of the size field in ShuffleIndexInformation to save a few more bytes per entry.

In effect, this means that for small index files we use roughly 70-100 times as much memory as we intend to. Our NodeManagers OOM with 4 GB and more of indexShuffleCache.
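For a rough sense of scale, here is a back-of-the-envelope estimate. All figures are assumptions for illustration: a hypothetical 100 MB cache weight limit, ~16-byte index files, and the 70-100x ratio observed above.

```java
// Back-of-the-envelope estimate of the amplification described above.
public class IndexCacheAmplificationEstimate {
  public static void main(String[] args) {
    long cacheLimitBytes = 100L * 1024 * 1024; // hypothetical weight limit (100 MiB)
    long weighedPerEntry = 16;                 // tiny on-disk index file, as reported
    long entriesAdmitted = cacheLimitBytes / weighedPerEntry;
    for (int factor : new int[] {70, 100}) {   // reported amplification range
      double actualGiB = (double) cacheLimitBytes * factor / (1L << 30);
      System.out.printf("at %dx: ~%d entries admitted, ~%.1f GiB of actual heap%n",
          factor, entriesAdmitted, actualGiB);
    }
  }
}
```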
