Posted to issues@spark.apache.org by "Emil Ejbyfeldt (Jira)" <ji...@apache.org> on 2022/10/05 06:37:00 UTC

[jira] [Created] (SPARK-40662) Serialization of MapStatuses is sometimes much larger on Scala 2.13

Emil Ejbyfeldt created SPARK-40662:
--------------------------------------

             Summary: Serialization of MapStatuses is sometimes much larger on Scala 2.13
                 Key: SPARK-40662
                 URL: https://issues.apache.org/jira/browse/SPARK-40662
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Emil Ejbyfeldt


We have observed a case where the same job, when run against Spark on Scala 2.13, fails with an out-of-memory error because the broadcast for the MapStatuses is huge.

In the logs around the time the job fails, it tries to create a broadcast of size 4.8 GiB:
```
2022-09-18 22:46:01,418 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 4.8 GiB, free 12.9 GiB)
```

The same broadcast of the MapStatuses for the same job running on Scala 2.12 is 391.5 MiB:
```
2022-09-18 16:11:58,753 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 391.5 MiB, free 26.4 GiB)
```

In this particular case, it seems the broadcast for the MapStatuses is more than 10 times larger when using Scala 2.13. This is not universal for all MapStatus broadcasts, as we have many other jobs using Scala 2.13 where the broadcast is roughly the same size as on 2.12.
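
For anyone who wants to compare their own runs, here is a minimal sketch in Python that pulls the estimated size out of MemoryStore lines like the two above and computes the ratio. The regular expression and the helper name `estimated_size_bytes` are my own assumptions, modelled only on the two log lines quoted in this report; other log layouts may need a different pattern.

```python
import re

# Pattern modelled on the two MemoryStore lines quoted in this report.
# Assumption: other Spark versions or logging layouts may format this differently.
PATTERN = re.compile(
    r"Block (?P<block>\S+) stored as values in memory "
    r"\(estimated size (?P<size>[\d.]+) (?P<unit>B|KiB|MiB|GiB|TiB)"
)

# Binary unit multipliers for the units accepted by the pattern above.
UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def estimated_size_bytes(line):
    """Return (block name, estimated size in bytes), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    size = float(match.group("size")) * UNIT_BYTES[match.group("unit")]
    return match.group("block"), size


# The two log lines from this report (Scala 2.13 run, then Scala 2.12 run).
log_213 = (
    "2022-09-18 22:46:01,418 INFO memory.MemoryStore: Block broadcast_17 stored "
    "as values in memory (estimated size 4.8 GiB, free 12.9 GiB)"
)
log_212 = (
    "2022-09-18 16:11:58,753 INFO memory.MemoryStore: Block broadcast_17 stored "
    "as values in memory (estimated size 391.5 MiB, free 26.4 GiB)"
)

block_213, size_213 = estimated_size_bytes(log_213)
block_212, size_212 = estimated_size_bytes(log_212)

ratio = size_213 / size_212
print(f"{block_213}: 2.13 / 2.12 size ratio = {ratio:.1f}x")
# prints "broadcast_17: 2.13 / 2.12 size ratio = 12.6x"
```

With the numbers above, the Scala 2.13 broadcast comes out at roughly 12.6 times the size of the Scala 2.12 one.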

This has been observed on 3.3.0, but I also tested against 3.3.1-rc2 and a build of 3.4.0-SNAPSHOT, and both reproduced the issue.


