Posted to issues@spark.apache.org by "Emil Ejbyfeldt (Jira)" <ji...@apache.org> on 2022/10/07 06:16:00 UTC

[jira] [Resolved] (SPARK-40662) Serialization of MapStatuses is sometimes much larger on Scala 2.13

     [ https://issues.apache.org/jira/browse/SPARK-40662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emil Ejbyfeldt resolved SPARK-40662.
------------------------------------
    Resolution: Invalid

The increase was caused by a change in hashCode between Scala 2.12 and 2.13. Reading with a different Scala version then resulted in many more non-empty blocks to fetch.
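
As a rough illustration of that effect, here is a toy model in Python. It is not Spark code, and it assumes a scenario not spelled out above: the input was laid out by one hash function and is then shuffled using another. The names hash_a, hash_b and non_empty_blocks are hypothetical stand-ins, not the actual hashCode implementations of either Scala version.

```python
# Toy model, not Spark code. hash_a and hash_b are hypothetical stand-ins
# for the hashCode behaviour of two different Scala versions.

def hash_a(key: int) -> int:
    return key

def hash_b(key: int) -> int:
    # Any different but deterministic mixing is enough for the illustration.
    return (key * 2654435761) % (2 ** 32)

def non_empty_blocks(keys, num_partitions, write_hash, read_hash):
    """Count (map task, reduce partition) pairs holding at least one record.

    Assumption: map task i reads the records that write_hash placed in
    bucket i, and the shuffle then routes each record with read_hash.
    """
    blocks = set()
    for key in keys:
        map_task = write_hash(key) % num_partitions
        reduce_partition = read_hash(key) % num_partitions
        blocks.add((map_task, reduce_partition))
    return len(blocks)

keys = range(100_000)
num_partitions = 100

# Same hash on both sides: each map task feeds exactly one reduce partition.
same = non_empty_blocks(keys, num_partitions, hash_a, hash_a)
# Different hashes: records from one map task spread over several partitions.
mixed = non_empty_blocks(keys, num_partitions, hash_a, hash_b)

print(same, mixed)
```

With matching hashes the count equals num_partitions, because every map task only produces a block for its own reduce partition. With mismatched hashes the count is larger, which is the kind of growth in non-empty blocks described above.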

> Serialization of MapStatuses is sometimes much larger on Scala 2.13
> -------------------------------------------------------------------
>
>                 Key: SPARK-40662
>                 URL: https://issues.apache.org/jira/browse/SPARK-40662
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Emil Ejbyfeldt
>            Priority: Major
>
> We have observed a case where the same job, run against Spark on Scala 2.13, fails by running out of memory because the broadcast for the MapStatuses is huge.
> In the logs around the time the job fails, it tries to create a broadcast of size 4.8 GiB:
> ```
> 2022-09-18 22:46:01,418 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 4.8 GiB, free 12.9 GiB)
> ```
> The same broadcast of the MapStatuses for the same job running on Scala 2.12 is 391.5 MiB:
> ```
> 2022-09-18 16:11:58,753 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 391.5 MiB, free 26.4 GiB)
> ```
> In this particular case, the broadcast for the MapStatuses is more than 10 times larger when using Scala 2.13. This is not universal for all MapStatus broadcasts, as we have many other jobs using Scala 2.13 where the MapStatuses are roughly the same size.
> This has been observed on 3.3.0, but I also tested against 3.3.1-rc2 and a build of 3.4.0-SNAPSHOT, and both of those reproduced the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
