Posted to issues@spark.apache.org by "Alex Balikov (Jira)" <ji...@apache.org> on 2022/08/04 20:56:00 UTC

[jira] [Created] (SPARK-39983) Should not cache unserialized broadcast relations on the driver

Alex Balikov created SPARK-39983:
------------------------------------

             Summary: Should not cache unserialized broadcast relations on the driver
                 Key: SPARK-39983
                 URL: https://issues.apache.org/jira/browse/SPARK-39983
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Alex Balikov


In TorrentBroadcast.writeBlocks we store the unserialized broadcast object in the driver's BlockManager, in addition to its serialized version:
{code:java}
private def writeBlocks(value: T): Int = {
  import StorageLevel._
  // Store a copy of the broadcast variable in the driver so that tasks run on the driver
  // do not create a duplicate copy of the broadcast variable's value.
  val blockManager = SparkEnv.get.blockManager
  if (!blockManager.putSingle(broadcastId, value, MEMORY_AND_DISK, tellMaster = false)) {
    throw new SparkException(s"Failed to store $broadcastId in BlockManager")
  }
  // ... (remainder of the method omitted from this excerpt)
}
{code}
For broadcast relations, these objects can be fairly large (60 MB in one observed case), and the unserialized copy is not strictly necessary on the driver.

Proposal: add an option to not keep the unserialized version of these objects on the driver.
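
A minimal, self-contained sketch of what such an option might look like. It does not use Spark's classes: ToyBlockStore is a hypothetical stand-in for the BlockManager, and the serializedOnly parameter name, the serialize helper, and the piece-naming scheme are assumptions made for illustration only.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable

// Hypothetical toy store (NOT Spark's BlockManager): it only records what is kept in memory.
final class ToyBlockStore {
  private val objects = mutable.Map.empty[String, Any]         // unserialized values
  private val pieces = mutable.Map.empty[String, Array[Byte]]  // serialized chunks

  def putSingle(id: String, value: Any): Unit = objects.update(id, value)
  def putBytes(id: String, bytes: Array[Byte]): Unit = pieces.update(id, bytes)
  def hasObject(id: String): Boolean = objects.contains(id)
  def pieceCount(prefix: String): Int = pieces.keys.count(_.startsWith(prefix))
}

object BroadcastSketch {
  // Plain Java serialization, used here only to obtain some bytes to split into pieces.
  def serialize(value: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.toByteArray
  }

  // `serializedOnly` is an assumed name for the proposed option.
  // Returns the number of serialized pieces written.
  def writeBlocks(
      store: ToyBlockStore,
      broadcastId: String,
      value: AnyRef,
      blockSize: Int,
      serializedOnly: Boolean): Int = {
    // Keep the unserialized copy only when the caller has not opted out.
    if (!serializedOnly) {
      store.putSingle(broadcastId, value)
    }
    // The serialized pieces are always stored, so the value can be rebuilt on demand.
    val chunks = serialize(value).grouped(blockSize).toSeq
    chunks.zipWithIndex.foreach { case (chunk, i) =>
      store.putBytes(s"${broadcastId}_piece$i", chunk)
    }
    chunks.length
  }
}
```

With serializedOnly = true, only the serialized pieces stay in the store. The trade-off in this sketch is that any reader on the driver would have to deserialize the value from those pieces instead of reusing a cached object.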

 


