Posted to issues@spark.apache.org by "Alex Balikov (Jira)" <ji...@apache.org> on 2022/08/04 20:56:00 UTC
[jira] [Created] (SPARK-39983) Should not cache unserialized broadcast relations on the driver
Alex Balikov created SPARK-39983:
------------------------------------
Summary: Should not cache unserialized broadcast relations on the driver
Key: SPARK-39983
URL: https://issues.apache.org/jira/browse/SPARK-39983
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.3.0
Reporter: Alex Balikov
In TorrentBroadcast.writeBlocks, the driver stores the unserialized broadcast object in addition to its serialized blocks:
{code:scala}
private def writeBlocks(value: T): Int = {
  import StorageLevel._
  // Store a copy of the broadcast variable in the driver so that tasks run on the driver
  // do not create a duplicate copy of the broadcast variable's value.
  val blockManager = SparkEnv.get.blockManager
  if (!blockManager.putSingle(broadcastId, value, MEMORY_AND_DISK, tellMaster = false)) {
    throw new SparkException(s"Failed to store $broadcastId in BlockManager")
  }
  // ... (elided) the value is then serialized into pieces, and each piece is
  // stored via blockManager.putBytes; the method returns the number of pieces.
}
{code}
For broadcast relations, these objects can be large (60 MB in one observed case), yet the driver does not strictly need them: the copy exists only so that tasks running on the driver do not create a duplicate of the value.
Proposal: add an option to skip storing the unserialized copy of the object on the driver.
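A minimal, self-contained sketch of what such an option could look like. It assumes a hypothetical keepUnserializedCopy flag and a toy in-memory ToyBlockStore standing in for BlockManager; neither is Spark API. It also uses plain Java serialization rather than Spark's configured serializer and compression codec. With the flag off, the driver holds only the serialized pieces, and a read on the driver has to reassemble and deserialize them.

{code:scala}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

// Toy stand-in for the driver's BlockManager (hypothetical, not Spark API).
final class ToyBlockStore {
  // Unserialized values, analogous to what putSingle keeps.
  val objects: mutable.Map[String, AnyRef] = mutable.Map.empty
  // Serialized pieces, analogous to what putBytes keeps.
  val bytes: mutable.Map[String, Array[Byte]] = mutable.Map.empty
}

object ToyBroadcast {
  // Plain Java serialization; Spark uses its configured serializer and compression codec.
  def serialize(value: AnyRef): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    try oos.writeObject(value) finally oos.close()
    bos.toByteArray
  }

  def deserialize[T](data: Array[Byte]): T = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(data))
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }

  private def pieceId(id: Long, i: Int): String = s"broadcast_${id}_piece$i"

  // Same shape as writeBlocks: optionally keep the unserialized value, then always store
  // the serialized value split into blockSize-byte pieces. Returns the number of pieces.
  // keepUnserializedCopy is a hypothetical flag modelling the proposed option.
  def writeBlocks(store: ToyBlockStore, id: Long, value: AnyRef,
                  blockSize: Int, keepUnserializedCopy: Boolean): Int = {
    if (keepUnserializedCopy) store.objects(s"broadcast_$id") = value
    val pieces = serialize(value).grouped(blockSize).toArray
    pieces.zipWithIndex.foreach { case (piece, i) => store.bytes(pieceId(id, i)) = piece }
    pieces.length
  }

  // Driver-side read: return the cached object if it was kept; otherwise reassemble
  // the pieces and deserialize them (the cost paid for skipping the copy).
  def readValue[T](store: ToyBlockStore, id: Long, numPieces: Int): T =
    store.objects.get(s"broadcast_$id") match {
      case Some(v) => v.asInstanceOf[T]
      case None =>
        val out = new ByteArrayOutputStream()
        (0 until numPieces).foreach(i => out.write(store.bytes(pieceId(id, i))))
        deserialize[T](out.toByteArray)
    }
}
{code}

The trade-off this sketch illustrates: skipping the copy saves driver memory for the whole lifetime of the broadcast, at the price of a deserialization whenever a task on the driver actually reads the value.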