Posted to issues@spark.apache.org by "ho3rexqj (JIRA)" <ji...@apache.org> on 2018/01/08 03:10:00 UTC

[jira] [Created] (SPARK-22986) Avoid instantiating multiple instances of broadcast variables

ho3rexqj created SPARK-22986:
--------------------------------

             Summary: Avoid instantiating multiple instances of broadcast variables 
                 Key: SPARK-22986
                 URL: https://issues.apache.org/jira/browse/SPARK-22986
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.1
            Reporter: ho3rexqj


When resources happen to be constrained on an executor the first time a broadcast variable is instantiated, its value is persisted to disk by the BlockManager.  Consequently, every subsequent call to TorrentBroadcast::readBroadcastBlock from other instances of that broadcast variable creates another instance of the underlying value.  That is, the underlying value is instantiated once per executor *unless* memory is constrained, in which case every instance of the broadcast variable is given its own copy of the underlying value.

The fix I propose is to explicitly cache the underlying values using weak references (in a ReferenceMap).  Note, however, that I couldn't find a clean approach to creating the cache container here.  I added it to BroadcastManager as a package-private field for want of a better solution; if something more appropriate already exists in the project for that purpose, please let me know.
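
To illustrate the idea only (this is not the proposed patch, and it does not use Spark's classes), here is a minimal Java sketch of a cache that holds values through weak references. The names BroadcastValueCache and getOrLoad are hypothetical, and a plain ConcurrentHashMap of WeakReference objects stands in for the ReferenceMap mentioned above. The loader argument represents whatever expensive read would otherwise run once per caller.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical per-process cache: broadcast id -> weakly referenced value. */
final class BroadcastValueCache {
    private final ConcurrentHashMap<Long, WeakReference<Object>> cache =
            new ConcurrentHashMap<>();

    /**
     * Returns the live cached value for the id, or runs the loader once and
     * caches the result. compute() is atomic per key, so concurrent callers
     * for the same id cannot both run the loader.
     */
    @SuppressWarnings("unchecked")
    <T> T getOrLoad(long id, Supplier<T> loader) {
        // Strong reference that keeps the value alive until it is returned.
        Object[] holder = new Object[1];
        cache.compute(id, (key, ref) -> {
            Object value = (ref == null) ? null : ref.get();
            if (value == null) {
                // Never loaded, or the previous value was garbage collected.
                value = loader.get();
                ref = new WeakReference<>(value);
            }
            holder[0] = value;
            return ref;
        });
        return (T) holder[0];
    }
}

final class BroadcastValueCacheDemo {
    public static void main(String[] args) {
        BroadcastValueCache cache = new BroadcastValueCache();
        AtomicInteger loads = new AtomicInteger();
        // Stand-in for an expensive read of a large value from disk.
        Supplier<byte[]> loader = () -> {
            loads.incrementAndGet();
            return new byte[1024];
        };
        byte[] first = cache.getOrLoad(0L, loader);
        byte[] second = cache.getOrLoad(0L, loader);
        System.out.println("same instance: " + (first == second)
                + ", loads: " + loads.get());
    }
}
```

Because the map holds only weak references, a value becomes collectable once no caller still uses it, so the cache does not pin large values in memory. Two simplifications in this sketch: entries whose referent has been cleared are never purged (a ReferenceMap handles that), and the loader runs while the map entry is locked, which blocks other callers for the same id until the load finishes.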

The above issue was terminating our team's applications erratically.  We were distributing roughly 1 GiB of data through a broadcast variable, and under certain conditions memory was constrained the first time the broadcast variable was loaded on an executor.  The executor then attempted to create several additional copies of the broadcast variable's value (we were using 8 worker threads on the executor), which quickly led to the task failing with an OOM exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
