Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/10/25 17:16:59 UTC

[jira] [Commented] (SPARK-18098) Broadcast creates 1 instance / core, not 1 instance / executor

    [ https://issues.apache.org/jira/browse/SPARK-18098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605898#comment-15605898 ] 

Sean Owen commented on SPARK-18098:
-----------------------------------

It shouldn't work that way: the value is loaded through a lazy val, so it should be initialized at most once per executor JVM. I can imagine cases where you would end up with several copies per executor, but they're not the normal use cases. Can you say more about what you're executing or what you're seeing?
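[Editor's note: the thread-safety behavior referred to above can be sketched in plain Scala. This is not Spark's actual code; `value` below is a stand-in for fetching and deserializing a broadcast value. Scala's `lazy val` initialization is synchronized by the compiler, so even if seven task threads hit it concurrently on first access, the initializer runs exactly once per JVM:]

```scala
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object LazyValDemo {
  val loads = new AtomicInteger(0)

  // Stand-in for fetching/deserializing a broadcast value.
  // lazy val init is thread-safe: the body runs at most once per JVM.
  lazy val value: Array[Byte] = {
    loads.incrementAndGet()
    new Array[Byte](1024)
  }

  def main(args: Array[String]): Unit = {
    val threads = 7 // mirrors the 7-core executor in the report
    val start = new CountDownLatch(1)
    val pool = Executors.newFixedThreadPool(threads)
    (1 to threads).foreach { _ =>
      pool.submit(new Runnable {
        def run(): Unit = { start.await(); value.length }
      })
    }
    start.countDown() // release all threads at once
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    println(s"loads = ${loads.get()}")
  }
}
```

[If the reporter is really seeing one copy per core, something other than this per-JVM lazy initialization would have to be in play, e.g. multiple executor JVMs on the node.]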

> Broadcast creates 1 instance / core, not 1 instance / executor
> --------------------------------------------------------------
>
>                 Key: SPARK-18098
>                 URL: https://issues.apache.org/jira/browse/SPARK-18098
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.1
>            Reporter: Anthony Sciola
>
> I've created my spark executors with $SPARK_HOME/sbin/start-slave.sh -c 7 -m 55g
> When I run a job which broadcasts data, it appears each *thread* requests and receives a copy of the broadcast object, not each *executor*. This means I need 7x as much memory for the broadcast item because I have 7 cores.
> The problem appears to be due to a lack of synchronization around requesting broadcast items.
> The only workaround I've come up with is writing the data out to HDFS, broadcasting the paths, and doing a synchronized load from HDFS.
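[Editor's note: the workaround described above can be sketched as follows. Identifiers such as `getOrLoad` and the `loader` callback are illustrative, not from the ticket. The idea is to broadcast only the (small) HDFS path string and have each executor JVM load the heavy object at most once, via an atomic per-key cache, so all task threads share a single copy:]

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
import java.util.function.{Function => JFunction}

// Hypothetical per-JVM cache: the broadcast carries only the path,
// and the heavy object is loaded at most once per executor JVM.
object ExecutorLocalCache {
  val loads = new AtomicInteger(0)
  private val cache = new ConcurrentHashMap[String, AnyRef]()

  // `loader` stands in for the "synchronized load from HDFS":
  // computeIfAbsent applies it atomically, at most once per key.
  def getOrLoad[T <: AnyRef](path: String)(loader: String => T): T =
    cache.computeIfAbsent(path, new JFunction[String, AnyRef] {
      def apply(p: String): AnyRef = { loads.incrementAndGet(); loader(p) }
    }).asInstanceOf[T]
}
```

[In a Spark task closure this would be called with the broadcast path, e.g. `ExecutorLocalCache.getOrLoad(pathBc.value)(loadFromHdfs)`, where `loadFromHdfs` is whatever reader the job already uses; both names are hypothetical.]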



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org