Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:33:47 UTC

[jira] [Resolved] (SPARK-17334) Provide management tools for broadcasted variables

     [ https://issues.apache.org/jira/browse/SPARK-17334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-17334.
----------------------------------
    Resolution: Incomplete

> Provide management tools for broadcasted variables
> --------------------------------------------------
>
>                 Key: SPARK-17334
>                 URL: https://issues.apache.org/jira/browse/SPARK-17334
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Assaf Mendelson
>            Priority: Minor
>              Labels: bulk-closed
>
> I propose adding tools for managing broadcast variables.
> The main issue today is that a broadcast variable is reachable only through its reference, which must be saved and passed to every place that uses it. We also have to track ourselves whether it has already been unpersisted, and we cannot tell where it occupies memory or how much.
> Today we can create a broadcast variable, use it, and destroy it later, but only by keeping its reference.
> Consider the example from the documentation:
> >>> from pyspark.context import SparkContext
> >>> sc = SparkContext('local', 'test')
> >>> b = sc.broadcast([1, 2, 3, 4, 5])
> >>> b.value
> [1, 2, 3, 4, 5]
> >>> sc.parallelize([0, 0]).flatMap(lambda x: b.value).collect()
> [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
> >>> b.unpersist()
> The problem is that b needs to be saved and passed along.
> Instead, I would like to see something like this:
> >>> sc.broadcast("b",[1, 2, 3, 4, 5])
> >>> sc.getBroadcasted()
> ["a", "b", "c"]
> >>> sc.getBroadcastInfo("b")
> {"mem[bytes]":10, "type": List, "materializedExecutors" : [1,2,3,6,7]}
> >>> b = sc.getBroadcastRef("b")
> >>> print(b.value)
> [1, 2, 3, 4, 5]
> >>> sc.unpersist("b")
> It might also be useful to add a per-executor map showing which broadcast variables each executor holds.
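
Part of the proposal above can be approximated in user code today. The following is a minimal sketch, not an existing Spark API: BroadcastRegistry and its method names are hypothetical, and the only calls it assumes are sc.broadcast(value) and the unpersist(blocking) method on the returned handle. It covers the name-based lookup only; the memory and materialized-executor information from the proposal is, as far as I know, not available through the public PySpark API, so it is left out.

```python
class BroadcastRegistry:
    """Name-keyed bookkeeping for broadcast handles (hypothetical helper)."""

    def __init__(self, sc):
        # sc only needs to expose broadcast(value), as SparkContext does.
        self._sc = sc
        self._handles = {}

    def broadcast(self, name, value):
        # Refuse to silently replace a handle that is still registered.
        if name in self._handles:
            raise ValueError("broadcast %r is already registered" % name)
        handle = self._sc.broadcast(value)
        self._handles[name] = handle
        return handle

    def names(self):
        # Names of all broadcasts that have not been unpersisted through us.
        return sorted(self._handles)

    def get(self, name):
        # Look the handle up by name instead of passing the reference around.
        return self._handles[name]

    def unpersist(self, name, blocking=False):
        # Drop the name first so it cannot be looked up again afterwards.
        handle = self._handles.pop(name)
        handle.unpersist(blocking)
```

With a live SparkContext this would be used as registry = BroadcastRegistry(sc), then registry.broadcast("b", [1, 2, 3, 4, 5]), registry.get("b").value inside a task closure, and finally registry.unpersist("b"). The registry lives on the driver only, so it does not replace the per-executor view requested above.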


