Posted to user@spark.apache.org by Elmer Garduno <ga...@gmail.com> on 2013/10/02 14:52:41 UTC

Accessing broadcast variables by name

Hi,

One of our use cases relies on objects that are instantiated by name to do
the data processing. This means that we cannot pass the broadcast variable
directly to the method that uses it.

The workaround we found by reading the code was to request the variable
from the SparkEnv. The downside is that we need to know the internal name
of the broadcast variable, which is an implementation detail of the system
that we cannot rely on:

  val mMap = org.apache.spark.SparkEnv.get.blockManager.getSingle("broadcast_0").get.asInstanceOf[Map[String, String]]


The question is, would it be possible to access the broadcast variables by
name using something like this?

// On the main method
val mMap =  sc.broadcast(getMap(...))
val bname = mMap.name()

...

// On the external resource
val mMap = sc.broadcastVariable(bname)


Thanks,

Elmer

Re: Accessing broadcast variables by name

Posted by Elmer Garduno <ga...@gmail.com>.
Thanks, let me give that a try.



Re: Accessing broadcast variables by name

Posted by Reynold Xin <rx...@cs.berkeley.edu>.
I still don't fully understand your use case, but how about extending
SparkContext yourself and adding a hash map from string to broadcast
variable? Then you can change the broadcast function to return the name.
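
A minimal sketch of that idea, written as a wrapper rather than a SparkContext subclass so it compiles without Spark on the classpath. Everything here is hypothetical and not a Spark API: the class name NamingBroadcaster, the generated name format, and the doBroadcast parameter, which stands in for the real call (with Spark it could be something like v => sc.broadcast(v)).

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

// Hypothetical wrapper, not part of Spark. H is the handle type returned by
// the underlying broadcast call; doBroadcast is a stand-in for that call.
class NamingBroadcaster[H](doBroadcast: Any => H) {
  private val counter = new AtomicLong(0)
  private val byName  = TrieMap.empty[String, H]

  // Broadcasts the value, remembers the handle, and returns a generated name.
  def broadcast(value: Any): String = {
    val name = s"named_broadcast_${counter.getAndIncrement()}"
    byName.update(name, doBroadcast(value))
    name
  }

  // Looks a handle up by the name returned from broadcast().
  def broadcastVariable(name: String): Option[H] = byName.get(name)
}

object NamingBroadcasterDemo {
  def main(args: Array[String]): Unit = {
    // A pass-through function replaces the real broadcast call in this sketch.
    val nb    = new NamingBroadcaster[Any]((v: Any) => v)
    val bname = nb.broadcast(Map("k" -> "v"))
    println(bname)
    println(nb.broadcastVariable(bname))
  }
}
```

The name is an ordinary String, so it can be passed through an initializer that only accepts primitive types. Note that this map lives in the JVM where it was filled, so it does not by itself make the handle visible inside tasks running elsewhere.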


--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org




Re: Accessing broadcast variables by name

Posted by Elmer Garduno <ga...@gmail.com>.
In the framework we are using for data processing (UIMA), instances are
created by name, and only a limited number of types can be passed as
parameters to the initializers (Java primitive types and arrays).

So the only way we have to access the broadcast variable from within the
instances is to retrieve it by name (a string that can be passed through
the initialization method) from the Spark environment or the context.

Any thoughts?



Re: Accessing broadcast variables by name

Posted by Reynold Xin <rx...@cs.berkeley.edu>.
Why don't you track it yourself with a hashmap?
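
For illustration only, one way this could look is a JVM-wide singleton keyed by a name the caller chooses. The object name NamedRegistry and its methods are hypothetical, and the values are stored as Any because this sketch does not depend on Spark; in practice the stored value would be the broadcast handle.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical helper, not part of Spark: a per-JVM registry from name to value.
object NamedRegistry {
  private val entries = TrieMap.empty[String, Any]

  // Stores (or overwrites) the value registered under the given name.
  def register(name: String, value: Any): Unit = entries.update(name, value)

  // Returns the value under the given name, cast to the type the caller expects.
  // The cast is unchecked, so the caller must ask for the type it registered.
  def lookup[T](name: String): Option[T] = entries.get(name).map(_.asInstanceOf[T])
}

object NamedRegistryDemo {
  def main(args: Array[String]): Unit = {
    // On the main method: register under a name that can be passed around as a String.
    NamedRegistry.register("mMap", Map("k" -> "v"))

    // On the external resource: only the name is known.
    val mMap = NamedRegistry.lookup[Map[String, String]]("mMap")
    println(mMap.flatMap(_.get("k")))
  }
}
```

Because the registry exists once per JVM, code running in a different JVM would need to register the handle there (for example at the start of the task that received it) before anything looks it up by name.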


--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org