Posted to user@spark.apache.org by spr <sp...@yarcdata.com> on 2014/11/06 00:30:22 UTC

how to blend a DStream and a broadcast variable?

My use case has one large data stream (DS1) that obviously maps to a DStream. 
Processing DS1 involves filtering it for any of a set of known values, which
will change over time, though slowly by streaming standards.  If the filter
data were static, it would obviously map to a broadcast variable, but it's
dynamic.  (And I don't think it works to implement it as a DStream, because
the new values need to be copied redundantly to all executors, not partitioned
among them.)

Looking at the Spark and Spark Streaming documentation, I have two
questions:

1) There's no mention in the Spark Streaming Programming Guide of broadcast
variables.  Do they coexist properly?

2) Once I have a broadcast variable in place in the "periodic function" that
Spark Streaming executes, how can I update its value?  Obviously I can't
literally update the value of that broadcast variable, which is immutable,
but how can I get a new version of the variable established in all the
executors?

(And the other ever-present implicit question...)

3) Is there a better way to implement this?
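
For concreteness, the static case that a plain broadcast variable does cover
might look roughly like the sketch below (the socket source and example
values are hypothetical, just to show the shape of the code):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StaticFilterSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StaticFilter").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // A fixed filter set, broadcast once to all executors.
    // This is exactly the case that does NOT handle dynamic updates.
    val knownValues = ssc.sparkContext.broadcast(Set("alice", "bob"))

    ssc.socketTextStream("localhost", 9999)
      .filter(knownValues.value.contains)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The open question is what replaces the one-time `broadcast` call when the
set of known values changes while the streaming job is running.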



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-blend-a-DStream-and-a-broadcast-variable-tp18227.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: how to blend a DStream and a broadcast variable?

Posted by Steve Reinhardt <sp...@yarcdata.com>.
Excellent. Is there an example of this somewhere?

Sent from my iPhone

> On Nov 6, 2014, at 1:43 AM, Sean Owen <so...@cloudera.com> wrote:
> 
> Broadcast vars should work fine in Spark Streaming. Broadcast vars are
> immutable, however. If you have some info to cache which might change
> from batch to batch, you should be able to load it at the start of
> your 'foreachRDD' method or equivalent. That's simple and works,
> assuming your batch interval isn't so short and the data so big that
> loading it every time is a burden.


Re: how to blend a DStream and a broadcast variable?

Posted by Sean Owen <so...@cloudera.com>.
Broadcast vars should work fine in Spark Streaming. Broadcast vars are
immutable, however. If you have some info to cache which might change
from batch to batch, you should be able to load it at the start of
your 'foreachRDD' method or equivalent. That's simple and works,
assuming your batch interval isn't so short and the data so big that
loading it every time is a burden.
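
A minimal sketch of that reload-per-batch pattern (the `loadFilterValues`
helper and the socket source are hypothetical placeholders for whatever
store actually holds the slowly changing values):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReloadPerBatchSketch {
  // Hypothetical loader; stands in for a file, a DB table, etc.
  def loadFilterValues(): Set[String] = Set("alice", "bob")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReloadPerBatch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Driver-side handle; re-broadcast each batch so all executors
    // see a fresh, consistent snapshot of the filter set.
    var filterBc: Broadcast[Set[String]] =
      ssc.sparkContext.broadcast(loadFilterValues())

    ssc.socketTextStream("localhost", 9999).foreachRDD { rdd =>
      filterBc.unpersist() // release the stale copy held on executors
      filterBc = ssc.sparkContext.broadcast(loadFilterValues())
      val bc = filterBc    // local val so the closure captures the Broadcast, not the var
      rdd.filter(bc.value.contains).foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If reloading on every batch is too expensive, the same pattern works with a
cheap staleness check (e.g. a timestamp) so the value is only re-broadcast
when the underlying data has actually changed.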
