Posted to user@spark.apache.org by Chris Fregly <ch...@fregly.com> on 2014/04/13 20:14:57 UTC

Re: function state lost when next RDD is processed

or how about the updateStateByKey() operation?

https://spark.apache.org/docs/0.9.0/streaming-programming-guide.html

the StatefulNetworkWordCount example demonstrates how to keep state across RDDs.
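[editor's note] A minimal sketch of the update function that updateStateByKey expects, in plain Scala. The function name and the wiring in the trailing comment are illustrative assumptions, not taken from the thread:

```scala
// Update function in the shape updateStateByKey expects: the new values
// seen for a key in this batch, plus that key's previous state (if any).
def updateSum(newValues: Seq[Int], runningSum: Option[Int]): Option[Int] =
  Some(newValues.sum + runningSum.getOrElse(0))

// In a real StreamingContext (hypothetical stream name) it would be wired as:
//   val sums = pairStream.updateStateByKey(updateSum)
// Spark then carries each key's Option[Int] state forward across batches.
```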

> On Mar 28, 2014, at 8:44 PM, Mayur Rustagi <ma...@gmail.com> wrote:
> 
> Are you referring to Spark Streaming?
> 
> Can you save the sum as an RDD & keep joining the two RDDs together?
> 
> Regards
> Mayur
> 
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
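[editor's note] To make Mayur's suggestion concrete, here is a sketch in plain Scala with Maps standing in for keyed RDDs (names are illustrative, not from the thread). In Spark, the same combine step would be an outer join (or cogroup) of the running-sums RDD with each new batch's RDD:

```scala
// `sums` plays the role of the saved running-sums dataset; `batch` is the
// new data. Combining them is an outer-join-plus-add over the two key sets.
def mergeBatch(sums: Map[String, Int], batch: Map[String, Int]): Map[String, Int] =
  (sums.keySet ++ batch.keySet).map { k =>
    k -> (sums.getOrElse(k, 0) + batch.getOrElse(k, 0))
  }.toMap
```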
> 
> 
> 
>> On Fri, Mar 28, 2014 at 10:47 AM, Adrian Mocanu <am...@verticalscope.com> wrote:
>> Thanks!
>> 
>>  
>> 
>> Ya that’s what I’m doing so far, but I wanted to see if it’s possible to keep the tuples inside Spark for fault tolerance purposes.
>> 
>>  
>> 
>> -A
>> 
>> From: Mark Hamstra [mailto:mark@clearstorydata.com] 
>> Sent: March-28-14 10:45 AM
>> To: user@spark.apache.org
>> Subject: Re: function state lost when next RDD is processed
>> 
>>  
>> 
>> As long as the amount of state being passed is relatively small, it's probably easiest to send it back to the driver and to introduce it into RDD transformations as the zero value of a fold.
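[editor's note] A plain-Scala sketch of the pattern Mark describes, with foldLeft standing in for the distributed fold and hypothetical names throughout: collect the small result to the driver after each batch and seed the next fold with it. One caveat worth knowing: on a real RDD, fold applies its zero value once per partition, so for a sum the safer form is to keep the running total on the driver and add each batch's `rdd.fold(0)(_ + _)` to it.

```scala
// Small driver-side state carried across batches (sketch, illustrative names).
var runningSum = 0

def processBatch(values: Seq[Int]): Int = {
  // The previous total is the zero value of the fold, as described above.
  runningSum = values.foldLeft(runningSum)(_ + _)
  runningSum
}
```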
>> 
>>  
>> 
>> On Fri, Mar 28, 2014 at 7:12 AM, Adrian Mocanu <am...@verticalscope.com> wrote:
>> 
>> I’d like to resurrect this thread since I don’t have an answer yet.
>> 
>>  
>> 
>> From: Adrian Mocanu [mailto:amocanu@verticalscope.com] 
>> Sent: March-27-14 10:04 AM
>> To: user@spark.incubator.apache.org
>> Subject: function state lost when next RDD is processed
>> 
>>  
>> 
>> Is there a way to pass a custom function to Spark to run it on the entire stream? For example, say I have a function which sums up values in each RDD and then across RDDs.
>> 
>>  
>> 
>> I’ve tried with map, transform, and reduce. They all apply my sum function to one RDD at a time; when the next RDD arrives, the function starts from 0, so the sum of the previous RDD is lost.
>> 
>>  
>> 
>> Does Spark support a way of passing a custom function so that its state is preserved across RDDs and not only within a single RDD?
>> 
>>  
>> 
>> Thanks
>> 
>> -Adrian
>> 
>