Posted to user@spark.apache.org by Evo Eftimov <ev...@isecc.com> on 2015/07/07 13:41:06 UTC

RE:

The “RDD” (i.e. the batch RDD) which you load from a file will be kept for as long as the Spark application is running. You can also ensure it is persisted explicitly, e.g. in memory and/or on disk.
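For example, a minimal sketch of explicit persistence (the file path and the (String, Long) mapping below are made up for illustration):

  import org.apache.spark.storage.StorageLevel

  // Load the batch RDD once and pin it for the lifetime of the application.
  // MEMORY_AND_DISK spills partitions to disk when they do not fit in memory.
  val baseRDD = sc.textFile("hdfs:///data/base.txt")
    .map(line => (line, 1L))
    .persist(StorageLevel.MEMORY_AND_DISK)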

 

From: Anand Nalya [mailto:anand.nalya@gmail.com] 
Sent: Tuesday, July 7, 2015 12:34 PM
To: user@spark.apache.org
Subject: 

 

Hi,

 

Suppose I have an RDD that is loaded from some file, and I also have a DStream with data coming from some stream. I want to keep unioning some of the tuples from the DStream into my RDD. For this I can use something like this:

 

  var myRDD: RDD[(String, Long)] = sc.fromText...
  dstream.foreachRDD { rdd =>
    myRDD = myRDD.union(rdd.filter(myfilter))
  }

 

My question is: for how long will Spark keep the RDDs underlying the DStream around? Is there some configuration knob that can control that?
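(One related knob is StreamingContext.remember, which asks Spark Streaming to retain the RDDs it generates for longer; the spark.streaming.unpersist setting similarly controls whether generated RDDs are automatically unpersisted. A minimal sketch, where the batch interval and retention duration are made-up values:)

  import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

  // Hypothetical streaming context; the 10-second batch interval is illustrative.
  val ssc = new StreamingContext(sc, Seconds(10))

  // Keep the RDDs generated by DStreams around for at least 5 minutes,
  // instead of releasing them once the computation no longer needs them.
  ssc.remember(Minutes(5))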

 

Regards,

Anand