Posted to user@spark.apache.org by Evo Eftimov <ev...@isecc.com> on 2015/07/07 13:41:06 UTC
RE:
The “RDD”, i.e. the batch RDD that you load from a file, will be kept for as long as the Spark framework / application is running – you can also flag it explicitly as persisted, e.g. in memory and/or on disk.
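A minimal sketch of the above (assumes a running SparkContext `sc` and StreamingContext `ssc`; the file path, the key/value mapping, and the choice of StorageLevel.MEMORY_AND_DISK are illustrative, not from the original mails):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Minutes

// Load the batch RDD once and flag it as persisted so Spark keeps its
// partitions in memory, spilling to disk if they do not fit.
val batchRDD = sc.textFile("hdfs:///data/input.txt")   // illustrative path
  .map(line => (line, 1L))
  .persist(StorageLevel.MEMORY_AND_DISK)

// On the streaming side, StreamingContext.remember extends how long
// Spark Streaming retains the RDDs generated by a DStream before
// they become eligible for cleanup.
ssc.remember(Minutes(10))
```

This is a sketch under those assumptions, not a drop-in program: it needs a Spark Streaming application around it to run.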
From: Anand Nalya [mailto:anand.nalya@gmail.com]
Sent: Tuesday, July 7, 2015 12:34 PM
To: user@spark.apache.org
Subject:
Hi,
Suppose I have an RDD that is loaded from some file, and I also have a DStream with data coming in from some stream. I want to keep unioning some of the tuples from the DStream into my RDD. For this I can use something like this:
var myRDD: RDD[(String, Long)] = sc.fromText...
dstream.foreachRDD { rdd =>
  myRDD = myRDD.union(rdd.filter(myfilter))
}
My question is: for how long will Spark keep the RDDs underlying the DStream around? Is there some configuration knob that can control that?
Regards,
Anand