Posted to user@spark.apache.org by "Mendelson, Assaf" <As...@rsa.com> on 2017/02/09 10:27:13 UTC
Updating variable in foreachRDD
Hi,
I was wondering on how foreachRDD would run.
Specifically, let's say I do something like (nothing real, just for understanding):
var df = ???
var counter = 0
dstream.foreachRDD { rdd: RDD[Long] =>
  val df2 = rdd.toDF(...)
  df = df.union(df2)
  counter += 1
  if (counter > 100) {
    ssc.stop()
  }
}
Would this guarantee that df ends up as the union of the first 100 micro-batches?
In other words, is the body of foreachRDD guaranteed to run on the driver, updating these local variables eagerly (as opposed to being evaluated lazily or shipped to a worker)?
In simple tests this appears to work correctly, but I am planning to do some more complex things (including using multiple dstreams).
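To make my assumption concrete, here is a sketch of the execution model I have in mind (this is my possibly wrong understanding, not something I have verified against the docs):

```scala
dstream.foreachRDD { rdd =>
  // Assumption: this closure is invoked once per micro-batch, on the driver,
  // so mutating driver-local state such as `counter` or `df` is safe here.
  counter += 1

  // By contrast, an action like rdd.count() is distributed: the counting
  // happens on the executors, and only the result returns to the driver.
  val batchSize = rdd.count()
}
```

If that split (closure body on the driver, rdd operations on the executors) is correct, then the union/counter pattern above should be safe; if not, I would appreciate a pointer to the right way to accumulate state across micro-batches.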
Thanks,
Assaf.