Posted to user@spark.apache.org by "Mendelson, Assaf" <As...@rsa.com> on 2017/02/09 10:27:13 UTC

Updating variable in foreachRDD

Hi,

I was wondering how exactly foreachRDD runs.
Specifically, let's say I do something like the following (nothing real, just for understanding):

var df: DataFrame = ???   // placeholder for some initial DataFrame
var counter = 0

// assumes the usual setup, e.g. import spark.implicits._ for rdd.toDF
dstream.foreachRDD { rdd: RDD[Long] =>
  val df2 = rdd.toDF(...)
  df = df.union(df2)
  counter += 1
  if (counter > 100) {
    ssc.stop()
  }
}
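
To be explicit about where I think each piece runs, this is my mental model (a sketch only; the comments state the assumptions I would like confirmed):

dstream.foreachRDD { rdd =>
  // Assumption: this body executes on the driver, once per micro-batch,
  // so a driver-side var like counter can safely be updated here.
  counter += 1
  rdd.foreach { x =>
    // Assumption: this inner closure is serialized and shipped to the
    // executors, so updating counter here would only mutate a copy
    // local to each worker, never the driver's variable.
  }
}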

Would this guarantee that df ends up as the union of the first 100 micro-batches?
That is, is the function passed to foreachRDD guaranteed to execute on the driver, updating everything locally (as opposed to being evaluated lazily or running on a worker)?
In simple tests this appears to work correctly, but I am planning to do some more complex things (including using multiple dstreams).
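
Since several dstreams could end up updating the same driver-side state, I would probably guard it myself rather than rely on a plain var, e.g. (my own sketch, using a java.util.concurrent AtomicInteger and hypothetical stream names dstream1/dstream2; not anything from the Spark API):

import java.util.concurrent.atomic.AtomicInteger

val counter = new AtomicInteger(0)

// both callbacks share the same driver-side counter
dstream1.foreachRDD { rdd =>
  if (counter.incrementAndGet() > 100) ssc.stop()
}
dstream2.foreachRDD { rdd =>
  if (counter.incrementAndGet() > 100) ssc.stop()
}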

Thanks,
                Assaf.