Posted to dev@spark.apache.org by Renyi Xiong <re...@gmail.com> on 2016/04/28 18:17:09 UTC

Spark streaming concurrent job scheduling question

Hi,

I am trying to run an I/O-intensive job in parallel with a CPU-intensive job
within the same application, using a window, like below:

import org.apache.spark.streaming.{Minutes, StreamingContext}

val ssc = new StreamingContext(sc, Minutes(1))
val ds1 = ...                      // source DStream
val ds2 = ds1.window(Minutes(2))
ds2.foreachRDD(...)                // long-running job over the 2-minute window
ds1.foreachRDD(...)                // job on each 1-minute batch

I would like ds1's job to start at each 1-minute interval even if ds2's job
has not completed yet.

But that is not what happens when I run it: ds1's job does not start until
ds2's job completes.

I looked into the documentation, which mentions that jobs within the same
SparkContext need to be submitted from different threads in order to run in
parallel.

Is that true? If so, the question becomes: how should I submit the above two
jobs from different threads?

(I know about spark.streaming.concurrentJobs and receiver mode, but both
come with particular issues.)
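
For reference, by concurrentJobs I mean the (apparently undocumented) setting
below; my understanding is that it raises the number of streaming jobs the
JobScheduler will run at once from the default of 1:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

// Let the streaming JobScheduler run two jobs concurrently.
val conf = new SparkConf().set("spark.streaming.concurrentJobs", "2")
val ssc = new StreamingContext(conf, Minutes(1))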

Thanks a lot,
Renyi.