Posted to user@spark.apache.org by Nipun Arora <ni...@gmail.com> on 2017/05/26 17:11:02 UTC
[Spark Streaming] DAG Execution Model Clarification
Hi,
I would like some clarification on the execution model for Spark Streaming.
Broadly, I am trying to understand whether output operations in a DAG are only
processed after all intermediate operations have finished for all branches of
the DAG.
Let me give an example:
I have a DStream A. I apply map operations on it to create two different
DStreams, B and C, such that:

A ---> B ---> (some operations) ---> kafka output 1
 \---> C ---> (some operations) ---> kafka output 2
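For concreteness, the topology above might be expressed roughly as follows. This is only a minimal sketch against the Spark Streaming Scala API; the socket source is a stand-in for my real input, and the transformations (transformForB/transformForC) and Kafka-writing helpers (writeToKafka1/writeToKafka2) are hypothetical placeholders, not actual code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("dag-branch-example")
val ssc = new StreamingContext(sparkConf, Seconds(5))

// Source DStream A (socket source used only as a stand-in).
val a = ssc.socketTextStream("localhost", 9999)

// Branch 1: A ---> B ---> (some operations) ---> kafka output 1
val b = a.map(record => transformForB(record)) // hypothetical transformation
b.foreachRDD { rdd =>
  rdd.foreachPartition(partition => writeToKafka1(partition)) // hypothetical Kafka sink
}

// Branch 2: A ---> C ---> (some operations) ---> kafka output 2
val c = a.map(record => transformForC(record)) // hypothetical transformation
c.foreachRDD { rdd =>
  rdd.foreachPartition(partition => writeToKafka2(partition)) // hypothetical Kafka sink
}

ssc.start()
ssc.awaitTermination()
```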
I want to understand whether kafka output 1 and kafka output 2 wait for all
operations on both B and C to finish before sending output, or whether each
sends its output as soon as the operations on its own branch are done.
What kind of synchronization guarantees are there?
Thanks
Nipun