Posted to user@spark.apache.org by Nipun Arora <ni...@gmail.com> on 2017/05/26 17:11:02 UTC

[Spark Streaming] DAG Execution Model Clarification

Hi,

I would like some clarification on the execution model for Spark Streaming.

Broadly, I am trying to understand whether output operations in a DAG are
only processed after all intermediate operations have finished for all
parts of the DAG.

Let me give an example:

I have a DStream A. I apply map operations on it to create two different
DStreams, B and C, such that:

A ---> B ---> (some operations) ---> Kafka output 1
 \---> C ---> (some operations) ---> Kafka output 2
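For concreteness, the branching above might look roughly like the sketch
below. This is only an illustration of the topology, not working code:
readFromSource, transformForB, transformForC, and writeToKafka are
hypothetical placeholders for the actual input source, map functions, and
Kafka producer logic.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("dag-branching-example")
val ssc  = new StreamingContext(conf, Seconds(10))

val a = readFromSource(ssc)    // DStream A (hypothetical input helper)

val b = a.map(transformForB)   // branch B
val c = a.map(transformForC)   // branch C

// Two independent output operations, one per branch.
b.foreachRDD { rdd => rdd.foreachPartition(writeToKafka("topic1", _)) }
c.foreachRDD { rdd => rdd.foreachPartition(writeToKafka("topic2", _)) }

ssc.start()
ssc.awaitTermination()
```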

I want to understand whether Kafka output 1 and Kafka output 2 wait for
all operations on both B and C to finish before sending output, or whether
each simply sends its output as soon as the operations on its own branch
(B or C) are done.

What kind of synchronization guarantees are there?

Thanks
Nipun