You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@spark.apache.org by Lin Zhao <li...@exabeam.com> on 2016/01/22 18:50:53 UTC

Help understanding DAG for mapWithState

I've been playing with mapWithState and would like to get a better understanding of Spark's execution plan. A DAG from the UI for a mapWithState step looks like the attached. Why is it that this step is broken into multiple RDDs and mapWithStates and sequenced together? It can't seem to be able to be explained by the operation's syntax. I understand that for the same key the previous state needs to be passed along. But this DAG seems to show results for differen partitions sequenced somehow. Any insight is appreciated.