You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Sergey Zhemzhitsky <sz...@gmail.com> on 2018/05/10 19:24:00 UTC

Accumulator guarantees

Hi there,

Although Spark's docs state that there is a guarantee that
- accumulators in actions will only be updated once
- accumulators in transformations may be updated multiple times

... I'm wondering whether the same is true for transformations in the
last stage of the job or there is a guarantee that accumulators
running in transformations of the last stage are guaranteed to be
updated once too?

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Accumulator guarantees

Posted by Sergey Zhemzhitsky <sz...@gmail.com>.
As far as I understand updates of the custom accumulators at the
driver side happen during task completion [1].
The documentation states [2] that the very last stage in a job
consists of multiple ResultTasks, which execute the task and send its
output back to the driver application.
Also sources prove [3] that accumulators are updated for the
ResultTasks just once.

So it's seems that accumulators are safe to use and will be executed
only once regardless of where they are used (transformations as well
as actions) in case these transformations and actions are belong to
the last stage. Is it correct?
Could anyone of Spark commiters or contributors please confirm it?


[1] https://github.com/apache/spark/blob/3990daaf3b6ca2c5a9f7790030096262efb12cb2/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1194
[2] https://github.com/apache/spark/blob/a6fc300e91273230e7134ac6db95ccb4436c6f8f/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L36
[3] https://github.com/apache/spark/blob/3990daaf3b6ca2c5a9f7790030096262efb12cb2/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1204

On Thu, May 10, 2018 at 10:24 PM, Sergey Zhemzhitsky <sz...@gmail.com> wrote:
> Hi there,
>
> Although Spark's docs state that there is a guarantee that
> - accumulators in actions will only be updated once
> - accumulators in transformations may be updated multiple times
>
> ... I'm wondering whether the same is true for transformations in the
> last stage of the job or there is a guarantee that accumulators
> running in transformations of the last stage are guaranteed to be
> updated once too?

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org