Posted to user@spark.apache.org by jelmer <jk...@gmail.com> on 2021/08/12 08:34:02 UTC

Missing accumulator data when a task is speculated and the original task fails with TaskCommitDenied

Hi,

I am using Spark 2.4.0.cloudera2, and I have a job that reads a small number
of files, resulting in an RDD with 5 partitions.

I also have an accumulator that I update at the end of a mapPartitions call
(when the iterator is exhausted).
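For readers unfamiliar with this pattern, here is a minimal, Spark-free sketch of what "update an accumulator at the end of a mapPartitions call" typically looks like. It assumes the update is done by wrapping the partition iterator so that a callback fires once the iterator has been fully consumed. The names on_exhaustion, CountingAccumulator and process_partition are hypothetical. CountingAccumulator is a plain counter standing in for a Spark accumulator, and the list comprehension stands in for Spark running one task per partition. Speculation and task commits are not modelled here.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def on_exhaustion(it: Iterable[T], callback: Callable[[int], None]) -> Iterator[T]:
    """Yield every element of `it`, then call `callback(count)` exactly once,
    after the consumer has drained the whole input."""
    count = 0
    for item in it:
        count += 1
        yield item
    # Only reached if the consumer iterates to the end; a partially
    # consumed generator never reports its count.
    callback(count)


class CountingAccumulator:
    """Hypothetical stand-in for a Spark accumulator: just a running sum."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> None:
        self.value += n


def process_partition(rows: Iterable[int], acc: CountingAccumulator) -> Iterator[int]:
    """Hypothetical function body that would be passed to mapPartitions:
    transform each row and add the partition's row count to `acc` at the end."""
    return on_exhaustion((r * 2 for r in rows), acc.add)


# Five "partitions", as in the job described above.
acc = CountingAccumulator()
partitions: List[List[int]] = [[1, 2], [3], [4, 5, 6], [], [7]]
results = [list(process_partition(p, acc)) for p in partitions]
print(acc.value)  # prints 7
```

Because the update happens only after the iterator is exhausted, each partition contributes exactly one add call, carrying its whole row count.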

What I've observed is that if a task is speculated and the original task
fails with TaskCommitDenied, the counts collected in the accumulator
for that partition are somehow lost.

I've read articles explaining how data can be sent twice when tasks are
speculated, but I haven't found anything about accumulators losing
data.

Does anyone have an idea what could be causing this?