Posted to user@spark.apache.org by jelmer <jk...@gmail.com> on 2021/08/12 08:34:02 UTC
Missing accumulator data when a task is speculated and the original
task fails with TaskCommitDenied
Hi,
I am using Spark 2.4.0.cloudera2 and have a job that reads a small number
of files, resulting in an RDD with 5 partitions.
I also have an accumulator that I update at the end of a mapPartitions call
(when the iterator is exhausted).
What I've observed is that if a task is speculated and the original task
fails with TaskCommitDenied, then the counts collected in the accumulator
for that partition are somehow lost.
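The update pattern described above can be sketched roughly as follows. This is an illustrative plain-Python stand-in, not the actual job and not Spark's API: `CountingAccumulator` and `count_on_exhaustion` are hypothetical names, and the generator only models the idea of adding a per-partition count once the partition's iterator has been fully consumed.

```python
from typing import Iterable, Iterator


class CountingAccumulator:
    """Hypothetical stand-in for a Spark long accumulator (not Spark's API)."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> None:
        self.value += n


def count_on_exhaustion(rows: Iterable[str], acc: CountingAccumulator) -> Iterator[str]:
    """Yield rows unchanged; add the row count to acc only after the last row."""
    seen = 0
    for row in rows:
        seen += 1
        yield row
    # Reached only if the consumer pulls the iterator all the way to the end.
    acc.add(seen)


# Case 1: the consumer drains the whole partition, so the count is recorded.
acc_full = CountingAccumulator()
full_output = list(count_on_exhaustion(["a", "b", "c"], acc_full))

# Case 2: the consumer stops after one row, so the final add never runs.
acc_partial = CountingAccumulator()
partial = count_on_exhaustion(["a", "b", "c"], acc_partial)
first_row = next(partial)
partial.close()
```

In this sketch the count is recorded only in the first case. It says nothing about how Spark itself treats accumulator updates from a task that ends with TaskCommitDenied; it only shows what "update when the iterator is exhausted" means.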
I've been reading articles outlining how data could be sent twice in case
of speculative tasks, but I haven't read anything about accumulators losing
data.
Does anyone have any idea what could be the reason for this?