Posted to user@spark.apache.org by Aureliano Buendia <bu...@gmail.com> on 2014/01/16 19:48:24 UTC

Spark does not retry a failed task due to HDFS IO error

Hi,

When writing many files on S3 HDFS, Spark tasks fail at a rate of about 1 in
1,000 with this error:

java.io.FileNotFoundException: File does not exist: /tmp/...

This is probably caused by this Hadoop bug:
https://issues.apache.org/jira/browse/HADOOP-9328

While this is a Hadoop bug, the problem is that Spark tasks do not retry
when they fail. Is this the expected behavior? Is there a setting to
increase the number of retries from zero?
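
For reference, something like the following sketch is what I was hoping
for. I am assuming spark.task.maxFailures is the relevant property and that
it applies in my deployment mode, but I am not certain:

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumption: spark.task.maxFailures controls how many times a task
    // attempt may fail before the stage (and job) is aborted.
    val conf = new SparkConf()
      .setAppName("s3-write-job")
      .set("spark.task.maxFailures", "8")

    val sc = new SparkContext(conf)
    // ... the saveAsTextFile / saveAsHadoopFile calls as before ...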