Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2019/10/30 06:51:00 UTC

[jira] [Created] (SPARK-29649) Stop task set if FileAlreadyExistsException was thrown when writing to output file

L. C. Hsieh created SPARK-29649:
-----------------------------------

             Summary: Stop task set if FileAlreadyExistsException was thrown when writing to output file
                 Key: SPARK-29649
                 URL: https://issues.apache.org/jira/browse/SPARK-29649
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: L. C. Hsieh
            Assignee: L. C. Hsieh


It is already known that task attempts which do not clean up their output files in the staging directory can cause job failure (SPARK-27194). There were proposals to fix this by changing the output file name or by deleting existing output files, but neither approach is completely reliable.

The difficulty is that when a previous task attempt fails after writing its output file, that file is still in the same staging directory when the next task attempt runs, even if the next attempt uses a different output file name.
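The staging-directory problem can be illustrated with a small local-filesystem simulation. This is a hypothetical model, not Spark or Hadoop code: Python's exclusive-create mode `open(path, "x")` stands in for creating an output file with overwrite disabled, and the file names `part-00000` and `part-00000-attemptN` are made-up examples of a fixed naming scheme and a per-attempt naming scheme.

```python
import os
import tempfile


def write_output(staging_dir, file_name, data):
    """Create file_name in staging_dir, failing if it already exists."""
    path = os.path.join(staging_dir, file_name)
    # "x" = exclusive creation: raises FileExistsError if the path exists,
    # roughly like creating an output file with overwrite disabled.
    with open(path, "x") as f:
        f.write(data)


def retry_after_crash(unique_names):
    """Simulate attempt 0 dying without cleanup, then attempt 1 retrying.

    Returns (outcome_of_attempt_1, files_left_in_staging_dir).
    """
    with tempfile.TemporaryDirectory() as staging:
        def name(attempt):
            # Hypothetical naming schemes: fixed name vs. per-attempt name.
            return f"part-00000-attempt{attempt}" if unique_names else "part-00000"

        write_output(staging, name(0), "partial rows")  # attempt 0 then dies
        try:
            write_output(staging, name(1), "all rows")
            outcome = "ok"
        except FileExistsError:
            outcome = "FileExistsError"
        # List the directory before TemporaryDirectory deletes it.
        return outcome, sorted(os.listdir(staging))


print(retry_after_crash(unique_names=False))
# ('FileExistsError', ['part-00000'])
print(retry_after_crash(unique_names=True))
# ('ok', ['part-00000-attempt0', 'part-00000-attempt1'])
```

With a fixed name the retry collides with the leftover file. With per-attempt names the retry succeeds, but the partial file from the failed attempt is still sitting in the staging directory.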

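If the job is going to fail eventually, there is no point in re-running the task until the maximum number of attempts is reached. For long-running jobs, these retries can waste a lot of time. This issue therefore proposes stopping the task set as soon as a FileAlreadyExistsException is thrown while writing to an output file.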
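The proposed fail-fast policy can be sketched as a retry loop that treats one exception type as non-retryable. This is a hedged illustration, not Spark's scheduler: `FileAlreadyExistsError` is a hypothetical stand-in for Hadoop's `FileAlreadyExistsException`, `run_with_retries` is a made-up helper, and the default `max_attempts=4` only mirrors the default of Spark's `spark.task.maxFailures` setting.

```python
class FileAlreadyExistsError(Exception):
    """Hypothetical stand-in for Hadoop's FileAlreadyExistsException."""


def run_with_retries(task, max_attempts=4):
    """Run task(attempt) up to max_attempts times.

    Returns (succeeded, attempts_used, result_or_last_error).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return True, attempt, task(attempt)
        except FileAlreadyExistsError as e:
            # Fail fast: every retry would hit the same leftover file,
            # so give up now instead of burning the remaining attempts.
            return False, attempt, e
        except Exception as e:
            # Other failures may be transient, so keep retrying.
            last_error = e
    return False, max_attempts, last_error


def stale_file_task(attempt):
    raise FileAlreadyExistsError("part-00000 already exists")


def flaky_task(attempt):
    if attempt < 3:
        raise IOError("transient failure")
    return "done"


print(run_with_retries(flaky_task))  # (True, 3, 'done')
succeeded, attempts, err = run_with_retries(stale_file_task)
print(succeeded, attempts)  # False 1
```

The design choice is that only an error known to be deterministic for every retry short-circuits the loop; all other failures keep the ordinary retry budget.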



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
