Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2019/11/10 03:53:00 UTC

[jira] [Updated] (SPARK-29649) Stop task set if FileAlreadyExistsException was thrown when writing to output file

     [ https://issues.apache.org/jira/browse/SPARK-29649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-29649:
----------------------------------
    Component/s: SQL

> Stop task set if FileAlreadyExistsException was thrown when writing to output file
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-29649
>                 URL: https://issues.apache.org/jira/browse/SPARK-29649
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> We already know that task attempts that do not clean up their output files in the staging directory can cause job failure (SPARK-27194). There were proposals to fix this by changing the output filename or by deleting existing output files. These proposals are not completely reliable.
> The difficulty is that, once a previously failed task attempt has written the output file, that file is still under the same staging directory at the next task attempt, even if the output file name is different.
> If the job is going to fail eventually, there is no point in re-running the task until max attempts are reached. For long-running jobs, re-running the task can waste a lot of time.
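
The fail-fast idea described above can be illustrated with a small, self-contained sketch. This is not Spark's scheduler code: `FileAlreadyExistsError`, `run_with_retries`, and `make_always_conflicting_task` are hypothetical names made up for illustration, and the retry loop is only a simplified stand-in for how a task set re-runs a failed task. It shows that when the error is treated as non-retryable, the task runs once instead of using up every allowed attempt.

```python
class FileAlreadyExistsError(Exception):
    """Hypothetical stand-in for a FileAlreadyExistsException raised by a writer."""


def run_with_retries(task, max_attempts=4, fail_fast=True):
    """Call task(attempt) until it succeeds or attempts run out.

    Returns the attempt number that succeeded; re-raises the last error
    if no attempt succeeds. With fail_fast=True, a FileAlreadyExistsError
    aborts immediately instead of being retried.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            task(attempt)
            return attempt
        except FileAlreadyExistsError:
            # The leftover file will still be there on the next attempt,
            # so retrying is assumed to be pointless.
            if fail_fast or attempt == max_attempts:
                raise
        except Exception:
            # Other failures may be transient, so keep retrying.
            if attempt == max_attempts:
                raise


def make_always_conflicting_task(calls):
    """Build a task that records each attempt and always hits the conflict."""
    def task(attempt):
        calls.append(attempt)
        raise FileAlreadyExistsError("output file already exists in staging directory")
    return task


def count_attempts(fail_fast):
    """Return how many attempts were made before the job gave up."""
    calls = []
    try:
        run_with_retries(make_always_conflicting_task(calls), fail_fast=fail_fast)
    except FileAlreadyExistsError:
        pass
    return len(calls)


if __name__ == "__main__":
    print("attempts with retry:", count_attempts(fail_fast=False))
    print("attempts with fail-fast:", count_attempts(fail_fast=True))
```

In this sketch the wasted work grows with both `max_attempts` and the task's running time, which is the cost the issue aims to avoid for long-running jobs.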



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
