Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2019/03/22 18:28:00 UTC

[jira] [Resolved] (SPARK-27210) Cleanup incomplete output files in ManifestFileCommitProtocol if task is aborted

     [ https://issues.apache.org/jira/browse/SPARK-27210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-27210.
----------------------------------
       Resolution: Fixed
         Assignee: Jungtaek Lim
    Fix Version/s: 3.0.0

> Cleanup incomplete output files in ManifestFileCommitProtocol if task is aborted
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27210
>                 URL: https://issues.apache.org/jira/browse/SPARK-27210
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> Unlike HadoopMapReduceCommitProtocol, ManifestFileCommitProtocol doesn't clean up incomplete output files in either case: when a task is aborted or when a job is aborted.
> HadoopMapReduceCommitProtocol writes intermediate files to a staging directory, so when a job is aborted it can simply delete the staging directory to clean up everything. It also makes an extra effort to clean up intermediate files on the task side when a task is aborted.
> ManifestFileCommitProtocol does nothing to clean up; it only maintains metadata listing the complete output files that were written. It would be better if ManifestFileCommitProtocol made a best effort to clean up. Job-level cleanup may not be possible, since it doesn't use a staging directory, but it can clearly make a best effort at task-level cleanup.
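The two cleanup strategies contrasted in the description can be illustrated with a small, self-contained sketch. This is not Spark's code: the class names, the method names (loosely modeled on FileCommitProtocol's `newTaskTempFile`, `commitTask`, `abortTask` and `abortJob`), and the `_temporary` staging directory name are hypothetical, the local filesystem stands in for Hadoop's FileSystem API, and only the paths relevant to cleanup are shown (no commit-time renames, no real manifest log).

```python
import os
import shutil
import tempfile
import uuid


class StagingDirJobCommitter:
    """Hypothetical sketch of the staging-directory approach: every task writes
    under one staging directory inside the output path, so aborting the job is a
    single recursive delete of that directory."""

    def __init__(self, output_dir, staging_name="_temporary"):  # name is an assumption
        self.staging_dir = os.path.join(output_dir, staging_name)

    def new_task_temp_file(self, task_id):
        task_dir = os.path.join(self.staging_dir, str(task_id))
        os.makedirs(task_dir, exist_ok=True)
        return os.path.join(task_dir, f"part-{uuid.uuid4().hex}")

    def abort_job(self):
        # Removes the intermediate files of all tasks at once.
        shutil.rmtree(self.staging_dir, ignore_errors=True)


class ManifestStyleTaskCommitter:
    """Hypothetical sketch of best-effort task-level cleanup without a staging
    directory: files go straight into the output directory, so each task attempt
    remembers the paths it handed out and deletes them on abort."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.added_files = []

    def new_task_temp_file(self):
        path = os.path.join(self.output_dir, f"part-{uuid.uuid4().hex}")
        self.added_files.append(path)
        return path

    def commit_task(self):
        # These paths are what a manifest (metadata log) entry would record.
        committed, self.added_files = self.added_files, []
        return committed

    def abort_task(self):
        # Best effort: a path may never have been created, or deletion may fail.
        for path in self.added_files:
            try:
                os.remove(path)
            except OSError:
                pass
        self.added_files = []


# Usage: an aborted task leaves nothing behind in the output directory.
out = tempfile.mkdtemp()
task = ManifestStyleTaskCommitter(out)
with open(task.new_task_temp_file(), "w") as f:
    f.write("partial data")
task.new_task_temp_file()  # handed out but never created
task.abort_task()
print(os.listdir(out))  # prints []
```

Because the manifest-style committer never creates a shared directory, it has nothing analogous to `abort_job` to delete; cleanup depends on each task attempt remembering its own files, which matches the issue's conclusion that task-level cleanup is the part that can clearly be done.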



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)