Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/03/20 07:43:00 UTC

[jira] [Assigned] (SPARK-27210) Cleanup incomplete output files in ManifestFileCommitProtocol if task is aborted

     [ https://issues.apache.org/jira/browse/SPARK-27210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27210:
------------------------------------

    Assignee: Apache Spark

> Cleanup incomplete output files in ManifestFileCommitProtocol if task is aborted
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27210
>                 URL: https://issues.apache.org/jira/browse/SPARK-27210
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Apache Spark
>            Priority: Minor
>
> Unlike HadoopMapReduceCommitProtocol, ManifestFileCommitProtocol doesn't clean up incomplete output files in either case: when a task is aborted or when a job is aborted.
> HadoopMapReduceCommitProtocol writes intermediate files to a staging directory, so once a job is aborted it can simply delete the staging directory to clean up everything. It also makes an extra effort to clean up intermediate files on the task side when a task is aborted.
> ManifestFileCommitProtocol does no cleanup at all; it only maintains metadata listing the output files that were written completely. It would be better if ManifestFileCommitProtocol made a best effort to clean up. It is unclear whether it can do job-level cleanup, since it doesn't use a staging directory, but it can clearly still make a best effort at task-level cleanup.
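
The task-level cleanup proposed above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's implementation: the class ToyManifestCommitter and its methods new_task_file, commit_task, and abort_task are hypothetical names invented for illustration, and the local filesystem stands in for the real output store. The idea is that a committer that writes directly into the output directory (no staging directory) has to remember every path it hands out to a task, so that it can delete those files if the task is aborted.

```python
import os
import tempfile
import uuid


class ToyManifestCommitter:
    """Toy model of a manifest-style committer (hypothetical, NOT Spark's API)."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.pending = []  # paths handed out to the current task

    def new_task_file(self, ext=""):
        # Files go straight into the final output directory (no staging
        # directory), so each path must be remembered for later cleanup.
        path = os.path.join(self.output_dir, f"part-{uuid.uuid4().hex}{ext}")
        self.pending.append(path)
        return path

    def commit_task(self):
        # Report only files that actually exist; in a manifest-based design
        # these are the entries that would be recorded as complete output.
        written = [p for p in self.pending if os.path.exists(p)]
        self.pending = []
        return written

    def abort_task(self):
        # Best-effort cleanup: try every tracked path and never raise from
        # the cleanup itself; return the paths that could not be deleted.
        failed = []
        for path in self.pending:
            try:
                if os.path.exists(path):
                    os.remove(path)
            except OSError:
                failed.append(path)
        self.pending = []
        return failed


def demo_abort():
    # Write a partial file, abort the task, and report the directory
    # contents before and after the cleanup.
    with tempfile.TemporaryDirectory() as out:
        committer = ToyManifestCommitter(out)
        with open(committer.new_task_file(".txt"), "w") as f:
            f.write("partial output")
        before = os.listdir(out)
        failed = committer.abort_task()
        return before, os.listdir(out), failed
```

Job-level cleanup is deliberately left out of the sketch: without a staging directory there is no single location to delete, which matches the uncertainty stated in the description.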



