Posted to issues@flink.apache.org by "Gabor Gevay (JIRA)" <ji...@apache.org> on 2017/10/10 14:15:00 UTC

[jira] [Commented] (FLINK-1268) FileOutputFormat with overwrite does not clear local output directories

    [ https://issues.apache.org/jira/browse/FLINK-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198727#comment-16198727 ] 

Gabor Gevay commented on FLINK-1268:
------------------------------------

This issue just happened to me. I ran my job locally with parallelism 8 and later with parallelism 4, and then spent an hour debugging to figure out what went wrong.

> FileOutputFormat with overwrite does not clear local output directories
> -----------------------------------------------------------------------
>
>                 Key: FLINK-1268
>                 URL: https://issues.apache.org/jira/browse/FLINK-1268
>             Project: Flink
>          Issue Type: Bug
>          Components: Batch Connectors and Input/Output Formats
>            Reporter: Till Rohrmann
>            Priority: Minor
>
> I noticed that the FileOutputFormat does not clear the output directories when it writes to local disk. As a consequence, partitions from a previous run remain in the directory if one decreases the degree of parallelism (DOP) between subsequent runs. If one then reads the data from this directory, more partitions are read than were actually written. This can lead to incorrect user-code behaviour that is hard to debug. I'm aware that in a distributed execution the TaskManagers or the Tasks have to be responsible for the cleanup, and that if multiple Tasks are running on a TaskManager, the cleanup has to be coordinated.
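
To make the failure mode described above concrete, here is a minimal sketch that simulates it on a plain local filesystem. It is not Flink code and does not use any Flink API: the helper names (write_partitions, stale_partitions, demo) and the one-file-per-task naming scheme "1", "2", ... are assumptions made for illustration only. The sketch writes with parallelism 8, then with parallelism 4 into the same directory, and shows which files are left over; it then repeats the second run with the directory cleared first.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def write_partitions(out_dir: Path, parallelism: int, clear_first: bool = False) -> None:
    """Write one file per parallel task, named "1".."parallelism" (assumed naming)."""
    if clear_first and out_dir.exists():
        # Remove everything left over from earlier runs before writing.
        shutil.rmtree(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for task in range(1, parallelism + 1):
        # Files with the same name are overwritten; files with higher
        # indices from an earlier, wider run are left untouched.
        (out_dir / str(task)).write_text(f"run with parallelism {parallelism}\n")


def stale_partitions(out_dir: Path, parallelism: int) -> List[str]:
    """Return the names of files that a run with this parallelism did not write."""
    expected = {str(i) for i in range(1, parallelism + 1)}
    return sorted(p.name for p in out_dir.iterdir() if p.name not in expected)


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "output"

        # Overwrite without clearing: the second run leaves old files behind.
        write_partitions(out, 8)
        write_partitions(out, 4)
        without_clear = stale_partitions(out, 4)

        # Same sequence, but the second run clears the directory first.
        write_partitions(out, 8)
        write_partitions(out, 4, clear_first=True)
        with_clear = stale_partitions(out, 4)

    return without_clear, with_clear


if __name__ == "__main__":
    leftover, cleaned = demo()
    print("stale files without clearing:", leftover)
    print("stale files with clearing:", cleaned)
```

A reader that consumes every file in the directory would pick up the four leftover files in the first case. Clearing in a single process, as done here, sidesteps the coordination problem the description mentions for multiple Tasks on a TaskManager, so this sketch only covers the simple local case.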
