Posted to issues@flink.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/09/05 16:21:00 UTC

[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

    [ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153910#comment-16153910 ] 

Steve Loughran commented on FLINK-7266:
---------------------------------------

FWIW, in s3a we create a single delete request to rm all parent paths *and don't bother doing the existence check*. 

That is, for a file a/b/c.txt, after the file is written in close(), POST a delete list of

/a/
/a/b/

It's ~O(1) in directory depth: one request regardless of how deep the path is. And as you don't need to wait for the response, it's even something you could do asynchronously.
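
To make the delete list concrete, here is a rough sketch of how the parent "directory" keys for an object key could be computed before being handed to a single bulk delete request. This is an illustration only, not the actual s3a code: the class name ParentMarkers and the method parentMarkerKeys are made up for this example, and it assumes directory markers are keys ending in "/" with no leading slash. Building and sending the bulk delete itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of s3a or Flink.
public class ParentMarkers {

    /**
     * Returns the assumed directory-marker key for every parent of the
     * given object key, shallowest first. No existence check is made;
     * the caller would put all of these into one bulk delete request.
     */
    static List<String> parentMarkerKeys(String objectKey) {
        List<String> keys = new ArrayList<>();
        // Treat "/a/b/c.txt" and "a/b/c.txt" the same way.
        String key = objectKey.startsWith("/") ? objectKey.substring(1) : objectKey;
        int slash = key.indexOf('/');
        while (slash >= 0) {
            // Everything up to and including this slash is one parent marker.
            keys.add(key.substring(0, slash + 1));
            slash = key.indexOf('/', slash + 1);
        }
        return keys;
    }

    public static void main(String[] args) {
        // For the example above this prints [a/, a/b/]
        System.out.println(parentMarkerKeys("a/b/c.txt"));
    }
}
```

The list grows with path depth, but it still goes out as one request, which is where the ~O(1) comes from.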

> Don't attempt to delete parent directory on S3
> ----------------------------------------------
>
>                 Key: FLINK-7266
>                 URL: https://issues.apache.org/jira/browse/FLINK-7266
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.3.1
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>            Priority: Critical
>             Fix For: 1.4.0, 1.3.2
>
>
> Currently, every attempted release of an S3 state object also checks if the "parent directory" is empty and then tries to delete it.
> Not only is that unnecessary on S3, but it is prohibitively expensive and for example causes S3 to throttle calls by the JobManager on checkpoint cleanup.
> The {{FileState}} must only attempt parent directory cleanup when operating against real file systems, not when operating against object stores.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)