Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/07/25 16:48:00 UTC

[jira] [Commented] (FLINK-7265) FileSystems should describe their kind and consistency level

    [ https://issues.apache.org/jira/browse/FLINK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100335#comment-16100335 ] 

ASF GitHub Bot commented on FLINK-7265:
---------------------------------------

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/4397

    [FLINK-7265] [FLINK-7266] Introduce FileSystemKind and ConsistencyLevel for FileSystem

    ## What is the purpose of the change
    
    This change lets file systems expose more information about their kind and consistency, for example whether they support real directory structures, efficient recursive deletes, or rename consistency.
    
    This information is used to hotfix the [FLINK-7266] bug where S3 cleanup becomes prohibitively expensive due to excessive and unnecessary bucket contents listing.
    
    ## Brief change log
    
      - Adds `ConsistencyLevel` and `FileSystemKind` enums to describe file systems
      - All `FileSystems` declare their kind and consistency
      - `HadoopFileSystemWrapper` infers the kind and consistency from the scheme
      - `FileStateHandle` only attempts to delete the parent directory if the target file system is a proper file system (not an object store)
    
    
    ## Verifying this change
    
    This change is verified by the addition of some unit tests:
      - Check that the consistency levels and properties relate as expected
      - Check inference of `LocalFileSystem` consistency (Linux / Windows)
      - Check inference of HDFS wrapper file systems consistency
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): **no**
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
      - The serializers: **no**
      - The runtime per-record code paths (performance sensitive): **no**
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
    
    The worst thing that could happen here is that empty checkpoint parent directories are not cleaned up on some file systems identifying themselves as `s3` or so.
    
    ## Documentation
    
      - Does this pull request introduce a new feature? **no**
      - If yes, how is the feature documented? **not applicable**
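
To make the change log above concrete, here is a minimal Java sketch of the idea. It is not the code from the pull request. The class `FileSystemKindSketch`, its nested `Kind` enum, the methods `kindFromScheme` and `shouldDeleteEmptyParent`, and the rule that any scheme starting with "s3" denotes an object store are all assumptions made for illustration. `ConsistencyLevel` is left out because its values are not given in this message.

```java
import java.util.Locale;

// Illustrative sketch only; names and scheme rules are assumptions,
// not the code from the pull request.
class FileSystemKindSketch {

    /** Hypothetical stand-in for the FileSystemKind enum named in the change log. */
    enum Kind {
        /** Real directories exist (for example local disk or HDFS). */
        FILE_SYSTEM,
        /** "Directories" are only prefixes of object keys (for example S3). */
        OBJECT_STORE
    }

    /** Infers the kind from a URI scheme. Assumption: schemes starting with "s3" are object stores. */
    static Kind kindFromScheme(String scheme) {
        String s = scheme.toLowerCase(Locale.ROOT);
        return s.startsWith("s3") ? Kind.OBJECT_STORE : Kind.FILE_SYSTEM;
    }

    /** Parent-directory cleanup only makes sense where directories really exist. */
    static boolean shouldDeleteEmptyParent(Kind kind) {
        return kind == Kind.FILE_SYSTEM;
    }

    public static void main(String[] args) {
        for (String scheme : new String[] {"file", "hdfs", "s3", "s3a"}) {
            Kind kind = kindFromScheme(scheme);
            System.out.println(scheme + " -> " + kind
                    + ", delete empty parent: " + shouldDeleteEmptyParent(kind));
        }
    }
}
```

With a guard like this, a state handle that discards its file on an object store would skip the parent-directory delete, which is the step that otherwise triggers the bucket listing described under [FLINK-7266].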


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink s3_cleanup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4397
    
----
commit 2c9ebfca42758c0bab958e9488f2e2d9777ddae9
Author: Stephan Ewen <se...@apache.org>
Date:   2017-07-25T15:19:25Z

    [FLINK-7265] [core] Introduce FileSystemKind and ConsistencyLevel for FileSystem
    
    These describe the characteristics of the file system, such as consistency and support
    for directories and efficient directory operations.

commit dead4bbb2769d7aaade029cd6c76f8a2139f69ce
Author: Stephan Ewen <se...@apache.org>
Date:   2017-07-25T15:26:38Z

    [FLINK-7266] [core] Prevent attempt for parent directory deletion for object stores

----


> FileSystems should describe their kind and consistency level
> ------------------------------------------------------------
>
>                 Key: FLINK-7265
>                 URL: https://issues.apache.org/jira/browse/FLINK-7265
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.3.1
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.4.0, 1.3.2
>
>
> Currently, all {{FileSystem}} types look the same to Flink.
> However, certain operations should only be executed on certain kinds of file systems.
> For example, it makes no sense to attempt to delete an empty parent directory on S3, because there are no such things as directories, only hierarchical naming in the keys (file names).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)