Posted to issues@flink.apache.org by "mingleizhang (JIRA)" <ji...@apache.org> on 2017/07/24 12:29:00 UTC

[jira] [Commented] (FLINK-5789) Make Bucketing Sink independent of Hadoop's FileSystem

    [ https://issues.apache.org/jira/browse/FLINK-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098292#comment-16098292 ] 

mingleizhang commented on FLINK-5789:
-------------------------------------

I suggest we write more detailed documentation when designing this kind of API. Take the {{truncate}} functionality: when I first saw {{truncate}}, I didn't know what a truncate is or where I could read about it, so I didn't know what to make of it.

FYI, here is a link describing how HDFS does it. It is attached to HDFS-3107.

[https://issues.apache.org/jira/secure/attachment/12697141/HDFS_truncate.pdf]

Peace
Minglei

> Make Bucketing Sink independent of Hadoop's FileSystem
> ------------------------------------------------------
>
>                 Key: FLINK-5789
>                 URL: https://issues.apache.org/jira/browse/FLINK-5789
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>    Affects Versions: 1.2.0, 1.1.4
>            Reporter: Stephan Ewen
>             Fix For: 1.4.0
>
>
> The {{BucketingSink}} is hard wired to Hadoop's FileSystem, bypassing Flink's file system abstraction.
> This causes several issues:
>   - The bucketing sink will behave differently from other file sinks with respect to configuration
>   - Directly supported file systems (not through Hadoop) like the MapR File System do not work in the same way with the BucketingSink as other file systems
>   - The previous point is all the more problematic in the effort to make Hadoop an optional dependency and to run within other stacks (Mesos, Kubernetes, AWS, GCE, Azure) with ideally no Hadoop dependency.
> We should port the {{BucketingSink}} to use Flink's FileSystem classes.
> To support the *truncate* functionality that is needed for the exactly-once semantics of the Bucketing Sink, we should extend Flink's FileSystem abstraction to have the methods
>   - {{boolean supportsTruncate()}}
>   - {{void truncate(Path, long)}}
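
To make the two proposed methods concrete, here is a minimal sketch of what they could look like and how a sink might use them on recovery. It is only an illustration under stated assumptions: the interface name TruncatableFileSystem, the helper restoreToValidLength, and the local implementation are hypothetical and are not Flink classes, and Path here is java.nio.file.Path rather than Flink's own Path. The only real library call relied on for truncation is java.nio's FileChannel.truncate.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TruncateSketch {

    /** Hypothetical stand-in for the two methods proposed in the issue (not Flink's API). */
    interface TruncatableFileSystem {
        boolean supportsTruncate();

        void truncate(Path file, long length) throws IOException;
    }

    /** Assumed local-disk implementation, built on java.nio's FileChannel.truncate. */
    static class LocalTruncatableFileSystem implements TruncatableFileSystem {
        @Override
        public boolean supportsTruncate() {
            return true;
        }

        @Override
        public void truncate(Path file, long length) throws IOException {
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
                // Drops every byte past 'length'; a no-op if the file is already shorter.
                channel.truncate(length);
            }
        }
    }

    /** Hypothetical recovery step: cut a part file back to the length recorded at the last checkpoint. */
    static void restoreToValidLength(TruncatableFileSystem fs, Path file, long validLength)
            throws IOException {
        if (!fs.supportsTruncate()) {
            throw new UnsupportedOperationException("File system cannot truncate " + file);
        }
        fs.truncate(file, validLength);
    }

    public static void main(String[] args) throws IOException {
        Path part = Files.createTempFile("part-0-0", ".txt");
        Files.write(part, "committed".getBytes(StandardCharsets.UTF_8));
        long validLength = Files.size(part); // length as of the "checkpoint"

        // Bytes written after the checkpoint must not survive a restore.
        Files.write(part, "-uncommitted".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);

        restoreToValidLength(new LocalTruncatableFileSystem(), part, validLength);
        System.out.println(new String(Files.readAllBytes(part), StandardCharsets.UTF_8)); // prints "committed"
        Files.delete(part);
    }
}
```

Having supportsTruncate() next to truncate() lets a caller check the capability before relying on it. What the sink should do when a file system reports no truncate support is left open in this sketch, which simply throws.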


