Posted to issues@flink.apache.org by "ramkrishna.s.vasudevan (Jira)" <ji...@apache.org> on 2022/12/02 05:38:00 UTC

[jira] [Commented] (FLINK-30128) Introduce Azure Data Lake Gen2 APIs in the Hadoop Recoverable path

    [ https://issues.apache.org/jira/browse/FLINK-30128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642287#comment-17642287 ] 

ramkrishna.s.vasudevan commented on FLINK-30128:
------------------------------------------------

Attaching a simple diagram that shows what the class structure will look like.
Basically, HadoopFileSystem will be extended to create AzureBlobFileSystem.
Internally, AzureBlobFileSystem will create an AzureBlobRecoverableWriter, which works with AzureBlobFsRecoverableDataOutputStream. I will raise a PR for this after some more testing.
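A rough sketch of the class relationships described above, written in Python for brevity. The class names mirror the ones in the comment. The method names (create_recoverable_writer, open, persist, close_for_commit) only loosely follow the roles of Flink's Java FileSystem, RecoverableWriter and RecoverableFsDataOutputStream, and InMemoryStore is a hypothetical stand-in for ADLS Gen2. This illustrates the intended structure under those assumptions; it is not the planned implementation.

```python
# Hypothetical, simplified stand-ins; NOT Flink's actual API.
# They only model which class creates which, and the write -> persist -> commit flow.

class InMemoryStore:
    """Toy namespace (path -> bytearray) standing in for ADLS Gen2."""
    def __init__(self):
        self.files = {}

    def rename(self, src, dst):
        # Move the bytes from src to dst in one step.
        self.files[dst] = self.files.pop(src)


class HadoopFileSystem:
    """Stand-in for the generic Hadoop-backed file system."""
    def __init__(self, store):
        self.store = store

    def create_recoverable_writer(self):
        # Simplification: the generic path relies on HDFS truncate/lease recovery.
        raise NotImplementedError("generic writer expects HDFS (DistributedFileSystem)")


class AzureBlobFileSystem(HadoopFileSystem):
    """Extends the Hadoop file system and supplies an ABFS-specific writer."""
    def create_recoverable_writer(self):
        return AzureBlobRecoverableWriter(self.store)


class AzureBlobRecoverableWriter:
    def __init__(self, store):
        self.store = store

    def open(self, target_path):
        return AzureBlobFsRecoverableDataOutputStream(self.store, target_path)


class AzureBlobFsRecoverableDataOutputStream:
    """Writes to a temporary path; committing renames it onto the target."""
    def __init__(self, store, target_path):
        self.store = store
        self.target = target_path
        self.temp = target_path + ".inprogress"  # hypothetical naming scheme
        self.store.files[self.temp] = bytearray()

    def write(self, data):
        self.store.files[self.temp].extend(data)

    def persist(self):
        # Recoverable state: where the data lives and how many bytes were written.
        return {"temp": self.temp, "target": self.target,
                "committed_size": len(self.store.files[self.temp])}

    def close_for_commit(self):
        state = self.persist()

        def commit():
            self.store.rename(state["temp"], state["target"])
        return commit
```

Usage: AzureBlobFileSystem(store).create_recoverable_writer().open("/data/part-0") returns a stream; after write(b"hello"), calling close_for_commit()() renames "/data/part-0.inprogress" to "/data/part-0".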

> Introduce Azure Data Lake Gen2 APIs in the Hadoop Recoverable path
> ------------------------------------------------------------------
>
>                 Key: FLINK-30128
>                 URL: https://issues.apache.org/jira/browse/FLINK-30128
>             Project: Flink
>          Issue Type: Sub-task
>    Affects Versions: 1.13.1
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Major
>         Attachments: Flink_ABFS_support.pdf
>
>
> Currently, HadoopRecoverableWriter assumes that the underlying file system is HDFS, so it checks for DistributedFileSystem. It also tries to truncate the file and ensure the lease is recovered before the 'rename' operation is done.
> In the Azure Data Lake Gen2 world, the driver does not support the truncate and lease recovery APIs. Instead, we should be able to get the last committed size and, if it matches, go ahead with the rename. Will be back with more details here.
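The size check proposed in the description could look roughly like the following Python sketch. The dict-based namespace, the function name commit_after_recovery, the RecoveryError type, and the handling of the mismatch and already-committed cases are all assumptions for illustration; the description only states that the rename should happen when the sizes match.

```python
class RecoveryError(Exception):
    """Raised when the in-progress file cannot be safely committed (hypothetical)."""


def commit_after_recovery(files, temp_path, target_path, committed_size):
    """Hypothetical commit-on-recovery check for a store without truncate.

    files: dict path -> bytes, a stand-in for the ADLS Gen2 namespace.
    committed_size: length recorded when the stream was last persisted.
    """
    if target_path in files and temp_path not in files:
        # Assumption: a previous attempt already renamed the file, so treat it as done.
        return "already-committed"
    if temp_path not in files:
        raise RecoveryError(f"{temp_path} is missing")
    actual_size = len(files[temp_path])
    if actual_size != committed_size:
        # HDFS would truncate back to committed_size here. Without a truncate API,
        # this sketch refuses to commit rather than publish unconfirmed bytes.
        raise RecoveryError(
            f"size mismatch: expected {committed_size}, found {actual_size}")
    # Sizes match: go ahead with the rename.
    files[target_path] = files.pop(temp_path)
    return "committed"
```

Refusing on a mismatch is the conservative choice in this sketch; what the real writer should do when extra bytes exist past the committed size is left open in the description.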



--
This message was sent by Atlassian Jira
(v8.20.10#820010)