Posted to commits@hudi.apache.org by "Bowen Zhu (Jira)" <ji...@apache.org> on 2022/07/27 05:53:00 UTC

[jira] [Commented] (HUDI-2118) Avoid checking corrupt log blocks for cloud storage

    [ https://issues.apache.org/jira/browse/HUDI-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571747#comment-17571747 ] 

Bowen Zhu commented on HUDI-2118:
---------------------------------

There are multiple ways to upload an object to GCS: [https://cloud.google.com/storage/docs/uploads-downloads]

Resumable, multipart, parallel, and streaming uploads are not strictly atomic. An in-progress object does not show up in a normal bucket listing, but a partial upload can exist and can be queried and listed.

Streaming uploads can also leave a corrupted object accessible after the transfer completes. The object stays accessible until it is deleted, which only happens later, once checksum validation after the transfer detects the corruption.
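To illustrate the streaming-upload case, here is a minimal sketch of the post-transfer check: the writer computes a CRC32C (the checksum type GCS reports for objects) over the bytes it streamed, and the stored bytes are compared against it only after the upload finishes. The class and method names are hypothetical, and no GCS client API is used. The comparison is purely local and stands in for whatever validation a real uploader performs.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

/** Hypothetical helper: post-upload integrity check for a streamed object. */
public final class StreamingUploadCheck {

    private StreamingUploadCheck() {}

    /** Computes the CRC32C (Castagnoli) checksum of the given bytes. */
    public static long crc32c(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Returns true if the stored bytes match the checksum the writer computed
     * while streaming. With a streaming upload this can only be decided after
     * the transfer completes, so until the check runs (and a mismatched object
     * is deleted) readers may observe the corrupted object.
     */
    public static boolean isIntact(byte[] storedBytes, long expectedCrc32c) {
        return crc32c(storedBytes) == expectedCrc32c;
    }

    public static void main(String[] args) {
        byte[] written = "log block payload".getBytes(StandardCharsets.UTF_8);
        long expected = crc32c(written);

        byte[] corrupted = written.clone();
        corrupted[0] ^= 0x01; // simulate a single flipped bit during transfer

        System.out.println(isIntact(written, expected));   // true
        System.out.println(isIntact(corrupted, expected)); // false: CRCs detect all single-bit errors
    }
}
```

The point of the sketch is the ordering: the corruption is only detectable after the object already exists, which is why a reader cannot assume a visible object is complete and intact.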

We need to either ensure that Hudi never uses these non-atomic upload methods for GCS, or mark the GCS upload type as non-atomic.
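The second option could look roughly like the sketch below: each storage scheme carries a flag saying whether a completed write is atomic, and the reader only scans log files for corrupt blocks when it is not. This is a hypothetical illustration, not Hudi's StorageSchemes enum or its log reader. The class name, the atomicWrites flag, and the flag values for hdfs and s3a are assumptions. Only the GCS value follows from this comment. Unknown or missing schemes default to keeping the check.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical sketch: decide per storage scheme whether log files must be
 * scanned for corrupt blocks. Not Hudi's actual API.
 */
public final class CorruptBlockPolicy {

    enum Scheme {
        HDFS("hdfs", false), // assumption: appends can leave a partially written block
        S3A("s3a", true),    // assumption: object is visible only once the upload completes
        GS("gs", false);     // per this comment: GCS allows non-atomic upload types

        final String prefix;
        final boolean atomicWrites;

        Scheme(String prefix, boolean atomicWrites) {
            this.prefix = prefix;
            this.atomicWrites = atomicWrites;
        }
    }

    private CorruptBlockPolicy() {}

    /** Returns true if a reader should check log blocks at this path for corruption. */
    public static boolean shouldCheckCorruptBlocks(String path) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            return true; // no scheme (e.g. a local path): stay conservative
        }
        String normalized = scheme.toLowerCase(Locale.ROOT);
        for (Scheme s : Scheme.values()) {
            if (s.prefix.equals(normalized)) {
                return !s.atomicWrites; // skip the check only for atomic-write storage
            }
        }
        return true; // unknown storage: keep the corrupt-block check
    }

    public static void main(String[] args) {
        System.out.println(shouldCheckCorruptBlocks("gs://bucket/table/.f1.log.1"));  // true
        System.out.println(shouldCheckCorruptBlocks("s3a://bucket/table/.f1.log.1")); // false
        System.out.println(shouldCheckCorruptBlocks("/tmp/table/.f1.log.1"));         // true
    }
}
```

Defaulting to "check" for unrecognized schemes keeps the optimization opt-in: a storage scheme only skips the corrupt-block scan once it has been explicitly classified as atomic.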


> Avoid checking corrupt log blocks for cloud storage
> ---------------------------------------------------
>
>                 Key: HUDI-2118
>                 URL: https://issues.apache.org/jira/browse/HUDI-2118
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Rajesh Mahindra
>            Assignee: Bowen Zhu
>            Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)