Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/01/29 01:29:00 UTC
[jira] [Closed] (HUDI-3279) Metadata table stores incorrect file sizes after Restore
[ https://issues.apache.org/jira/browse/HUDI-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin closed HUDI-3279.
---------------------------------
Resolution: Duplicate
> Metadata table stores incorrect file sizes after Restore
> --------------------------------------------------------
>
> Key: HUDI-3279
> URL: https://issues.apache.org/jira/browse/HUDI-3279
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.11.0
>
> Attachments: Screen Shot 2022-01-19 at 12.17.21 PM.png, Screen Shot 2022-01-19 at 12.18.27 PM.png, Screen Shot 2022-01-19 at 7.56.37 PM.png
>
>
> While working on [https://github.com/apache/hudi/pull/4556], I stumbled upon an issue where the LogBlock Scanner hits EOF on the log files in tests after performing a Restore operation.
> The root cause turned out to be the Metadata Table (MT) storing incorrect file sizes after Restore (sizes in the MT are essentially 2x what is on the FS):
> !Screen Shot 2022-01-19 at 12.17.21 PM.png!
> !Screen Shot 2022-01-19 at 12.18.27 PM.png!
>
> This seems to occur due to the following:
> # The Metadata Table treats new records for the same file as "deltas", appending the file size to its records (https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java#L227)
> # Upon Restore (which is handled simply as a collection of Rollbacks), we pick the *max* of the file sizes before and after the operation, regardless of which state we're actually rolling back to (https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java#L254).
>
> *Proposal*
> Instead of simply always picking the max size, we should pick the size of the file as it was right before.
>
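The interaction described in the two points above can be illustrated with a minimal sketch. It is not Hudi code: the class MetadataSizeSketch, its methods, and the file name are hypothetical, and it assumes that "delta" semantics amount to summing the recorded sizes, which is one way to arrive at the 2x discrepancy reported above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the behaviour described in the issue; not the actual Hudi classes.
public class MetadataSizeSketch {
    // Stand-in for metadata-table records: file name -> recorded size in bytes.
    private final Map<String, Long> recordedSizes = new HashMap<>();

    // Assumed "delta" semantics: a new record for a known file is added to the stored size.
    public void applyDelta(String fileName, long sizeDelta) {
        recordedSizes.merge(fileName, sizeDelta, Long::sum);
    }

    // Models the rollback path from the description: always report the larger of the two sizes.
    public static long sizeReportedByRollback(long sizeBefore, long sizeAfter) {
        return Math.max(sizeBefore, sizeAfter);
    }

    public long recordedSize(String fileName) {
        return recordedSizes.getOrDefault(fileName, 0L);
    }

    public static void main(String[] args) {
        MetadataSizeSketch mt = new MetadataSizeSketch();
        long sizeOnFs = 1024L;                      // actual size of the file on the file system
        mt.applyDelta("example.log", sizeOnFs);     // the original write records the full size

        // Restore is handled as rollbacks; the rollback reports max(before, after) ...
        long reported = sizeReportedByRollback(sizeOnFs, sizeOnFs);
        // ... and that full size is then merged into the existing record as a delta.
        mt.applyDelta("example.log", reported);

        System.out.println("FS size: " + sizeOnFs + ", MT size: " + mt.recordedSize("example.log"));
        // prints "FS size: 1024, MT size: 2048"
    }
}
```

Under these assumptions the recorded size ends up at twice the on-disk size after a single rollback, because a full size is merged where the merge logic expects a delta.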
--
This message was sent by Atlassian Jira
(v8.20.1#820001)