Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2020/11/03 00:08:00 UTC

[jira] [Comment Edited] (HUDI-1323) Fence metadata reads using latest data timeline commit times!

    [ https://issues.apache.org/jira/browse/HUDI-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225017#comment-17225017 ] 

Vinoth Chandar edited comment on HUDI-1323 at 11/3/20, 12:07 AM:
-----------------------------------------------------------------

{code:java}
if (r.getBlockType() != CORRUPT_BLOCK
    && !HoodieTimeline.compareTimestamps(r.getLogBlockHeader().get(INSTANT_TIME),
        HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime)) {
  // hit a block with instant time greater than should be processed, stop processing further
  break;
}
{code}
The log scanner already stops at the provided `latestInstantTime`, so it may be sufficient to pass in the correct value from the metadata reader.
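
To make the fencing idea concrete, here is a minimal, self-contained sketch of a scan loop that stops at a given instant time. It is not Hudi code: `FencedLogScan`, `LogBlock`, and `scan` are hypothetical names, and it assumes instant times are fixed-width timestamp strings that order correctly under plain string comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: FencedLogScan and LogBlock are hypothetical, not Hudi classes.
final class FencedLogScan {

  static final class LogBlock {
    final String instantTime; // assumed fixed-width, e.g. "20201103000800"
    final boolean corrupt;
    final String payload;

    LogBlock(String instantTime, boolean corrupt, String payload) {
      this.instantTime = instantTime;
      this.corrupt = corrupt;
      this.payload = payload;
    }
  }

  // Collects payloads in log order and stops at the first valid block whose
  // instant time is greater than latestInstantTime (the fence).
  static List<String> scan(List<LogBlock> blocks, String latestInstantTime) {
    List<String> payloads = new ArrayList<>();
    for (LogBlock block : blocks) {
      if (block.corrupt) {
        continue; // a corrupt block has no trustworthy header; skip it
      }
      if (block.instantTime.compareTo(latestInstantTime) > 0) {
        break; // past the fence: nothing after this point should be read
      }
      payloads.add(block.payload);
    }
    return payloads;
  }

  public static void main(String[] args) {
    List<LogBlock> blocks = Arrays.asList(
        new LogBlock("20201101000000", false, "a"),
        new LogBlock("20201102000000", false, "b"),
        new LogBlock("20201103000000", false, "c"));
    // The caller (here standing in for the metadata reader) supplies the fence.
    System.out.println(scan(blocks, "20201102000000")); // prints [a, b]
  }
}
```

In this sketch the fence is entirely the caller's responsibility: the loop only honors whatever `latestInstantTime` it is given, which is why passing the right value matters.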


was (Author: vc):
{code:java}
if (r.getBlockType() != CORRUPT_BLOCK
    && !HoodieTimeline.compareTimestamps(r.getLogBlockHeader().get(INSTANT_TIME),
        HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime)) {
  // hit a block with instant time greater than should be processed, stop processing further
  break;
}
{code}
the log scanner already stops at the provided `latestInstantTime`. So it may be sufficient to pass this in correctly, to the metadata reader

> Fence metadata reads using latest data timeline commit times!
> -------------------------------------------------------------
>
>                 Key: HUDI-1323
>                 URL: https://issues.apache.org/jira/browse/HUDI-1323
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Prashant Wason
>            Assignee: Vinoth Chandar
>            Priority: Major
>
> Problem D: We need to fence metadata reads using the latest commit times on the data timeline, and only hand out files that belong to a committed instant on the data timeline. Otherwise, the metadata table can hand uncommitted files to the cleaner etc. and cause us to delete legitimate latest file slices, i.e., data loss.
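
The requirement in the description can be sketched as a filter over a file listing. This is only an illustration under stated assumptions: `CommittedFileFilter` and `committedOnly` are hypothetical names, the file names are made up, and the metadata listing is modeled as a plain map from file name to the instant that wrote it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: not a Hudi class. Models "hand out only files that
// belong to a committed instant on the data timeline".
final class CommittedFileFilter {

  // fileToInstant: file name -> instant time that wrote it (hypothetical model).
  // committedInstants: instants completed on the data timeline.
  static List<String> committedOnly(Map<String, String> fileToInstant,
                                    Set<String> committedInstants) {
    List<String> visible = new ArrayList<>();
    for (Map.Entry<String, String> entry : fileToInstant.entrySet()) {
      if (committedInstants.contains(entry.getValue())) {
        visible.add(entry.getKey());
      }
    }
    return visible;
  }

  public static void main(String[] args) {
    Map<String, String> listing = new LinkedHashMap<>();
    listing.put("file-1", "20201101000000");
    listing.put("file-2", "20201102000000");
    listing.put("file-3", "20201103000000"); // written by an uncommitted instant

    Set<String> committed =
        new HashSet<>(Arrays.asList("20201101000000", "20201102000000"));

    // file-3 is withheld, so a consumer such as the cleaner never sees it.
    System.out.println(committedOnly(listing, committed)); // prints [file-1, file-2]
  }
}
```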


