Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/01/06 01:16:00 UTC

[jira] [Updated] (HUDI-3178) Metadata table compaction can include invalid updates from failed actions on dataset

     [ https://issues.apache.org/jira/browse/HUDI-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3178:
--------------------------------------
    Sprint: Hudi 0.10.1 -  2021/01/03

> Metadata table compaction can include invalid updates from failed actions on dataset
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-3178
>                 URL: https://issues.apache.org/jira/browse/HUDI-3178
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Priority: Blocker
>             Fix For: 0.10.1
>
>
> Metadata Table v2 performs an inline compaction once a deltacommit has been written. 
> Timeline:
>   (on dataset) t1.commit.requested
>   (on dataset) t1.commit.inflight
> ---- all parquet writes complete here, WriteStatus generated---
>     (on metadata table) t1.deltacommit.requested
>     (on metadata table) t1.deltacommit.inflight
>     (on metadata table) t1.deltacommit
> ---- deltacommit completed ----
>     (on metadata table) t1-001.compaction.requested
>     (on metadata table) t1-001.compaction.inflight
>     (on metadata table) t1-001.commit
> If t1.commit then fails on the dataset, the metadata table has already merged information from t1 into its base files, and readers will be served that information. The metadata table reader logic only checks deltacommits against completed instants on the dataset timeline. It assumes that a base file is always SANE, i.e. that it contains only data from completed dataset instants.
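To make the failure mode concrete, here is a minimal Python sketch of a toy model. It is not Hudi code: the class `MetadataTable`, its methods, the `guarded` flag and the record keys are all hypothetical. The model assumes only what the description states. Compaction runs right after the deltacommit, and the reader filters log records, but not base files, against completed dataset instants. The `guarded=True` path is one possible mitigation, assumed here for illustration: defer compaction while any instant it would fold in is still pending on the dataset timeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical toy model of the metadata table in HUDI-3178; not a Hudi API.


@dataclass
class MetadataTable:
    # Records already folded into base files by compaction.
    base: Dict[str, str] = field(default_factory=dict)
    # Completed deltacommits not yet compacted: (dataset instant, key, value).
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def deltacommit(self, instant: str, updates: Dict[str, str]) -> None:
        for key, value in updates.items():
            self.log.append((instant, key, value))

    def compact(self, dataset_completed: Set[str], guarded: bool) -> None:
        # Hypothetical guard: skip compaction entirely while any logged
        # instant is still pending on the dataset timeline.
        if guarded and any(inst not in dataset_completed for inst, _, _ in self.log):
            return
        # Unguarded (the behaviour described in the issue): fold every log
        # record into the base, regardless of the dataset instant's state.
        for _, key, value in self.log:
            self.base[key] = value
        self.log = []

    def read(self, dataset_completed: Set[str]) -> Dict[str, str]:
        # Reader: trusts the base unconditionally and keeps only log records
        # whose instant has completed on the dataset timeline.
        view = dict(self.base)
        for instant, key, value in self.log:
            if instant in dataset_completed:
                view[key] = value
        return view


def run(guarded: bool) -> Dict[str, str]:
    dataset_completed: Set[str] = set()  # t1 is only requested/inflight
    mdt = MetadataTable()
    mdt.deltacommit("t1", {"partition1": "file-from-t1"})
    mdt.compact(dataset_completed, guarded)  # t1-001 runs before t1.commit
    # t1.commit then fails, so "t1" never enters dataset_completed.
    return mdt.read(dataset_completed)


print(run(guarded=False))  # {'partition1': 'file-from-t1'}  (leaked)
print(run(guarded=True))   # {}
```

This sketch defers the whole compaction rather than compacting around the pending instant. That keeps the remaining log records in instant order, so a later instant can never be overwritten by an earlier one that completes afterwards.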



--
This message was sent by Atlassian Jira
(v8.20.1#820001)