Posted to commits@hudi.apache.org by "Prashant Wason (Jira)" <ji...@apache.org> on 2022/01/05 20:11:00 UTC

[jira] [Created] (HUDI-3178) Metadata table compaction can include invalid updates from failed actions on dataset

Prashant Wason created HUDI-3178:
------------------------------------

             Summary: Metadata table compaction can include invalid updates from failed actions on dataset
                 Key: HUDI-3178
                 URL: https://issues.apache.org/jira/browse/HUDI-3178
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Prashant Wason
             Fix For: 0.10.1


Metadata Table v2 performs an inline compaction once a deltacommit has been written. 

Timeline:
  (on dataset) t1.commit.requested
  (on dataset) t1.commit.inflight
---- all parquet writes complete here, WriteStatus generated ----
    (on metadata table) t1.deltacommit.requested
    (on metadata table) t1.deltacommit.inflight
    (on metadata table) t1.deltacommit
---- deltacommit completed ----
    (on metadata table) t1-001.compaction.requested
    (on metadata table) t1-001.compaction.inflight
    (on metadata table) t1-001.commit

If t1.commit then fails on the dataset, the metadata table has already merged the information from t1 into its base files, and that information will be returned to readers. The metadata table reader logic validates only deltacommits against the completed instants on the dataset timeline; it assumes that the contents of a base file are always valid.
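
The failure mode can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Hudi code: the MetadataTable class, its methods, and the file names are hypothetical, and the metadata table is reduced to a mapping from dataset instant to the files that instant wrote.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataTable:
    """Hypothetical toy model of a metadata table (not the Hudi API)."""
    base_file: dict = field(default_factory=dict)  # instant -> files, already compacted
    delta_log: dict = field(default_factory=dict)  # instant -> files, not yet compacted

    def delta_commit(self, instant, files):
        # Record the files written by a dataset instant as a deltacommit.
        self.delta_log[instant] = set(files)

    def compact(self):
        # Inline compaction folds every deltacommit into the base file
        # without checking whether the dataset commit has completed.
        self.base_file.update(self.delta_log)
        self.delta_log = {}

    def list_files(self, completed):
        files = set()
        for written in self.base_file.values():
            files |= written  # base file contents are trusted unconditionally
        for instant, written in self.delta_log.items():
            if instant in completed:  # only deltacommits are validated
                files |= written
        return files


def simulate(compact_inline):
    table = MetadataTable()
    table.delta_commit("t0", ["f0.parquet"])
    table.compact()  # t0 completed on the dataset, so this is safe
    table.delta_commit("t1", ["f1.parquet"])
    if compact_inline:
        table.compact()  # runs before t1.commit completes on the dataset
    # t1.commit now fails on the dataset: only t0 is a completed instant.
    return table.list_files(completed={"t0"})


print(sorted(simulate(compact_inline=False)))
print(sorted(simulate(compact_inline=True)))
```

In this model, without the inline compaction the reader filters out the t1 deltacommit and lists only f0.parquet. With the inline compaction, f1.parquet from the failed t1 is already in the base file and is returned to the reader.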





--
This message was sent by Atlassian Jira
(v8.20.1#820001)