Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2023/01/13 20:39:00 UTC

[jira] [Updated] (HUDI-3828) We need to revisit MOR block merging sequence

     [ https://issues.apache.org/jira/browse/HUDI-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3828:
----------------------------------
    Description: 
Currently, block merging is configurable to be either lazy or non-lazy. However, the non-lazy sequence is incorrect: it merges blocks before actually rolling back the reverted ones. To make sure users do not accidentally hit this issue, we need to revisit the MOR block merging sequence and make sure that the following invariants are upheld:
 # Blocks are merged in 2 passes:
 ## First, we load all blocks while dropping the rolled-back ones, then
 ## We merge them in another forward pass
 # We should try to avoid having 2 merging sequences and instead consolidate on just one: right now we have "block + block" and "base + block", but we should be able to get away with just the latter (this will simplify the merging sequence quite substantially, e.g. with respect to handling of deletions)
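
The two invariants above can be illustrated with a minimal sketch. This is not Hudi's actual log-scanner code: the LogBlock type, its fields, and the helper names below are all hypothetical, and a rollback is simplified to "drop every block written by the targeted instant". Under those assumptions, the first pass filters out rolled-back blocks and the second pass applies the surviving blocks onto the base in timeline order, using only the "base + block" merge.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogBlock:
    """Hypothetical stand-in for a MOR log block (not a Hudi class)."""
    instant: str                                            # instant that wrote the block
    kind: str                                               # "data", "delete" or "rollback"
    records: Dict[str, str] = field(default_factory=dict)   # data block: key -> value
    deleted_keys: List[str] = field(default_factory=list)   # delete block: keys to remove
    target_instant: Optional[str] = None                    # rollback block: instant to revert


def load_valid_blocks(blocks: List[LogBlock]) -> List[LogBlock]:
    """Pass 1: scan every block and drop the ones that were rolled back."""
    rolled_back = {b.target_instant for b in blocks if b.kind == "rollback"}
    return [
        b for b in blocks
        if b.kind != "rollback" and b.instant not in rolled_back
    ]


def merge_onto_base(base: Dict[str, str], blocks: List[LogBlock]) -> Dict[str, str]:
    """Pass 2: forward pass that only ever merges a block into the base."""
    merged = dict(base)  # copy so the caller's base is left untouched
    for block in blocks:
        if block.kind == "data":
            merged.update(block.records)
        elif block.kind == "delete":
            for key in block.deleted_keys:
                merged.pop(key, None)
    return merged


def read_file_slice(base: Dict[str, str], blocks: List[LogBlock]) -> Dict[str, str]:
    """Run both passes in order: filter first, merge second."""
    return merge_onto_base(base, load_valid_blocks(blocks))


base = {"k1": "v1", "k2": "v2"}
blocks = [
    LogBlock("001", "data", records={"k1": "v1-updated"}),
    LogBlock("002", "delete", deleted_keys=["k2"]),
    LogBlock("003", "data", records={"k2": "stale", "k3": "v3"}),
    LogBlock("004", "rollback", target_instant="003"),
]
result = read_file_slice(base, blocks)  # {'k1': 'v1-updated'}
```

Because the blocks of instant 003 are discarded before any merging starts, their records never reach the result. Merging eagerly while scanning would have applied them first, which is the failure mode described for the non-lazy sequence.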

  was:
We need to revisit the MOR block merging sequence and make sure that the following invariants are upheld:
 # Blocks have to be merged in a backward pass (i.e. we first fetch all the blocks, and merge them in the reverse order of their timeline)
 # We should try to avoid having 2 merging sequences and instead consolidate on just one: right now we have "block + block" and "base + block", but we should be able to get away with just the latter (this will simplify the merging sequence quite substantially, e.g. with respect to handling of deletions)


> We need to revisit MOR block merging sequence
> ---------------------------------------------
>
>                 Key: HUDI-3828
>                 URL: https://issues.apache.org/jira/browse/HUDI-3828
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.14.0
>
>
> Currently, block merging is configurable to be either lazy or non-lazy. However, the non-lazy sequence is incorrect: it merges blocks before actually rolling back the reverted ones. To make sure users do not accidentally hit this issue, we need to revisit the MOR block merging sequence and make sure that the following invariants are upheld:
>  # Blocks are merged in 2 passes:
>  ## First, we load all blocks while dropping the rolled-back ones, then
>  ## We merge them in another forward pass
>  # We should try to avoid having 2 merging sequences and instead consolidate on just one: right now we have "block + block" and "base + block", but we should be able to get away with just the latter (this will simplify the merging sequence quite substantially, e.g. with respect to handling of deletions)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)