Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/11/11 19:08:00 UTC

[jira] [Updated] (HUDI-308) Avoid Renames for tracking state transitions of all actions on dataset

     [ https://issues.apache.org/jira/browse/HUDI-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-308:
--------------------------------
    Labels: pull-request-available  (was: )

> Avoid Renames for tracking state transitions of all actions on dataset
> ----------------------------------------------------------------------
>
>                 Key: HUDI-308
>                 URL: https://issues.apache.org/jira/browse/HUDI-308
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: Balaji Varadarajan
>            Assignee: Balaji Varadarajan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.1
>
>         Attachments: IMG_0118.jpg
>
>
> Currently, we employ renames when transitioning between the states (REQUESTED, INFLIGHT, COMPLETED) of all actions in Hudi.
> The idea is to always create a new file for each state of an action (commit, compaction, clean, ...) that is being performed and have the Timeline management resolve conflicts when loading them from the .hoodie folder. The archiving logic will clean up transient state files and archive terminal state files.
> This handling will be applied consistently to all kinds of actions on datasets. As part of this project, we will clean up unnecessary fields in the metadata, version them, and standardize on avro/json.
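
A minimal sketch of the idea described above, not Hudi's actual implementation: each state transition creates a new file instead of renaming the previous one, and the timeline loader resolves every action to its most advanced state. The file naming scheme (`<instant>.<action>.<state>`), the function names `transition`, `load_timeline`, and `transient_files`, and the rule for what counts as a transient file are all assumptions made for illustration.

```python
import os
import tempfile

# State order taken from the issue text; a higher index is a later state.
STATES = ["requested", "inflight", "completed"]


def transition(meta_dir, instant, action, state):
    """Record a state by creating a new file; nothing is renamed."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    # Hypothetical naming scheme: <instant>.<action>.<state>
    path = os.path.join(meta_dir, f"{instant}.{action}.{state}")
    # Mode "x" raises FileExistsError if the state file is already present.
    with open(path, "x"):
        pass
    return path


def _parse(name):
    """Split a state file name into (instant, action, state), or None."""
    parts = name.split(".")
    if len(parts) == 3 and parts[2] in STATES:
        return tuple(parts)
    return None


def load_timeline(meta_dir):
    """Resolve each (instant, action) to its most advanced state on disk."""
    latest = {}
    for name in os.listdir(meta_dir):
        parsed = _parse(name)
        if parsed is None:
            continue
        instant, action, state = parsed
        key = (instant, action)
        if key not in latest or STATES.index(state) > STATES.index(latest[key]):
            latest[key] = state
    return latest


def transient_files(meta_dir):
    """List state files superseded by a later state of the same action."""
    latest = load_timeline(meta_dir)
    stale = []
    for name in sorted(os.listdir(meta_dir)):
        parsed = _parse(name)
        if parsed and latest[(parsed[0], parsed[1])] != parsed[2]:
            stale.append(name)
    return stale


if __name__ == "__main__":
    meta_dir = tempfile.mkdtemp()  # stands in for the .hoodie folder
    for state in STATES:
        transition(meta_dir, "001", "commit", state)
    transition(meta_dir, "002", "clean", "requested")
    print(load_timeline(meta_dir))
    print(transient_files(meta_dir))
```

Because every state has its own file, a reader that lists the folder mid-transition may see several files for one action; picking the latest state at load time is what stands in for the atomic rename in this sketch.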



--
This message was sent by Atlassian Jira
(v8.3.4#803005)