Posted to commits@hudi.apache.org by "Balaji Varadarajan (Jira)" <ji...@apache.org> on 2019/12/16 05:29:00 UTC

[jira] [Updated] (HUDI-308) Avoid Renames for tracking state transitions of all actions on dataset

     [ https://issues.apache.org/jira/browse/HUDI-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balaji Varadarajan updated HUDI-308:
------------------------------------
    Status: Closed  (was: Patch Available)

> Avoid Renames for tracking state transitions of all actions on dataset
> ----------------------------------------------------------------------
>
>                 Key: HUDI-308
>                 URL: https://issues.apache.org/jira/browse/HUDI-308
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: Balaji Varadarajan
>            Assignee: Balaji Varadarajan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.1
>
>         Attachments: IMG_0118.jpg
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, we employ renames when transitioning between states (REQUESTED, INFLIGHT, COMPLETED) of all actions in Hudi.
> The idea is to instead always create a new file for each state of an action (commit, compaction, clean, ...) that is being performed, and have the Timeline management resolve conflicts when loading them from the .hoodie folder. The archiving logic will clean up transient state files and archive terminal state files.
> This handling will be done consistently for all kinds of actions on datasets. As part of this project, we will clean up unnecessary fields in metadata, version them, and standardize on Avro/JSON.
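
The approach described above can be illustrated with a minimal sketch. This is not Hudi's implementation: the file naming scheme ("<instant>.<action>[.<state>]", with no suffix meaning completed) and the helper names parse_state_file, resolve_timeline, and transient_files are assumptions made purely for illustration. It shows how a timeline loader might pick the most advanced state per action when every state has its own file, and which files an archiver could treat as transient.

```python
from enum import IntEnum


class State(IntEnum):
    # Ordered so that a later state compares greater than an earlier one.
    REQUESTED = 0
    INFLIGHT = 1
    COMPLETED = 2


def parse_state_file(name):
    """Split a hypothetical '<instant>.<action>[.<state>]' file name."""
    parts = name.split(".")
    instant, action = parts[0], parts[1]
    # Assumption: a missing state suffix marks the terminal (completed) state.
    state = State[parts[2].upper()] if len(parts) > 2 else State.COMPLETED
    return instant, action, state


def resolve_timeline(file_names):
    """Keep only the most advanced state seen for each (instant, action)."""
    latest = {}
    for name in file_names:
        instant, action, state = parse_state_file(name)
        key = (instant, action)
        if key not in latest or state > latest[key]:
            latest[key] = state
    return sorted((i, a, s) for (i, a), s in latest.items())


def transient_files(file_names):
    """Non-terminal state files of actions that have already completed."""
    resolved = {(i, a): s for i, a, s in resolve_timeline(file_names)}
    result = []
    for name in file_names:
        instant, action, state = parse_state_file(name)
        completed = resolved[(instant, action)] == State.COMPLETED
        if completed and state != State.COMPLETED:
            result.append(name)
    return result


if __name__ == "__main__":
    files = [
        "001.commit.requested",
        "001.commit.inflight",
        "001.commit",
        "002.clean.requested",
        "002.clean.inflight",
    ]
    for instant, action, state in resolve_timeline(files):
        print(instant, action, state.name)
    print(transient_files(files))
```

Because no file is ever renamed in this sketch, a reader that lists the folder midway through a transition sees either the old state or both states, and the ordering on State settles which one wins.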



--
This message was sent by Atlassian Jira
(v8.3.4#803005)