Posted to commits@hudi.apache.org by "Vinoth Govindarajan (Jira)" <ji...@apache.org> on 2021/11/03 16:31:00 UTC

[jira] [Assigned] (HUDI-2681) Make hoodie record_key and preCombine_key optional

     [ https://issues.apache.org/jira/browse/HUDI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Govindarajan reassigned HUDI-2681:
-----------------------------------------

    Assignee: sivabalan narayanan

> Make hoodie record_key and preCombine_key optional
> --------------------------------------------------
>
>                 Key: HUDI-2681
>                 URL: https://issues.apache.org/jira/browse/HUDI-2681
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Common Core
>            Reporter: Vinoth Govindarajan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: 0.10_blocker
>
> At present, Hudi needs a record key and a preCombine key to create a Hudi dataset, which restricts the kinds of datasets we can create using Hudi.
>  
> In order to increase the adoption of the Hudi file format across all kinds of derived datasets, similar to Parquet/ORC, we need to offer flexibility to users. I understand that the record key is used for the upsert primitive and that we need the preCombine key to break ties and deduplicate, but there are event data and other datasets without any primary key (append-only datasets) that can benefit from Hudi, since the Hudi ecosystem offers other features such as snapshot isolation, indexes, clustering, delta streamer, etc., which could be applied to any dataset without a record key.
>  
> The idea of this proposal is to make both the record key and the preCombine key optional, to allow a variety of new use cases on top of Hudi.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)