
Posted to commits@hudi.apache.org by "Jian Feng (Jira)" <ji...@apache.org> on 2022/09/20 15:28:00 UTC

[jira] [Assigned] (HUDI-4882) Multiple ordering fields for partial update to handle out-of-order events

     [ https://issues.apache.org/jira/browse/HUDI-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Feng reassigned HUDI-4882:
-------------------------------

    Assignee: Jian Feng

> Multiple ordering fields for partial update to handle out-of-order events
> -------------------------------------------------------------------------
>
>                 Key: HUDI-4882
>                 URL: https://issues.apache.org/jira/browse/HUDI-4882
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Jian Feng
>            Assignee: Jian Feng
>            Priority: Major
>         Attachments: image-2022-09-20-22-42-19-445.png, image-2022-09-20-22-46-52-907.png
>
>
> Some background on why we need multiple ordering fields:
> For example, suppose we have two sources and one target table:
> * source1's fields: *id, ts, name*
> * source2's fields: *id, ts, price*
> * target table's fields: *id, ts, name, price*
> ts is the precombine field.
> In the 1st batch, we receive one record from each source:
> Source 1:
> ||id||ts||name||
> |1|1|name_1|
> Source 2:
> ||id||ts||price||
> |1|3|price_3|
> So the record in the target table should be:
> ||id||ts||name||price||
> |1|3|name_1|price_3|
>  
> Now suppose that in the 2nd batch we receive one event from source1:
> Source 1:
> ||id||ts||name||
> |1|2|name_2|
> However, name_2 will not be written to the target table, because its ts value (2) is smaller than the ts value already stored in the target table (3).
> This feature will allow users to perform partial updates across sub-tables/sources by resolving the state of each set of columns in a row against that set's own ordering/precombine column.
> As such, a table can have multiple ordering fields.
> This use case fits wide Hudi tables that are built from smaller sub-tables, where each sub-table has its own precombine column and its records may be upserted out of order.
>  !image-2022-09-20-22-46-52-907.png! 
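
The behaviour described above can be sketched in plain Python. This is only an illustration of the merge semantics, not Hudi's payload API or the implementation proposed in this issue. The ordering column names `name_ts` and `price_ts`, the `ORDERING_GROUPS` mapping, and both merge functions are hypothetical names introduced for the example.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical mapping: each ordering field governs its own set of columns.
ORDERING_GROUPS: Dict[str, List[str]] = {
    "name_ts": ["name"],    # columns fed by source1
    "price_ts": ["price"],  # columns fed by source2
}


def merge_single_ts(current: Optional[Row], incoming: Row) -> Row:
    """Single precombine field: an incoming record with an older ts is dropped whole."""
    if current is None:
        return dict(incoming)
    if incoming["ts"] < current["ts"]:
        return current  # the late event is ignored, even for columns it alone owns
    merged = dict(current)
    merged.update(incoming)
    return merged


def merge_multi_ts(current: Optional[Row], incoming: Row,
                   groups: Dict[str, List[str]] = ORDERING_GROUPS) -> Row:
    """Multiple ordering fields: each column group is compared on its own ordering field."""
    merged: Row = dict(current) if current else {"id": incoming["id"]}
    for order_col, cols in groups.items():
        new_ts = incoming.get(order_col)
        if new_ts is None:
            continue  # this event carries no data for the group
        old_ts = merged.get(order_col)
        if old_ts is None or new_ts >= old_ts:
            merged[order_col] = new_ts
            for col in cols:
                merged[col] = incoming.get(col)
    return merged


# Single ordering field: replay the two batches from the description.
single: Optional[Row] = None
for event in [
    {"id": 1, "ts": 1, "name": "name_1"},    # batch 1, source1
    {"id": 1, "ts": 3, "price": "price_3"},  # batch 1, source2
    {"id": 1, "ts": 2, "name": "name_2"},    # batch 2, source1 (late for ts=3)
]:
    single = merge_single_ts(single, event)

# Multiple ordering fields: the same events, each tagged with its own ordering column.
multi: Optional[Row] = None
for event in [
    {"id": 1, "name_ts": 1, "name": "name_1"},
    {"id": 1, "price_ts": 3, "price": "price_3"},
    {"id": 1, "name_ts": 2, "name": "name_2"},
]:
    multi = merge_multi_ts(multi, event)

print(single)
print(multi)
```

With a single `ts`, the second batch is discarded and `name` stays `name_1`. With one ordering field per column group, `name` advances to `name_2` while `price` keeps the value ordered by `price_ts`.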



--
This message was sent by Atlassian Jira
(v8.20.10#820010)