Posted to commits@hudi.apache.org by "Jian Feng (Jira)" <ji...@apache.org> on 2022/09/20 15:28:00 UTC
[jira] [Assigned] (HUDI-4882) Multiple ordering fields for partial update to handle out-of-order events
[ https://issues.apache.org/jira/browse/HUDI-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jian Feng reassigned HUDI-4882:
-------------------------------
Assignee: Jian Feng
> Multiple ordering fields for partial update to handle out-of-order events
> -------------------------------------------------------------------------
>
> Key: HUDI-4882
> URL: https://issues.apache.org/jira/browse/HUDI-4882
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Jian Feng
> Assignee: Jian Feng
> Priority: Major
> Attachments: image-2022-09-20-22-42-19-445.png, image-2022-09-20-22-46-52-907.png
>
>
> Some background on why we need multiple ordering fields:
> For example, suppose we have two sources and one target table:
> * source1's fields: *id, ts, name*
> * source2's fields: *id, ts, price*
> * target table's fields: *id, ts, name, price*
> ts is the precombine field.
> In the 1st batch, we get one record from each source:
> Source1:
>
> ||id||ts||name||
> |1|1|name_1|
> Source2:
>
> ||id||ts||price||
> |1|3|price_3|
> So the record in the target table should be:
> ||id||ts||name||price||
> |1|3|name_1|price_3|
>
> Let's say that in the 2nd batch we get one event from source1:
> Source1:
> ||id||ts||name||
> |1|2|name_2|
> However, name_2 won't be written to the target table, because the event's ts value (2) is smaller than the ts value already in the target table (3).
> This feature will allow users to perform partial updates across sub-tables/sources by determining the state of a set of columns in a row based on an ordering/precombine column.
> As such, a table can have MULTIPLE ordering fields.
> This feature suits wide Hudi tables that are built from smaller sub-tables, where each sub-table has its own precombine column and its records may be upserted out of order.
> !image-2022-09-20-22-46-52-907.png!
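The example in the description can be sketched in a few lines of Python. This is only an illustration of the idea, not Hudi's payload implementation: the helpers `merge_single_ts` and `merge_multi_ts`, the per-source ordering columns `name_ts` and `price_ts`, and the `ORDERING_GROUPS` mapping are all hypothetical names introduced here, and the comparison rule (apply an event unless its ordering value is strictly smaller than the stored one) is an assumption.

```python
# Hypothetical mapping: ordering field -> columns whose freshness it governs.
ORDERING_GROUPS = {"name_ts": ["name"], "price_ts": ["price"]}


def merge_single_ts(stored, incoming):
    """Partial update guarded by ONE ordering field, `ts` (illustrative only)."""
    if stored is None:
        return dict(incoming)
    if incoming["ts"] < stored["ts"]:
        return stored  # the whole incoming event is treated as stale
    merged = dict(stored)
    merged.update(incoming)
    return merged


def merge_multi_ts(stored, incoming):
    """Partial update where each column group has its own ordering field."""
    merged = dict(stored) if stored else {"id": incoming["id"]}
    for ts_field, columns in ORDERING_GROUPS.items():
        if ts_field not in incoming:
            continue  # this event's source does not own the column group
        old_ts = merged.get(ts_field)
        if old_ts is not None and incoming[ts_field] < old_ts:
            continue  # stale for this column group only
        merged[ts_field] = incoming[ts_field]
        for col in columns:
            if col in incoming:
                merged[col] = incoming[col]
    return merged


# Single ordering field: the 2nd-batch event (ts=2) is dropped.
single = None
for event in [
    {"id": 1, "ts": 1, "name": "name_1"},
    {"id": 1, "ts": 3, "price": "price_3"},
    {"id": 1, "ts": 2, "name": "name_2"},
]:
    single = merge_single_ts(single, event)

# One ordering field per source: the same late event now updates `name`.
multi = None
for event in [
    {"id": 1, "name_ts": 1, "name": "name_1"},
    {"id": 1, "price_ts": 3, "price": "price_3"},
    {"id": 1, "name_ts": 2, "name": "name_2"},
]:
    multi = merge_multi_ts(multi, event)

print(single)
print(multi)
```

With a single `ts`, the stored row keeps `name_1` because the late source1 event loses to the source2 timestamp. With one ordering field per column group, the same event is compared only against earlier source1 events, so `name` advances to `name_2` while `price` stays at `price_3`.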
--
This message was sent by Atlassian Jira
(v8.20.10#820010)