Posted to commits@hudi.apache.org by "David_Liang (Jira)" <ji...@apache.org> on 2021/09/16 08:48:00 UTC

[jira] [Updated] (HUDI-2441) To support partial update function which can move and update the data from the old partition to the new partition , when the data with same key change it's partition

     [ https://issues.apache.org/jira/browse/HUDI-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David_Liang updated HUDI-2441:
------------------------------
    Description: 
Consider the following scenario: the source table contains two records with the same key, as follows.
||post_id ||position||weight||ts||day ||
| 1|shengzhen|3KG|1630480027|{color:#ff0000}20210901{color}|
| 1|beijing|3KG|1630652828|{color:#ff0000}20210903{color}|

 

When using the {color:#ff0000}*Global Index*{color} with the following SQL:

 
{code:sql}
-- weight is not selected from the source, so this is a partial update
merge into target_table t
   using (
        select post_id, position, ts, day from source_table
   ) as s
on t.post_id = s.post_id
when matched then update set t.position = s.position, t.ts = s.ts, t.day = s.day
when not matched then insert *
{code}
 

Because the Hudi engine does not yet support *cross-partition partial merge into*, the result in the target table is:

 
||post_id (as primary key)||position||weight||ts||day||
| 1|beijing|3KG|1630652828|*{color:#ff0000}20210901{color}*|

The record is still in the old partition.

But the *expected* result is:

 
||post_id (as primary key)||position||weight||ts||day||
| 1|beijing|3KG|1630652828|{color:#ff0000}*20210903*{color}|
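
The difference between the two result tables can be sketched with a small in-memory model. This is only an illustration of the requested behaviour, not Hudi code: the names partial_merge, seed_table and SOURCE_UPDATE, and the move_partition flag, are hypothetical, and the sketch assumes the target table already holds the first source record while the second one is being merged.

```python
from typing import Dict, Iterable

Record = Dict[str, object]
# Hypothetical table model: partition value -> {record key -> record}
Table = Dict[str, Dict[object, Record]]


def seed_table() -> Table:
    """Target table after the first source record (day 20210901) was written."""
    return {
        "20210901": {
            1: {"post_id": 1, "position": "shengzhen", "weight": "3KG",
                "ts": 1630480027, "day": "20210901"},
        }
    }


# Second source record; weight is absent because the merge is partial.
SOURCE_UPDATE: Record = {"post_id": 1, "position": "beijing",
                         "ts": 1630652828, "day": "20210903"}


def partial_merge(table: Table, source: Iterable[Record], key: str,
                  partition_field: str, move_partition: bool) -> None:
    """Merge source rows into table by key across all partitions."""
    # Stand-in for a global index: record key -> partition holding it.
    index = {k: part for part, rows in table.items() for k in rows}
    for row in source:
        k = row[key]
        new_part = str(row[partition_field])
        old_part = index.get(k)
        if old_part is None:
            # "when not matched then insert"
            table.setdefault(new_part, {})[k] = dict(row)
            index[k] = new_part
            continue
        # Partial update: columns missing from the source keep stored values.
        merged = {**table[old_part][k], **row}
        if move_partition and new_part != old_part:
            # Requested behaviour: delete from old partition, write to new one.
            del table[old_part][k]
            if not table[old_part]:
                del table[old_part]
            table.setdefault(new_part, {})[k] = merged
            index[k] = new_part
        else:
            # Reported behaviour: record and its partition value stay put.
            merged[partition_field] = table[old_part][k][partition_field]
            table[old_part][k] = merged
```

With move_partition=False the merged record stays under 20210901, which matches the reported result; with move_partition=True it ends up under 20210903 with weight 3KG preserved, which matches the expected result.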

 

 

 

  was:
Consider the following scenario: the source table contains two records with the same key, as follows.
||post_id ||position||weight||ts||day ||
| 1|shengzhen|3KG|1630480027|{color:#FF0000}20210901{color}|
| 1|beijing|3KG|1630652828|{color:#FF0000}20210903{color}|

When using the {color:#FF0000}*Global Index*{color} with the following SQL:

{code:sql}
merge into target_table t
   using (
        select post_id, position, ts, day from source_table
   ) as s
on t.post_id = s.post_id
when matched then update set t.position = s.position, t.ts = s.ts, t.day = s.day
when not matched then insert *
{code}

Because the Hudi engine now supports *cross-partition partial merge into*, the result in the target table is:

||post_id (as primary key)||position||weight||ts||day||
| 1|beijing|3KG|1630652828|*{color:#FF0000}20210901{color}*|

The record is still in the old partition.

But the *expected* result is:

||post_id (as primary key)||position||weight||ts||day||
| 1|beijing|3KG|1630652828|{color:#FF0000}*20210903*{color}|

 

 

 


> To support partial update function which can move and update the data from the old partition to the new partition , when the data with same key change it's partition
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2441
>                 URL: https://issues.apache.org/jira/browse/HUDI-2441
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Storage Management
>            Reporter: David_Liang
>            Priority: Major


