Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/06 19:46:05 UTC

[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT

nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-894482014


   1. Sorry, it looks like we missed updating our config page. 
   "hoodie.simple.index.update.partition.path" is the config to use for the simple index. 
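A minimal sketch of where this config would sit among Spark datasource write options. It assumes the global variant of the simple index (`GLOBAL_SIMPLE`), since moving a record between partitions only comes into play when keys are looked up across the whole table, as the closing remark about global indexes suggests. The table name, field names, `df` and `base_path` are hypothetical placeholders.

```python
# Hypothetical write options for an upsert into a Hudi table using a global simple index.
# Table name and field names are placeholders that mirror the example below.
hudi_options = {
    "hoodie.table.name": "example_table",
    "hoodie.datasource.write.recordkey.field": "record_key",
    "hoodie.datasource.write.partitionpath.field": "partition_path",
    "hoodie.datasource.write.precombine.field": "preCombine",
    "hoodie.datasource.write.operation": "upsert",
    # Global variant of the simple index: record keys are unique across the table.
    "hoodie.index.type": "GLOBAL_SIMPLE",
    # "true": move a record to its new partition path; "false": keep it in the old one.
    "hoodie.simple.index.update.partition.path": "true",
}

# With a Spark DataFrame `df` (not created here), the write would look roughly like:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```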
   
   2. Let me try to illustrate with a simple example.
   
   Format: 
   record key, partition path, col1, preCombine
   
   insert:
   rec1, pp1, v1, pc1
   rec2, pp2, v1, pc1
   
   Both records will be inserted into the Hudi table. 
   Data in the Hudi table:
   rec1, pp1, v1, pc1
   rec2, pp2, v1, pc1
   
   Now, let's see what happens if some overlapping records are ingested with hoodie.simple.index.update.partition.path = false. Records will always be routed to their old partition if they are found in the Hudi table. 
   
   new writes:
   rec1, pp2, v2, pc2
   rec3, pp2, v2, pc2
   
   Once committed, this is what the data in the Hudi table looks like:
   
   rec1, pp1, v2, pc2 // new partition path ignored. 
   rec2, pp2, v1, pc1
   rec3, pp2, v2, pc2
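A toy, in-memory model of the `false` case, to make the routing rule concrete. It is not Hudi's implementation: it assumes a global index (one row per record key across the table) and that the incoming record simply replaces the stored one, as in the example. The function name `upsert_keep_old_partition` is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]         # (partition path, col1, preCombine)
Write = Tuple[str, str, str, str]  # (record key, partition path, col1, preCombine)


def upsert_keep_old_partition(table: Dict[str, Row], writes: List[Write]) -> Dict[str, Row]:
    """Upsert with update.partition.path = false, assuming a global index."""
    result = dict(table)
    for key, partition, col1, pre_combine in writes:
        if key in result:
            # Existing key: the update goes to the partition the record already lives in.
            old_partition = result[key][0]
            result[key] = (old_partition, col1, pre_combine)
        else:
            # New key: inserted under its own partition path.
            result[key] = (partition, col1, pre_combine)
    return result


table = {"rec1": ("pp1", "v1", "pc1"), "rec2": ("pp2", "v1", "pc1")}
writes = [("rec1", "pp2", "v2", "pc2"), ("rec3", "pp2", "v2", "pc2")]
after = upsert_keep_old_partition(table, writes)
for key in sorted(after):
    print(key, *after[key])
# rec1 pp1 v2 pc2
# rec2 pp2 v1 pc1
# rec3 pp2 v2 pc2
```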
   
   
   Now, let's see what happens if the same overlapping records are ingested with hoodie.simple.index.update.partition.path = true. Records found in the Hudi table will be moved to the new partition path: the record is written to the new partition and deleted from the old one. 
   
   Data in the Hudi table:
   rec1, pp1, v1, pc1
   rec2, pp2, v1, pc1
   
   new writes:
   rec1, pp2, v2, pc2
   rec3, pp2, v2, pc2
   
   Once committed, this is what the data in the Hudi table looks like:
   
   rec1, pp2, v2, pc2 // new partition path honored. 
   rec1, pp1, v1, pc1 // deleted from the old partition path.  
   rec2, pp2, v1, pc1
   rec3, pp2, v2, pc2
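The same toy model for the `true` case, again a sketch rather than Hudi's code. It assumes a global index and that the incoming record replaces the stored one; when the partition path changes, the old copy is recorded as a delete and the record is written under the new path. The function name `upsert_move_partition` is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]         # (partition path, col1, preCombine)
Write = Tuple[str, str, str, str]  # (record key, partition path, col1, preCombine)


def upsert_move_partition(
    table: Dict[str, Row], writes: List[Write]
) -> Tuple[Dict[str, Row], List[Tuple[str, Row]]]:
    """Upsert with update.partition.path = true, assuming a global index.

    Returns the new table and the (key, old row) pairs deleted from old partitions.
    """
    result = dict(table)
    deleted: List[Tuple[str, Row]] = []
    for key, partition, col1, pre_combine in writes:
        old = result.get(key)
        if old is not None and old[0] != partition:
            # Partition path changed: the copy in the old partition is deleted.
            deleted.append((key, old))
        # Insert or update under the incoming partition path.
        result[key] = (partition, col1, pre_combine)
    return result, deleted


table = {"rec1": ("pp1", "v1", "pc1"), "rec2": ("pp2", "v1", "pc1")}
writes = [("rec1", "pp2", "v2", "pc2"), ("rec3", "pp2", "v2", "pc2")]
after, deleted = upsert_move_partition(table, writes)
print(deleted)  # [('rec1', ('pp1', 'v1', 'pc1'))]
```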
   
   Bottom line: with a global index type, record keys are unique across the entire dataset (irrespective of partition path).
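For contrast, a toy model of a non-global index, where (as we understand it) a record key is only unique within its partition, so the table is effectively keyed by (partition path, record key). Under that assumption the same writes leave two copies of rec1, which is exactly what the global index avoids. This is an illustrative sketch, not Hudi's implementation; `upsert_non_global` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, str, str, str]  # (record key, partition path, col1, preCombine)


def upsert_non_global(
    table: Dict[Tuple[str, str], Tuple[str, str]], writes: List[Write]
) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Upsert where uniqueness is per (partition path, record key) pair."""
    result = dict(table)
    for key, partition, col1, pre_combine in writes:
        # rec1 arriving in pp2 does not match rec1 in pp1, so it is a new row.
        result[(partition, key)] = (col1, pre_combine)
    return result


table = {("pp1", "rec1"): ("v1", "pc1"), ("pp2", "rec2"): ("v1", "pc1")}
writes = [("rec1", "pp2", "v2", "pc2"), ("rec3", "pp2", "v2", "pc2")]
after = upsert_non_global(table, writes)
rec1_partitions = sorted(p for (p, k) in after if k == "rec1")
print(rec1_partitions)  # ['pp1', 'pp2']
```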
   
   Let me know if this is clear. 
   