Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/06 19:46:05 UTC
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-894482014
1. Sorry, it looks like we missed updating our config page.
"hoodie.simple.index.update.partition.path" is the config to use for the simple index.
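As a rough sketch of where this setting goes, the write options for an upsert could look like the following. The table name and the three field names are hypothetical placeholders, not taken from the issue, and the index type shown assumes the global variant of the simple index is the one in use.

```python
# Hypothetical write options for an upsert with the global simple index.
# "example_table", "record_key", "partition_path" and "pre_combine" are
# placeholder names chosen for illustration only.
hudi_options = {
    "hoodie.table.name": "example_table",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "record_key",
    "hoodie.datasource.write.partitionpath.field": "partition_path",
    "hoodie.datasource.write.precombine.field": "pre_combine",
    "hoodie.index.type": "GLOBAL_SIMPLE",
    # The setting discussed in this comment: whether an update that arrives
    # with a different partition path moves the record to the new partition.
    "hoodie.simple.index.update.partition.path": "true",
}
```

With PySpark these would typically be passed as df.write.format("hudi").options(**hudi_options).mode("append").save(base_path), where df and base_path are again placeholders.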
2. Let me illustrate with a simple example.
Format:
record key, partition path, col1, preCombine
insert:
rec1, pp1, v1, pc1
rec2, pp2, v1, pc1
Both records will be inserted into the Hudi table.
Data in the Hudi table:
rec1, pp1, v1, pc1
rec2, pp2, v1, pc1
Now, let's see what happens if some overlapping records are ingested with hoodie.simple.index.update.partition.path = false. A record is always routed to its old partition if its key is already found in the Hudi table.
new writes:
rec1, pp2, v2, pc2
rec3, pp2, v2, pc2
Once committed, this is what the data in the Hudi table looks like:
rec1, pp1, v2, pc2 // new partition path ignored.
rec2, pp2, v1, pc1
rec3, pp2, v2, pc2
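Now, let's see what happens if the same overlapping records are ingested with hoodie.simple.index.update.partition.path = true. If a record key is found in the Hudi table under a different partition path, the record is written to the new partition and the old copy is deleted from the old partition.
Data in the Hudi table before the write: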
rec1, pp1, v1, pc1
rec2, pp2, v1, pc1
new writes:
rec1, pp2, v2, pc2
rec3, pp2, v2, pc2
Once committed, this is what the data in the Hudi table looks like:
rec1, pp2, v2, pc2 // new partition path honored.
rec1, pp1, v1, pc1 // deleted from the old partition.
rec2, pp2, v1, pc1
rec3, pp2, v2, pc2
Bottom line: with a global index type, record keys are unique across the entire data set (irrespective of partition path).
Let me know if this is clear.
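The two scenarios above can be replayed with a small in-memory model. This is only a toy sketch of the routing rule, not Hudi code: the upsert function and the dict layout are made up for illustration, and it assumes the incoming record always overwrites the stored one (preCombine ordering is not modeled).

```python
def upsert(table, incoming, update_partition_path):
    """Toy model of an upsert under a global index.

    table maps record key -> (partition path, col1, preCombine), so a key
    can exist in only one partition at a time.
    """
    result = dict(table)
    for key, partition, col1, pre_combine in incoming:
        if key in result and not update_partition_path:
            # Key already exists: ignore the incoming partition path and
            # route the update to the partition the record already lives in.
            partition = result[key][0]
        # With update_partition_path=True, overwriting the entry models
        # "delete from the old partition, insert into the new one".
        result[key] = (partition, col1, pre_combine)
    return result


existing = {
    "rec1": ("pp1", "v1", "pc1"),
    "rec2": ("pp2", "v1", "pc1"),
}
new_writes = [
    ("rec1", "pp2", "v2", "pc2"),
    ("rec3", "pp2", "v2", "pc2"),
]

kept_old_partition = upsert(existing, new_writes, update_partition_path=False)
moved_partition = upsert(existing, new_writes, update_partition_path=True)
```

Here kept_old_partition["rec1"] stays in pp1 with the new values, while moved_partition["rec1"] ends up in pp2. In both cases rec2 is untouched and rec3 is inserted into pp2.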