Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/03 14:20:28 UTC
[GitHub] [hudi] dmenin opened a new issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
dmenin opened a new issue #3394:
URL: https://github.com/apache/hudi/issues/3394
Hello everyone.
I have a quick question about hudi’s default behavior.
I want to understand how UPSERT works for the same key in different scenarios.
I am using the GLOBAL_SIMPLE index, which, as I understand it, enforces record-key uniqueness across all partitions.
The scenario is really straightforward: based on a timestamp, I want new data to be upserted and old data to be ignored.
The data on disk (S3) is partitioned by year/month/day, so there are basically four scenarios:
1) Inserting NEW data on the same partition
2) Inserting NEW data on different partition
3) Inserting OLD data on the same partition
4) Inserting OLD data on different partition
Below are the results of testing these scenarios.
The test uses a single row with four columns:
- two key columns (a composite key, always 100, 100)
- one description
- one timestamp (it determines the partition, and it is the sort key)
Under "DB" you see the row that was in the database (the current state of the database), under "Row In" the row that was read from the file and passed to the insert statement, and under "Result" the state of the database after the insert.
There are no headers: the first two numbers (100 and 100) are the composite key, the string is the text, and the datetime is the date of the row, which is converted to an integer (epoch) and used as the field for both "hoodie.datasource.write.precombine.field" and "hoodie.payload.ordering.field".
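The row timestamp is converted to an integer epoch so it can serve as the ordering value. Below is a minimal sketch of that conversion in Python. It assumes the timestamps are UTC and formatted as `YYYY-MM-DD HH:MM:SS`. The helper name is hypothetical and does not come from the attached code.

```python
from datetime import datetime, timezone


def to_epoch_seconds(ts: str) -> int:
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as UTC and return integer epoch seconds.

    Hypothetical helper: the attached job's real conversion may differ
    (for example, another time zone or millisecond precision).
    """
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


# A later timestamp always maps to a larger integer, which is what an
# ordering/precombine field needs in order to compare records.
print(to_epoch_seconds("2021-06-24 08:00:00") < to_epoch_seconds("2021-06-24 10:01:00"))  # True
```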
As you can see below, cases 1 and 2, which deal with NEWER data, update the row; this is expected.
Case 3 does not apply the older data: the record in the DB was from 10:01 AM and the incoming record was from 8 AM. This is great; it is also the behavior I want.
But in case 4, when I upsert older data that belongs to an OLDER partition, the row is updated. This is surprising; I would expect cases 3 and 4 to behave the same.
Why does the partition of the data determine whether the data is updated or not?
Why did scenario 4 DELETE the data from partition 24 and insert it into partition 23? It's great that Hudi kept only one copy of each key, but why do scenarios 3 and 4 behave differently?
This is all running on AWS Glue with Hudi 0.7.
CASE 1 - Inserting NEW data on the same partition
DB:
100 100 2021-06-23 10:00:00 three
Row In:
100 100 2021-06-23 10:01:00 same partition
Result (OK):
100 100 2021-06-23 10:01:00 same partition
CASE 2 - Inserting NEW data on different partition
DB:
100 100 2021-06-23 10:01:00 same partition
Row In:
100 100 2021-06-24 10:01:00 dif partition
Result (OK):
100 100 2021-06-24 10:01:00 dif partition
CASE 3 - Inserting OLD data on the same partition
DB:
100 100 2021-06-24 10:01:00 dif partition
Row In:
100 100 2021-06-24 08:00:00 old data same partition
Result (OK):
100 100 2021-06-24 10:01:00 dif partition
CASE 4 - Inserting OLD data on different partition
DB:
100 100 2021-06-24 10:01:00 dif partition
Row In:
100 100 2021-06-23 09:00:00 old data dif partition
Result (BAD):
100 100 2021-06-23 09:00:00 old data dif partition
I am attaching the code that I am using.
Any help would be greatly appreciated.
Thanks very much
[hudisample.txt](https://github.com/apache/hudi/files/6924753/hudisample.txt)
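The four results above can be reproduced with a small toy model of the behaviour nsivabalan describes later in this thread. The model rests on two assumptions. First, under a global index with `hoodie.simple.index.update.partition.path = true`, a key found in a different partition is deleted there and re-inserted into the new partition with no ordering comparison. Second, an ordering-aware payload compares timestamps only when both copies sit in the same partition. This is a plain-Python sketch under those assumptions, not Hudi's implementation; `Row`, `upsert` and `run_case` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Row:
    key: str        # composite record key, e.g. "100,100"
    partition: str  # partition path derived from the timestamp (year/month/day)
    ts: str         # ordering value; this fixed-width format sorts chronologically
    text: str       # description column


def upsert(table: Dict[str, Row], incoming: Row,
           update_partition_path: bool = True,
           honor_ordering: bool = True) -> None:
    """Toy model of one upsert under a global index; this is NOT Hudi code."""
    existing = table.get(incoming.key)
    if existing is None:
        table[incoming.key] = incoming  # new key: plain insert
        return
    if existing.partition != incoming.partition:
        if update_partition_path:
            # (a) delete from the old partition, insert into the new one.
            # No ordering comparison happens on this path.
            table[incoming.key] = incoming
            return
        # (b) route the incoming record to the existing partition, then merge.
        incoming = replace(incoming, partition=existing.partition)
    # Same partition: the payload merge decides. With ordering honored, an
    # older incoming record loses; otherwise the incoming record always wins.
    if honor_ordering and incoming.ts < existing.ts:
        return
    table[incoming.key] = incoming


def run_case(db: Row, row_in: Row, **flags: bool) -> Row:
    """Start from a one-row table, apply one upsert, return the surviving row."""
    table = {db.key: db}
    upsert(table, row_in, **flags)
    return table[db.key]


K = "100,100"
case3 = (Row(K, "2021/06/24", "2021-06-24 10:01:00", "dif partition"),
         Row(K, "2021/06/24", "2021-06-24 08:00:00", "old data same partition"))
case4 = (Row(K, "2021/06/24", "2021-06-24 10:01:00", "dif partition"),
         Row(K, "2021/06/23", "2021-06-23 09:00:00", "old data dif partition"))

print(run_case(*case3).text)                               # dif partition
print(run_case(*case4).text)                               # old data dif partition
print(run_case(*case4, update_partition_path=False).text)  # dif partition
```

In this model, setting `update_partition_path` to `False` keeps case 4's stored row. It also means that a genuinely newer row for a different day (case 2) is written into the key's original partition.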
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] dmenin commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
Posted by GitBox <gi...@apache.org>.
dmenin commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-895381894
OK, so the bottom line is that Hudi doesn't have a concept of OLDER and NEWER in terms of the row date (timestamp); it only has NEW and OLD partitions (NEW being the partition of the data being upserted, and OLD being the EXISTING partition of a given key).
If I want the behaviour I described, I probably have to implement it myself? Have you come across this use case, and can you suggest a solution? (The simplest one I can imagine is to manually delete the obsolete data and insert only the new data, but to do that I have to join the incoming data with the existing data and check the differences, which may not perform well in the long term.)
Thanks for your help so far.
Diego
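The workaround described above is to drop incoming rows that are older than what is already stored, then upsert the rest. A rough sketch of it follows. It shows the filtering logic only, in plain Python. `drop_stale`, the `key`/`ts` field names and the sample rows (including the new key `200,200`) are hypothetical. A real job would build `latest_ts` by reading the table's key and timestamp columns, which in Spark amounts to a join.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def drop_stale(incoming: List[Record], latest_ts: Dict[Any, Any],
               key_field: str = "key", ordering_field: str = "ts") -> List[Record]:
    """Keep incoming rows whose key is new, or whose ordering value is not older
    than the one already stored. In Spark this would be a left join of the batch
    against the table's (key, ts) pairs followed by the same filter."""
    kept = []
    for rec in incoming:
        stored = latest_ts.get(rec[key_field])
        if stored is None or rec[ordering_field] >= stored:
            kept.append(rec)
    return kept


latest = {"100,100": "2021-06-24 10:01:00"}  # what the table currently holds
batch = [
    {"key": "100,100", "ts": "2021-06-23 09:00:00", "text": "old data dif partition"},
    {"key": "200,200", "ts": "2021-06-23 09:00:00", "text": "brand new key"},
]
print([r["text"] for r in drop_stale(batch, latest)])  # ['brand new key']
```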
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-974819625
Can you try setting `hoodie.datasource.write.precombine.field`? It should get applied to `hoodie.payload.ordering.field` as well.
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-895327882
Yes, you are right.
rec1, pp1, v2, pc2
// here v2, pc2 represents the updated value. If not updated, it would have been v1, pc1
[GitHub] [hudi] maddy2u commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
maddy2u commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-926588291
All good, thanks Vinoth for your and Sivabalan's support! It can be closed.
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-892056474
Yeah, with a global index, when two records clash only on partition path (same record key, different partition path), Hudi does one of the following two things, depending on the [config value](https://hudi.apache.org/docs/configurations#bloomindexupdatepartitionpathupdatepartitionpath--false) set:
a. delete the existing record from the old partition in storage and insert the incoming record into the new partition
or
b. write the incoming record as an update to the existing (old) partition, ignoring the new partition.
In flow (a), Hudi does not honor preCombine.
[GitHub] [hudi] dmenin commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
dmenin commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-892446743
Hi Sivabalan,
Thanks very much for taking the time to reply.
A few questions:
1) The config value you linked seems to apply only to the GLOBAL_BLOOM index. I am using GLOBAL_SIMPLE, so I don't think it applies to my case.
2) You mentioned "preCombine". My understanding is that preCombine works before the write: if I have two records with the same key, preCombine chooses the one with the largest value and then "submits" the insert command. So it shouldn't affect the comparison between input data and existing data, correct? Since we only have one new row in the insert, preCombine also seems irrelevant in this case.
Could you clarify further, please?
Thanks,
Diego
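Diego's reading of preCombine in point 2 can be illustrated with a toy in-batch dedup: among the records that share a key in one incoming batch, keep the one with the largest ordering value. This is a plain-Python sketch of that idea, not Hudi's code path. The function and field names are hypothetical, and the tie-breaking rule (first record seen wins) is an assumption.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def precombine(batch: Iterable[Record], key_field: str = "key",
               ordering_field: str = "ts") -> List[Record]:
    """Deduplicate one incoming batch: per key, keep the record with the largest
    ordering value (the first one seen wins a tie). Storage is never consulted."""
    best: Dict[Any, Record] = {}
    for rec in batch:
        k = rec[key_field]
        if k not in best or rec[ordering_field] > best[k][ordering_field]:
            best[k] = rec
    return list(best.values())


batch = [
    {"key": "100,100", "ts": 1, "text": "first"},
    {"key": "100,100", "ts": 3, "text": "latest"},
    {"key": "100,100", "ts": 2, "text": "middle"},
]
print([r["text"] for r in precombine(batch)])  # ['latest']
```

With a single-row batch the function returns the row unchanged, which matches Diego's point that in-batch dedup alone cannot explain case 4.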
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-895408837
There are some nuances here. Ignore the global index and different partitions for now; just consider how two records are reconciled in general.
I guess you know what preCombine is used for (combining two records within the same incoming write batch).
But to reconcile an incoming record with one already in storage, Hudi relies on HoodieRecordPayload.combineAndGetUpdateValue().
The most commonly used payload implementation is OverwriteWithLatestAvroPayload, which always chooses the latest incoming record over what's in storage.
But we also recently added another payload implementation called DefaultHoodieRecordPayload. This payload honors the preCombine field when reconciling an incoming record with what's in storage, comparing the preCombine field values within combineAndGetUpdateValue().
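As described above, the two payloads differ only in how `combineAndGetUpdateValue()` chooses between the stored and the incoming record. Below is a plain-Python sketch of the two merge rules. The function names are hypothetical, and the tie-breaking behaviour (a tie goes to the incoming record) is an assumption rather than something taken from Hudi's source.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def overwrite_with_latest(stored: Record, incoming: Record, ordering_field: str) -> Record:
    """Rule described for OverwriteWithLatestAvroPayload: incoming always wins.
    ordering_field is accepted only to keep both rules' signatures identical."""
    return incoming


def ordering_aware(stored: Record, incoming: Record, ordering_field: str) -> Record:
    """Rule described for DefaultHoodieRecordPayload: keep the stored record
    only if its ordering value is larger (ties assumed to go to the incoming one)."""
    if stored[ordering_field] > incoming[ordering_field]:
        return stored
    return incoming


stored = {"key": "100,100", "ts": 1000, "text": "stored"}
older = {"key": "100,100", "ts": 800, "text": "older incoming"}
print(overwrite_with_latest(stored, older, "ts")["text"])  # older incoming
print(ordering_aware(stored, older, "ts")["text"])         # stored
```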
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-991913326
If you are all set, can we close the issue out, please?
[GitHub] [hudi] nsivabalan edited a comment on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan edited a comment on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-892056474
Yeah, with a global index, when two records clash only on partition path (same record key, different partition path), Hudi does one of the following two things, depending on the [config value](https://hudi.apache.org/docs/configurations#bloomindexupdatepartitionpathupdatepartitionpath--false) set:
a. delete the existing record from the old partition in storage and insert the incoming record into the new partition
or
b. write the incoming record as an update to the existing (old) partition, ignoring the new partition.
In flow (a), Hudi does not honor preCombine.
PreCombine will be honored when updates happen, e.g. (b) in the scenario above, or regular updates where both record key and partition path match.
[GitHub] [hudi] nsivabalan edited a comment on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan edited a comment on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-895408837
There are some nuances here. Ignore the global index and different partitions for now; just consider how two records are reconciled in general. (In other words, there is only one partition, and an update record is written to that partition while the record already exists in storage.)
I guess you know what preCombine is used for (combining two records within the same incoming write batch).
But to reconcile an incoming record with one already in storage, Hudi relies on HoodieRecordPayload.combineAndGetUpdateValue().
The most commonly used payload implementation is OverwriteWithLatestAvroPayload, which always chooses the latest incoming record over what's in storage.
But we also recently added another payload implementation called DefaultHoodieRecordPayload. This payload honors the preCombine field when reconciling an incoming record with what's in storage, comparing the preCombine field values within combineAndGetUpdateValue().
[GitHub] [hudi] maddy2u commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
maddy2u commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-905846390
Hi Sivabalan,
I work with Diego on this topic.
1. We use Hudi 0.7 for processing and storing data in Hudi format. Based on what you mentioned, my understanding is that the statement below does not apply to this version of Hudi. Is it available in 0.8, or is my assumption wrong? How do we enable the use of the precombine field when reconciling an incoming record? Are there any edge cases we must be aware of?
> But recently we also added another payload impl called DefaultHoodieRecordPayload. This payload will honor preCombine field while reconciling an incoming record with whats in storage using the preCombine field value(within combineAndGetUpdateValue()).
Summarizing the discussion in this thread:
1. By default, Hudi treats incoming data as the data that overwrites what is stored: records are always updated with the new incoming data (implemented in OverwriteWithLatestAvroPayload).
2. Depending on hoodie.simple.index.update.partition.path, the data is written to the new partition (true) or updated in the old partition (false).
[GitHub] [hudi] dmenin commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
dmenin commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-895017446
Yes it does, thanks for clarifying.
I guess the confusion (mainly on my part) was that when I said NEW and OLD, I was referring to the timestamp of the row (translated into the partition). For example, since my data is partitioned by year/month/day, a record in partition 23 is OLDER than one in partition 24, and my use case is that if a record from 23 is submitted to Hudi while the same record exists in partition 24, it should be ignored. That didn't happen in case 4 above, but now I understand why: I have "hoodie.simple.index.update.partition.path = true", which honoured the new partition path.
Just to confirm: in your first example (with hoodie.simple.index.update.partition.path = false), the partition path is ignored but the data is updated, correct?
Thanks,
Diego
[GitHub] [hudi] ChaladiMohanVamsi commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
ChaladiMohanVamsi commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-926697926
@nsivabalan
> But recently we also added another payload impl called DefaultHoodieRecordPayload. This payload will honor preCombine field while reconciling an incoming record with whats in storage using the preCombine field value(within combineAndGetUpdateValue()).
I have a similar confusion. Can you please clarify and correct my understanding?
How do the following configs differ in DefaultHoodieRecordPayload, and which one does it use to select the record?
1. hoodie.payload.ordering.field
2. hoodie.datasource.write.precombine.field
With the same payload class, is it possible to disable precombine deduplication within the same incremental batch but still decide whether or not to update the existing record?
I tried the DefaultHoodiePayload class with
hoodi.combine.before.insert=false, without providing a precombine field but with payload.ordering.field set.
In this scenario it threw an error about a missing precombine field column.
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-905187503
@dmenin: hey, let me know if you need any more info. I will wait a couple of days and close this out if we don't hear back.
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-894482014
1. Sorry, looks like we missed updating our config page.
"hoodie.simple.index.update.partition.path" is the one for the simple index.
2. Let me try to illustrate with a simple example.
Format:
record key, partition path, col1, preCombine
insert:
rec1, pp1, v1, pc1
rec2, pp2, v1, pc1
Both records will be inserted into the Hudi table.
Data in Hudi table:
rec1, pp1, v1, pc1
rec2, pp2, v1, pc1
Now, let's see what happens if some overlapping records are ingested with hoodie.simple.index.update.partition.path = false. Records are always routed to the old partition if found in the Hudi table.
new writes:
rec1, pp2, v2, pc2
rec3, pp2, v2, pc2
Once committed, this is what the data in the Hudi table looks like:
rec1, pp1, v2, pc2 // new partition path ignored.
rec2, pp2, v1, pc1
rec3, pp2, v2, pc2
Now, let's see what happens if some overlapping records are ingested with hoodie.simple.index.update.partition.path = true. If a record is found in the Hudi table under a different partition, it is deleted from the old partition and written to the new one.
Data in Hudi table:
rec1, pp1, v1, pc1
rec2, pp2, v1, pc1
new writes:
rec1, pp2, v2, pc2
rec3, pp2, v2, pc2
Once committed, this is what the data in the Hudi table looks like:
rec1, pp2, v2, pc2 // new partition path honored.
rec1, pp1, v1, pc1 : deleted.
rec2, pp2, v1, pc1
rec3, pp2, v2, pc2
Bottom line: with a global index type, record keys are unique across the entire dataset (irrespective of partition path).
Let me know if this is clear.
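The two runs in this example can be replayed with a small model in which the table is a map from record key to a single row, which is what a global index guarantees. This is a plain-Python sketch that assumes the default overwrite-latest payload, so `v2, pc2` always wins. `apply_writes` is a hypothetical name, not a Hudi API.

```python
from typing import Dict, List, Tuple

# (record key, partition path, col1, preCombine), as in the example above
Record = Tuple[str, str, str, str]


def apply_writes(table: Dict[str, Record], writes: List[Record],
                 update_partition_path: bool) -> Dict[str, Record]:
    """Toy global-index upsert with overwrite-latest merging (NOT Hudi code).
    The dict is keyed by record key only, so each key exists once per table."""
    result = dict(table)  # leave the input table untouched
    for key, pp, col1, pc in writes:
        old = result.get(key)
        if old is not None and not update_partition_path:
            pp = old[1]  # false: keep the record in its existing partition
        # true: replacing the entry models "delete from the old partition,
        # insert into the new partition"
        result[key] = (key, pp, col1, pc)
    return result


table = {"rec1": ("rec1", "pp1", "v1", "pc1"), "rec2": ("rec2", "pp2", "v1", "pc1")}
writes = [("rec1", "pp2", "v2", "pc2"), ("rec3", "pp2", "v2", "pc2")]
for flag in (False, True):
    print(flag, sorted(apply_writes(table, writes, flag).values()))
```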
[GitHub] [hudi] nsivabalan edited a comment on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan edited a comment on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-892056474
Yeah, with a global index, when two records clash only on partition path (same record key, different partition path), Hudi does one of the following two things, depending on the [config value](https://hudi.apache.org/docs/configurations#bloomindexupdatepartitionpathupdatepartitionpath--false) set:
a. delete the existing record from the old partition in storage and insert the incoming record into the new partition
or
b. write the incoming record as an update to the existing (old) partition, ignoring the new partition.
In flow (a), Hudi does not honor preCombine.
PreCombine will be honored when updates happen, e.g. (b) in the scenario above.
[GitHub] [hudi] nsivabalan closed issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan closed issue #3394:
URL: https://github.com/apache/hudi/issues/3394
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-991913445
Feel free to reopen if need be. Would be happy to help.
[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
nsivabalan commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-912848915
1. To dedup records within the same incoming batch, you need to enable these configs.
https://hudi.apache.org/docs/configurations#hoodiecombinebeforeupsert
https://hudi.apache.org/docs/configurations#hoodiecombinebeforeinsert
In this case, payload impl does not matter.
2. Yes, you can try using DefaultHoodieRecordPayload. It is available as part of 0.8.0.
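Putting the suggestions in this thread together, the relevant write options might look like the sketch below. It assumes Hudi 0.8.0 or later. The config keys are ones named in this thread or in the Hudi configuration docs. The table, column and path names are hypothetical, and you should verify that this combination actually gives the desired behaviour on your Hudi version. There is also a trade-off, following nsivabalan's explanation above: with `hoodie.simple.index.update.partition.path` set to `false`, the ordering comparison applies, but a genuinely newer row for a different day is written into the key's original partition.

```python
# Write options for an AWS Glue / PySpark job (Hudi 0.8.0 or later assumed).
# Table, column and path names are hypothetical placeholders.
hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.operation": "upsert",
    # composite key (the two "100" columns in the test)
    "hoodie.datasource.write.recordkey.field": "key1,key2",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.partitionpath.field": "partition_path",
    # global index: one copy of each key across all partitions
    "hoodie.index.type": "GLOBAL_SIMPLE",
    # false: an existing key stays in its original partition, so the
    # payload merge (and its ordering comparison) applies to it
    "hoodie.simple.index.update.partition.path": "false",
    # payload that compares ordering values against the stored record
    "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
    # in-batch dedup and payload ordering both use the epoch column
    "hoodie.datasource.write.precombine.field": "ts_epoch",
    "hoodie.payload.ordering.field": "ts_epoch",
    "hoodie.combine.before.upsert": "true",
}

# Usage with a Spark DataFrame `df` (not executed here):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```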
[GitHub] [hudi] vinothchandar commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
vinothchandar commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-926273464
@maddy2u, any more updates on this issue, or can we close this?
[GitHub] [hudi] maddy2u commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
maddy2u commented on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-914042334
Thank you, Sivabalan! Appreciate your support. We will come back with updates shortly.
[GitHub] [hudi] maddy2u edited a comment on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT
maddy2u edited a comment on issue #3394:
URL: https://github.com/apache/hudi/issues/3394#issuecomment-905846390
Hi Sivabalan,
I work with Diego on this topic, and I have one question regarding your response:
1. We use Hudi 0.7 on AWS Glue for processing and storing data. Based on what you mentioned, my understanding is that the statement below does not apply to this version of Hudi. Is it available in 0.8, or is my assumption wrong? How do we enable the use of the precombine field when reconciling an incoming record? Are there any edge cases we must be aware of?
> But recently we also added another payload impl called DefaultHoodieRecordPayload. This payload will honor preCombine field while reconciling an incoming record with whats in storage using the preCombine field value(within combineAndGetUpdateValue()).
Summarizing the discussion in this thread:
1. By default, Hudi treats incoming data as the data that overwrites what is stored: records are always updated with the new incoming data (implemented in OverwriteWithLatestAvroPayload).
2. Depending on hoodie.simple.index.update.partition.path, the data is written to the new partition (true) or updated in the old partition (false).