Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/30 04:31:14 UTC
[GitHub] [hudi] RajasekarSribalan edited a comment on issue #2214: [SUPPORT] Hudi Upsert but with duplicates record for same key
RajasekarSribalan edited a comment on issue #2214:
URL: https://github.com/apache/hudi/issues/2214#issuecomment-719155536
Thanks, Balaji, for the quick response.
Please find my answers below.
> Do you have hoodie.combine.before.upsert set to true?
We don't set this flag, so it should be true by default.
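For readers following the thread, here is a rough sketch of what combining before an upsert amounts to: within one incoming batch, only one record per key is kept, chosen by an ordering (precombine) field. This is an illustration in plain Python, not Hudi's implementation. The field names `id` and `ts`, the sample batch, and the helper `combine_before_upsert` are all hypothetical, and tie-breaking here is arbitrary.

```python
# Writer options that make the flag explicit instead of relying on the default.
# Only the two keys below are shown; a real writer needs more options.
hudi_options = {
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.combine.before.upsert": "true",
}


def combine_before_upsert(records, key_field, precombine_field):
    """Keep one record per key: the one with the largest precombine value.

    Hypothetical helper that only illustrates the idea of pre-combining.
    """
    latest = {}
    for rec in records:
        key = rec[key_field]
        kept = latest.get(key)
        if kept is None or rec[precombine_field] >= kept[precombine_field]:
            latest[key] = rec
    return list(latest.values())


# Made-up batch: key "k1" arrives twice in the same batch.
batch = [
    {"id": "k1", "ts": 1, "value": "old"},
    {"id": "k1", "ts": 2, "value": "new"},
    {"id": "k2", "ts": 1, "value": "only"},
]

combined = combine_before_upsert(batch, key_field="id", precombine_field="ts")
print(combined)
```

With pre-combining in effect, a batch like this one should contribute a single row for "k1" to the write.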
> You can also check if the duplicates have the same _hoodie_commit_time value
> to see if this is the pattern?
Yes, they have the same _hoodie_commit_time, the same parquet files, and the
same hoodie record key, but a different commit seq no for each duplicate entry.
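The check described above can be sketched in plain Python over rows that have already been read from the table. The metadata column names are Hudi's; the sample rows, their values, and the helper `find_duplicate_keys` are made up for illustration and are not taken from the affected table.

```python
from collections import defaultdict


def find_duplicate_keys(rows):
    """Map each record key seen more than once to the sorted commit times of its rows."""
    by_key = defaultdict(list)
    for row in rows:
        by_key[row["_hoodie_record_key"]].append(row)
    return {
        key: sorted({r["_hoodie_commit_time"] for r in group})
        for key, group in by_key.items()
        if len(group) > 1
    }


# Invented sample rows: "k1" appears twice with the same commit time
# but a different commit sequence number, as described in the reply.
rows = [
    {
        "_hoodie_record_key": "k1",
        "_hoodie_commit_time": "20201030043114",
        "_hoodie_commit_seqno": "20201030043114_0_1",
    },
    {
        "_hoodie_record_key": "k1",
        "_hoodie_commit_time": "20201030043114",
        "_hoodie_commit_seqno": "20201030043114_0_2",
    },
    {
        "_hoodie_record_key": "k2",
        "_hoodie_commit_time": "20201030043114",
        "_hoodie_commit_seqno": "20201030043114_0_3",
    },
]

duplicates = find_duplicate_keys(rows)
print(duplicates)
```

A key that maps to a single commit time means all of its duplicates were written by the same commit, which is the pattern reported here.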
> It is also possible that you have more than one writer ingesting data to
> the same dataset concurrently. This will not work as expected.
We have one Hudi pipeline per table, and I suppose Hudi doesn't support
concurrent writes/upserts. We consume messages from Kafka, transform them, and
then upsert into Hudi. So I still don't follow your point about ingesting into
the same dataset concurrently. Can you provide some more information on this scenario?
Thanks,
Raj
On Fri, Oct 30, 2020, 3:04 AM Balaji Varadarajan <no...@github.com>
wrote:
> @RajasekarSribalan <https://github.com/RajasekarSribalan> : Do you have
> hoodie.combine.before.upsert set to true? By default, this is true, so
> unless you have set it to false, this should not be a problem. You can also
> check if the duplicates have the same _hoodie_commit_time value to see if
> this is the pattern.
>
> Another question: when you say duplicate records, do they have the same
> _hoodie_record_key value?
>
> It is also possible that you have more than one writer ingesting data to
> the same dataset concurrently. This will not work as expected.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org