Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/30 04:31:14 UTC

[GitHub] [hudi] RajasekarSribalan edited a comment on issue #2214: [SUPPORT] Hudi Upsert but with duplicates record for same key

RajasekarSribalan edited a comment on issue #2214:
URL: https://github.com/apache/hudi/issues/2214#issuecomment-719155536


   Thanks, Balaji, for the quick response.
   
   Please find my answers below.
   
   
   Do you have hoodie.combine.before.upsert set to true ?
   
   We don't set this flag, so it should be true by default.
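   
   To rule out any doubt about the default, the flag could also be set
   explicitly in the write options. This is only a sketch: the table name,
   record key field and precombine field below are placeholders, not our
   actual values.

```python
# Sketch of Hudi write options with the combine flag set explicitly.
# "my_table", "id" and "ts" are hypothetical placeholders.
hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    # Deduplicate incoming records on the record key before upserting.
    "hoodie.combine.before.upsert": "true",
}
```

   With a Spark DataFrame these would typically be passed as
   df.write.format("hudi").options(**hudi_options), assuming the Hudi Spark
   bundle is on the classpath.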
   
   You can also check if the duplicates have the same _hoodie_commit_time value
   to see if this is the pattern ?
   
   Yes, they have the same _hoodie_commit_time, the same parquet files, and
   the same hoodie record key, but a different commit seq no for each
   duplicate entry.
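   
   For reference, the pattern can be checked roughly like this. It is a
   plain-Python sketch over rows that have already been read from the table;
   the sample rows are made up for illustration, and only the _hoodie_*
   metadata column names come from Hudi itself.

```python
from collections import defaultdict


def summarize_duplicates(rows):
    """Group rows by _hoodie_record_key and describe keys seen more than once."""
    by_key = defaultdict(list)
    for row in rows:
        by_key[row["_hoodie_record_key"]].append(row)

    summary = {}
    for key, group in by_key.items():
        if len(group) < 2:
            continue  # key appears once, so it is not a duplicate
        summary[key] = {
            "count": len(group),
            # True when every duplicate was written by the same commit.
            "same_commit_time": len({r["_hoodie_commit_time"] for r in group}) == 1,
            # True when every duplicate lives in the same parquet file.
            "same_file": len({r["_hoodie_file_name"] for r in group}) == 1,
        }
    return summary


# Made-up sample rows, for illustration only.
rows = [
    {"_hoodie_record_key": "k1", "_hoodie_commit_time": "20201029120000",
     "_hoodie_commit_seqno": "20201029120000_0_1", "_hoodie_file_name": "f1.parquet"},
    {"_hoodie_record_key": "k1", "_hoodie_commit_time": "20201029120000",
     "_hoodie_commit_seqno": "20201029120000_0_2", "_hoodie_file_name": "f1.parquet"},
    {"_hoodie_record_key": "k2", "_hoodie_commit_time": "20201029120000",
     "_hoodie_commit_seqno": "20201029120000_0_3", "_hoodie_file_name": "f1.parquet"},
]
report = summarize_duplicates(rows)
```

   In the sample, only k1 is reported, with both flags true, which matches
   what we see in our table.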
   
   It is also possible that you have more than one writer ingesting data to
   the same dataset concurrently. This will not work as expected.
   
   
   We have one Hudi pipeline per table, and I understand that Hudi doesn't
   support concurrent writes/upserts. We consume messages from Kafka,
   transform them, and then upsert into Hudi. So I am still unable to follow
   what you mean by ingesting into the same dataset concurrently. Can you
   provide some more information on this scenario?
   
   Thanks,
   Raj
   
   On Fri, Oct 30, 2020, 3:04 AM Balaji Varadarajan <no...@github.com>
   wrote:
   
   > @RajasekarSribalan <https://github.com/RajasekarSribalan> : Do you have
   > hoodie.combine.before.upsert set to true ? By default, this is true, so
   > unless you have set to false, this should not be a problem ? You can also
   > check if the duplicates have the same _hoodie_commit_time value to see if
   > this is the pattern ?
   >
   > Another question, when you say duplicate record - Do they have same
   > _hoodie_record_key value ?
   >
   > It is also possible that you have more than one writer ingesting data to
   > the same dataset concurrently. This will not work as expected.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org