Posted to commits@hudi.apache.org by "ad1happy2go (via GitHub)" <gi...@apache.org> on 2023/03/28 13:06:14 UTC

[GitHub] [hudi] ad1happy2go commented on issue #7829: [SUPPORT] Using monotonically_increasing_id to generate record key causing duplicates on upsert

ad1happy2go commented on issue #7829:
URL: https://github.com/apache/hudi/issues/7829#issuecomment-1486853751

   @jtmzheng 
   
   This issue was partially resolved in Spark by this JIRA - https://issues.apache.org/jira/browse/SPARK-23599
   
   However, if you check the last comment on that JIRA, someone reported a duplicate issue similar to yours with the uuid() function:
   
   "We have encountered this problem with Spark 3.1.2, resulting in duplicate values in a situation where a spark executor died. As suggested in the description, this error was hard to track down and difficult to replicate."
   
   How frequently does it occur? As a workaround, can you use a combination of both monotonically_increasing_id and uuid to ensure the record key is always unique? There may be a small performance hit from generating such a large key, but it should always be unique. A sketch of this is shown below.
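   A minimal PySpark sketch of that workaround, assuming the Hudi Spark bundle is on the classpath; the source path, output path, and table name are placeholders, not values from your setup:
   
       from pyspark.sql import SparkSession
       from pyspark.sql import functions as F
       
       spark = SparkSession.builder.appName("hudi-unique-key").getOrCreate()
       
       # Placeholder source; replace with your actual input.
       df = spark.read.parquet("/path/to/source")
       
       # Composite record key: uuid() alone can produce duplicates when a
       # task is retried after an executor dies (see SPARK-23599), and
       # monotonically_increasing_id() alone can repeat across separate
       # writes. Concatenating both makes a collision very unlikely.
       df_with_key = df.withColumn(
           "record_key",
           F.concat_ws("-", F.expr("uuid()"), F.monotonically_increasing_id()),
       )
       
       df_with_key.write.format("hudi") \
           .option("hoodie.datasource.write.recordkey.field", "record_key") \
           .option("hoodie.table.name", "my_table") \
           .mode("append") \
           .save("/path/to/hudi/table")
   
   Since both functions are nondeterministic, the key should be generated once at write time and persisted with the record, not recomputed on later reads.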
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org