Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/09 05:14:16 UTC

[GitHub] [hudi] hbgstc123 opened a new issue, #5812: [SUPPORT]Same primary key with different _hoodie_record_key

hbgstc123 opened a new issue, #5812:
URL: https://github.com/apache/hudi/issues/5812

   I have a table with the column `video_id` as its primary key, and I found that records with the same primary key have different `_hoodie_record_key` values, as shown in the picture below.
   
   <img width="325" alt="image" src="https://user-images.githubusercontent.com/8900183/172768122-2186a97d-6e39-44eb-b51d-c8acc9a507aa.png">
   
   Steps to reproduce the behavior:
   1. Create a table with Spark SQL, using these tblproperties:
       tblproperties (
         type = 'mor',
         primaryKey = 'video_id'
       )
   2. Insert historical data with Spark SQL.
   3. Ingest real-time incremental data with Flink.
   
   Config in the Flink DDL:
   <img width="774" alt="image" src="https://user-images.githubusercontent.com/8900183/172768975-9695d659-81da-4561-b73c-2fcf43328f9d.png">
   
   The `_hoodie_record_key` written by Spark contains the prefix "video_id:", while the key written by Flink does not.
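
To make the mismatch concrete, here is a minimal sketch in plain Python. It is not Hudi code: the helper names and the sample `video_id` value are hypothetical, and the two key shapes are taken only from the observation above (a "video_id:" prefix on one side, the bare value on the other).

```python
# Illustration only: plain Python, not Hudi code.
# The helper names and the sample value below are hypothetical.

def prefixed_key(field: str, value: object) -> str:
    """Key shape seen in the rows written by Spark: '<field>:<value>'."""
    return f"{field}:{value}"


def bare_key(value: object) -> str:
    """Key shape seen in the rows written by Flink: just the value."""
    return str(value)


video_id = 12345  # hypothetical sample value
spark_key = prefixed_key("video_id", video_id)
flink_key = bare_key(video_id)

# Records are matched by the record key string, so two different strings
# end up as two separate entries even though video_id is identical.
index = {spark_key: "historical row"}
index[flink_key] = "incremental row"

print(spark_key, flink_key, len(index))
```

Because the two strings differ, a lookup by record key cannot pair the incremental row with the historical one, which would explain seeing the same `video_id` twice.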
   
   
   
   * Hudi version : 0.11.0
   
   * Spark version : 3.1
   
   * flink version : 1.13
   
   * Storage (HDFS/S3/GCS..) : hdfs
   
   * Running on Docker? (yes/no) : no 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on issue #5812: [SUPPORT]Same primary key with different _hoodie_record_key

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #5812:
URL: https://github.com/apache/hudi/issues/5812#issuecomment-1150772633

   This is a known problem. Spark uses the `ComplexAvroKeyGenerator` by default even when the primary key has only one field, while Flink uses the `SimpleAvroKeyGenerator` when the primary key is a single field. A temporary workaround is to manually set the key generator for Spark to `SimpleAvroKeyGenerator`. I will open a fix for Spark soon.
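
As a rough sketch of that workaround, the write options for a Spark DataSource writer might look like the dictionary below. Several things here are assumptions, not taken from this thread: the comment names `SimpleAvroKeyGenerator`, whereas this sketch uses the Spark-side class `org.apache.hudi.keygen.SimpleKeyGenerator`; `df` and `base_path` in the usage comment are hypothetical; and whether the same setting can be passed through Spark SQL `tblproperties` is not confirmed here.

```python
# A sketch of Spark DataSource write options, not verified against a cluster.
# Assumption: the Spark-side simple key generator class is used in place of
# the SimpleAvroKeyGenerator named in the comment above.
SIMPLE_KEYGEN = "org.apache.hudi.keygen.SimpleKeyGenerator"

hudi_write_options = {
    # Single-field record key, matching primaryKey = 'video_id' in the issue.
    "hoodie.datasource.write.recordkey.field": "video_id",
    # Override the default key generator so keys carry no "video_id:" prefix.
    "hoodie.datasource.write.keygenerator.class": SIMPLE_KEYGEN,
}

# Hypothetical usage (df and base_path are placeholders; other required
# options such as the table name are omitted from this sketch):
# df.write.format("hudi").options(**hudi_write_options).mode("append").save(base_path)
```

The intent is only that both writers end up producing the same key shape for a single-field primary key; check the key generator documentation for your Hudi version before relying on these exact values.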




[GitHub] [hudi] danny0405 commented on issue #5812: [SUPPORT]Same primary key with different _hoodie_record_key

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #5812:
URL: https://github.com/apache/hudi/issues/5812#issuecomment-1150839248

   I have opened a fix here: https://github.com/apache/hudi/pull/5815




[GitHub] [hudi] XuQianJin-Stars closed issue #5812: [SUPPORT]Same primary key with different _hoodie_record_key

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars closed issue #5812: [SUPPORT]Same primary key with different _hoodie_record_key
URL: https://github.com/apache/hudi/issues/5812

