Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/21 04:50:48 UTC

[GitHub] [hudi] Guanpx opened a new issue #4658: [SUPPORT] Data loss when writing a COW insert table with Flink; Flink web UI "Records Received" differs from Hive count(1)

Guanpx opened a new issue #4658:
URL: https://github.com/apache/hudi/issues/4658


   **Describe the problem you faced**
   
   Data loss when writing a COW insert table with Flink: the "Records Received" count shown in the Flink web UI differs from the Hive count(1) result.
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write with Flink and sync the table to Hive.
   2. Observe that the "Records Received" count in the Flink web UI differs from the Hive (Impala) count(1) result.
   
   **Expected behavior**
   
   ![image](https://user-images.githubusercontent.com/29246713/150461634-237e705c-1bff-4183-bf8a-be7222b7d917.png)
   
   **Environment Description**
   
   * Hudi version : 0.10.0
   
   * Flink version : 1.13.2
   
   * Hive version : 2.1.1-cdh6
   
   * Hadoop version : 3.0.0-cdh6
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Flink write config:
   
   ```
   'connector' = 'hudi',
   'path' = 'hdfs://nameservice-ha/hudi/rds/event_log_origin',
   'table.type' = 'COPY_ON_WRITE',

   'hoodie.datasource.write.recordkey.field' = 'distinct_id',

   'hive_sync.enable' = 'true',
   'hive_sync.table' = 'hudi_event_log_origin',
   'hive_sync.db' = 'default',
   'hive_sync.mode' = 'hms',
   'hive_sync.metastore.uris' = '',
   'hive_sync.skip_ro_suffix' = 'true',

   'hoodie.datasource.write.operation' = 'insert',    -- append mode
   'write.tasks' = '2',
   'write.bucket_assign.tasks' = '2',
   'write.insert.cluster' = 'true',
   'write.ignore.failed' = 'false',
   'clean.async.enabled' = 'true',
   'clean.retain_commits' = '4',
   'archive.min_commits' = '6',
   'archive.max_commits' = '12',
   'hoodie.cleaner.commits.retained' = '4',
   'hoodie.keep.min.commits' = '5',
   'hoodie.keep.max.commits' = '10'
   ```
   
   **Stacktrace**
   
   (none provided)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Guanpx commented on issue #4658: [SUPPORT] Data loss when writing a COW insert table with Flink; Flink web UI "Records Received" differs from Hive count(1)

Guanpx commented on issue #4658:
URL: https://github.com/apache/hudi/issues/4658#issuecomment-1049571213


   The correct option is 'write.operation' = 'insert'.
   My config used 'hoodie.datasource.write.operation' = 'insert', which the Flink writer does not read, so Hudi fell back to the default 'write.operation' = 'upsert'. Closing this issue.
   
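   For context on the fix above: the Flink connector reads the short `write.operation` key, while `hoodie.datasource.write.operation` is the Spark datasource spelling, so the latter is silently ignored and the job upserts on `distinct_id`, collapsing duplicate keys. A minimal corrected fragment of the table options (path and values taken from the config posted earlier; everything else unchanged) might look like:

   ```
   'connector' = 'hudi',
   'path' = 'hdfs://nameservice-ha/hudi/rds/event_log_origin',
   'table.type' = 'COPY_ON_WRITE',
   -- Flink key: without this, the writer defaults to 'upsert'
   'write.operation' = 'insert',
   ```

   With `upsert` and a record key of `distinct_id`, rows sharing a key are merged into one, which would explain the Hive count(1) being lower than the Flink "Records Received" count.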





[GitHub] [hudi] Guanpx commented on issue #4658: [SUPPORT] Data loss when writing a COW insert table with Flink; Flink web UI "Records Received" differs from Hive count(1)

Guanpx commented on issue #4658:
URL: https://github.com/apache/hudi/issues/4658#issuecomment-1018254469


   > So you use the `upsert` mode, right? And does the hoodie table have a primary key?
   
   We use insert (append) mode and do not have a unique key. Will the data be deduplicated?





[GitHub] [hudi] nsivabalan commented on issue #4658: [SUPPORT] Data loss when writing a COW insert table with Flink; Flink web UI "Records Received" differs from Hive count(1)

nsivabalan commented on issue #4658:
URL: https://github.com/apache/hudi/issues/4658#issuecomment-1018523491


   Can you try using upsert mode and report back?





[GitHub] [hudi] danny0405 commented on issue #4658: [SUPPORT] Data loss when writing a COW insert table with Flink; Flink web UI "Records Received" differs from Hive count(1)

danny0405 commented on issue #4658:
URL: https://github.com/apache/hudi/issues/4658#issuecomment-1018215386


   So you use the `upsert` mode, right? And does the hoodie table have a primary key?





[GitHub] [hudi] Guanpx closed issue #4658: [SUPPORT] Data loss when writing a COW insert table with Flink; Flink web UI "Records Received" differs from Hive count(1)

Guanpx closed issue #4658:
URL: https://github.com/apache/hudi/issues/4658


   





[GitHub] [hudi] danny0405 commented on issue #4658: [SUPPORT] Data loss when writing a COW insert table with Flink; Flink web UI "Records Received" differs from Hive count(1)

danny0405 commented on issue #4658:
URL: https://github.com/apache/hudi/issues/4658#issuecomment-1018507824


   No, the data would not be deduplicated.





[GitHub] [hudi] Guanpx commented on issue #4658: [SUPPORT] Data loss when writing a COW insert table with Flink; Flink web UI "Records Received" differs from Hive count(1)

Guanpx commented on issue #4658:
URL: https://github.com/apache/hudi/issues/4658#issuecomment-1018603240


   I will try it, but we do not have a unique key.

