Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/01 02:41:18 UTC

[GitHub] [hudi] XXwhite opened a new issue, #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

XXwhite opened a new issue, #6019:
URL: https://github.com/apache/hudi/issues/6019

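   I am using Flink CDC to read data from MongoDB and write it to Hudi, but the total row count in Hudi is always somewhat lower than the total in MongoDB (by a few records or more), and it does not catch up even after a long time. It looks as if the last batch of data is not flushed to Hudi right away. This may not be a bug, but I want to verify data consistency. How should I configure this? Below is my CREATE TABLE statement; the compaction settings do not seem to take effect.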
   ```
   CREATE TABLE hudi_hesuan_box(
   _id STRING,
   gm STRING,
   kg TIMESTAMP(3),
   kr STRING,
   kj STRING,
   rs INT,
   zt INT,
   sg INT,
   v BIGINT,
   uid STRING,
   s INT,
   xm STRING,
   bg TIMESTAMP(3),
   bj STRING,
   br STRING,
   rlb ARRAY<ROW<_id STRING, md STRING, xm STRING, zj STRING, sj STRING, lx STRING, cj TIMESTAMP(3),s INT>>,
   PRIMARY KEY(_id) NOT ENFORCED 
   )with(
   'connector'='hudi',
   'path'= 'hdfs://cdh07:8020/cdc_test/hudi/hesuan_box',
   'hoodie.datasource.write.recordkey.field'= '_id',
   'hoodie.metadata.enable'='false',
   'write.precombine.field'= 'kg',
   'write.tasks'= '1',
   'write.rate.limit'= '2000',
   'table.type'= 'MERGE_ON_READ' ,
   'compaction.tasks'= '1',
   'compaction.async.enabled'= 'true',
   'compaction.trigger.strategy'= 'num_or_time',
   'compaction.delta_commits'= '1',
   'compaction.delta_seconds'= '120',
   'changelog.enabled'= 'true',
   'read.streaming.enabled'= 'true',
   'read.streaming.check-interval'= '3',
   'hive_sync.enable'= 'true',
   'hive_sync.mode'= 'hms',
   'hive_sync.metastore.uris'= 'thrift://cdh06:9083',
   'hive_sync.jdbc_url'= 'jdbc:hive2://cdh11:10000',
   'hive_sync.table'= 'hesuan_box',
   'hive_sync.db'= 'cdc_test',
   'hive_sync.username'= 'hdfs',
   'hive_sync.support_timestamp'= 'true'
   );
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] sunmingqiaa commented on issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
sunmingqiaa commented on issue #6019:
URL: https://github.com/apache/hudi/issues/6019#issuecomment-1208829498

   This looks like a bug; I ran into it too.
   Flink 1.13.3, Hudi 0.11.1, Hive 2.1.1
   'compaction.trigger.strategy' = 'num_or_time'
   'compaction.delta_commits' = '1'
   It works normally with a small amount of data, but when I sink 10 million records with the MySQL CDC connector, no compaction files are ever produced, so Hive queries return fewer rows than expected.


[GitHub] [hudi] danny0405 closed issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
danny0405 closed issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?
URL: https://github.com/apache/hudi/issues/6019


[GitHub] [hudi] XXwhite closed issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
XXwhite closed issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?
URL: https://github.com/apache/hudi/issues/6019


[GitHub] [hudi] xiaozhch5 commented on issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
xiaozhch5 commented on issue #6019:
URL: https://github.com/apache/hudi/issues/6019#issuecomment-1172851108

   When a Flink checkpoint is triggered, or when the write buffer grows larger than the value set in the 'write.batch.size' option (256 MB by default), the data is flushed to the underlying filesystem.
   You can decrease the checkpoint interval or reduce the value of 'write.batch.size' to flush the data faster.
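
   A minimal sketch of those two settings, assuming the job is submitted from the Flink SQL client. The interval and buffer values are illustrative assumptions, not recommendations from this thread, and the table name, columns, and path are hypothetical:

   ```sql
   -- Assumption: Flink SQL client session; '30s' is an illustrative interval.
   SET 'execution.checkpointing.interval' = '30s';

   -- Hypothetical table used only to show where the option goes.
   CREATE TABLE hudi_flush_demo (
     id STRING,
     ts TIMESTAMP(3),
     PRIMARY KEY (id) NOT ENFORCED
   ) WITH (
     'connector' = 'hudi',
     'path' = 'hdfs:///tmp/hudi_flush_demo',  -- hypothetical path
     'table.type' = 'MERGE_ON_READ',
     'write.precombine.field' = 'ts',
     'write.batch.size' = '64'  -- buffer size in MB; illustrative value below the 256 MB default
   );
   ```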


[GitHub] [hudi] yuzhaojing commented on issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
yuzhaojing commented on issue #6019:
URL: https://github.com/apache/hudi/issues/6019#issuecomment-1189185741

   @XXwhite Can you provide your Hudi and Flink versions?


[GitHub] [hudi] danny0405 commented on issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #6019:
URL: https://github.com/apache/hudi/issues/6019#issuecomment-1302935085

   Yeah, let's close it out. Please use release 0.12.1; if there are still problems, feel free to re-open it.


[GitHub] [hudi] nsivabalan commented on issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6019:
URL: https://github.com/apache/hudi/issues/6019#issuecomment-1302878499

   @yuzhaojing: can we follow up here, please? If it's already fixed in a released version of Hudi, can we close it out?
   


[GitHub] [hudi] XXwhite commented on issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
XXwhite commented on issue #6019:
URL: https://github.com/apache/hudi/issues/6019#issuecomment-1171961891

   @yihua Hello, can you help me?


[GitHub] [hudi] XXwhite commented on issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
XXwhite commented on issue #6019:
URL: https://github.com/apache/hudi/issues/6019#issuecomment-1189752954

   > 
   
   Hudi: 0.11.1, Flink: 1.13.6


[GitHub] [hudi] XXwhite commented on issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
XXwhite commented on issue #6019:
URL: https://github.com/apache/hudi/issues/6019#issuecomment-1179917321

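   The data delay in Hudi does not seem to be caused by checkpoints. My table type is MERGE_ON_READ, and I found that Hudi does not compact the data according to my strategy ('compaction.trigger.strategy' = 'num_or_time'), which is why I cannot see the data promptly. Is there any documentation on this?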


[GitHub] [hudi] XXwhite commented on issue #6019: [SUPPORT] How can I get data flushed to Hudi as quickly as possible?

Posted by GitBox <gi...@apache.org>.
XXwhite commented on issue #6019:
URL: https://github.com/apache/hudi/issues/6019#issuecomment-1207774345

   > > 
   > 
   > Hudi: 0.11.1, Flink: 1.13.6
   
   @yuzhaojing 

