Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/18 08:10:56 UTC

[GitHub] [hudi] ChangbingChen edited a comment on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.

ChangbingChen edited a comment on issue #4618:
URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015040351


   > @ChangbingChen do parquet files exist in your table? If parquet files exist, please set mapreduce.input.fileinputformat.split.maxsize >= (max size of the parquet files) to prevent Hive from splitting the parquet files.
   
   Thanks for the reply. It doesn't work. The default value is already 256M.
   ```
   hive> set mapreduce.input.fileinputformat.split.maxsize;
   mapreduce.input.fileinputformat.split.maxsize=256000000
   ```
   
   And the largest parquet file is well below 128M.
   ```
   [yarn@x.x.x ~]$ hadoop fs -ls /hudi/mysql_table_sink_new/20220118
   Found 10 items
   -rw-r--r--   3 yarn supergroup    7157103 2022-01-18 11:17 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111603.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup    7209495 2022-01-18 11:19 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111759.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup   10402799 2022-01-18 11:21 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111959.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup    7853954 2022-01-18 11:23 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118112159.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup    4666049 2022-01-18 11:24 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118112359.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup         93 2022-01-18 11:16 /hudi/mysql_table_sink_new/20220118/.hoodie_partition_metadata
   -rw-r--r--   3 yarn supergroup    1541035 2022-01-18 11:19 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118111759.parquet
   -rw-r--r--   3 yarn supergroup    2741308 2022-01-18 11:21 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118111959.parquet
   -rw-r--r--   3 yarn supergroup    4318101 2022-01-18 11:23 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118112159.parquet
   -rw-r--r--   3 yarn supergroup    5585232 2022-01-18 11:25 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118112359.parquet
   ```
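
For anyone who wants to repeat this check on their own table, here is a minimal sketch (not part of the original comment) that parses `hadoop fs -ls` output and compares the largest parquet file against the split size. The helper name `parquet_sizes` and the constant names are made up for illustration; the listing rows and the `256000000` value are copied from the output above.

```python
# Value reported by `set mapreduce.input.fileinputformat.split.maxsize;` above.
SPLIT_MAXSIZE = 256_000_000

# Rows copied from the `hadoop fs -ls` output above (a subset of the log files
# plus all four parquet files).
LISTING = """\
Found 10 items
-rw-r--r--   3 yarn supergroup    7157103 2022-01-18 11:17 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111603.log.1_0-1-0
-rw-r--r--   3 yarn supergroup         93 2022-01-18 11:16 /hudi/mysql_table_sink_new/20220118/.hoodie_partition_metadata
-rw-r--r--   3 yarn supergroup    1541035 2022-01-18 11:19 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118111759.parquet
-rw-r--r--   3 yarn supergroup    2741308 2022-01-18 11:21 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118111959.parquet
-rw-r--r--   3 yarn supergroup    4318101 2022-01-18 11:23 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118112159.parquet
-rw-r--r--   3 yarn supergroup    5585232 2022-01-18 11:25 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118112359.parquet
"""


def parquet_sizes(listing):
    """Map each .parquet path in an `hadoop fs -ls` listing to its size in bytes."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        # File rows have 8 fields: perms, replication, owner, group,
        # size, date, time, path. The "Found N items" header is skipped.
        if len(fields) != 8 or not fields[7].endswith(".parquet"):
            continue
        sizes[fields[7]] = int(fields[4])
    return sizes


sizes = parquet_sizes(LISTING)
largest = max(sizes.values())
print(largest, largest <= SPLIT_MAXSIZE)  # prints "5585232 True"
```

With these numbers the largest parquet file is about 5.6 MB, so the split size is already far larger than any base file in the partition.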
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org