Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/28 11:35:01 UTC

[GitHub] [hudi] lihuahui5683 commented on issue #5382: [SUPPORT] org.apache.hudi.hadoop.hive.HoodieCombineRealtimeFileSplit cannot be cast to org.apache.hadoop.hive.shims.HadoopShimsSecure$InputSplitShim

lihuahui5683 commented on issue #5382:
URL: https://github.com/apache/hudi/issues/5382#issuecomment-1112097288

   @codope Thanks for your reply.
   I am using CDH 6.3.2. I have tried upgrading to Hive 2.3.4 and Hive 3.1.2: the same exception occurs on both Hive 2.3.4 and the stock Hive 2.1.1, and Hive 3.1.2 causes a Metastore startup failure. Do you know of a way to upgrade the Hive version shipped with CDH?
   The _ro table can be queried normally.
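   For comparison, a plain query against the read-optimized table succeeds (an illustrative snippet of mine; the _ro suffix follows Hudi's Hive sync naming convention):
   ```
   -- the _ro table reads only the compacted base files and never takes the
   -- realtime combine-split code path, so this query returns without error
   select count(*) from role_sync_hive_ro;
   ```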
   
   I used Flink CDC (flink-1.13.6) to read from MySQL and write into Hudi, then used Hive to query the Hudi _rt table.
   The steps are as follows:
   ```
   ./bin/sql-client.sh
   set sql-client.execution.result-mode=tableau;
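   -- Hudi's Flink writer commits on checkpoint completion, so this interval also sets the commit cadence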
   set execution.checkpointing.interval=30sec;
   
   create table role (
     channel_id bigint,
     org_game_id string,
     pf smallint,
     org_server_id string,
     org_user_id string,
     org_role_id string,
     role_name string,
     role_lvl int,
     vip_lvl int,
     role_strength bigint,
     role_create_time timestamp(3),
     first_pay_lvl int,
     first_pay_date bigint,
     pay_money decimal(10,2),
     pay_num bigint,
     last_pay_date bigint,
     last_login_date bigint,
     PRIMARY KEY(channel_id,org_game_id,pf,org_user_id,org_server_id,org_role_id) NOT ENFORCED
   ) WITH (
     'connector'='mysql-cdc',
     'hostname'='192.168.20.76',
     'port'='3306',
     'username'='root',
     'password'='k8U@*hy4icomxz',
     'database-name'='test',
     'table-name'='role_info',
     'server-time-zone'='Asia/Shanghai',
     'scan.startup.mode'='initial',
     'scan.snapshot.fetch.size'='1024',
     'debezium.min.row.count.to.stream.result'='500'
   );
   
   create view role_v as select *, date_format(role_create_time, 'yyyy-MM-dd') as dt from role;
   
   create table role_sync_hive (
     channel_id bigint,
     org_game_id string,
     pf smallint,
     org_server_id string,
     org_user_id string,
     org_role_id string,
     role_name string,
     role_lvl int,
     vip_lvl int,
     role_strength bigint,
     role_create_time timestamp(3),
     first_pay_lvl int,
     first_pay_date bigint,
     pay_money decimal(10,2),
     pay_num bigint,
     last_pay_date bigint,
     last_login_date bigint,
     dt string,
     PRIMARY KEY(channel_id,org_game_id,pf,org_user_id,org_server_id,org_role_id) NOT ENFORCED
   )
   partitioned by (dt, org_game_id)
   with (
     'connector'='hudi',
     'path'='hdfs://mycluster/hudi/role_sync_hive',
     'hoodie.datasource.write.recordkey.field'='channel_id,org_game_id,pf,org_user_id,org_server_id,org_role_id',
     'write.precombine.field'='role_create_time',
     'write.tasks'='4',
     'compaction.tasks'='4',
     'write.rate.limit'='2000',
     'table.type'='MERGE_ON_READ',
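     -- async compaction: with the num_commits trigger and delta_commits=1,
     -- a compaction is scheduled after every delta commit (delta_seconds is
     -- ignored under the num_commits strategy)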
     'compaction.async.enabled'='true',
     'compaction.schedule.enabled'='true',
     'compaction.trigger.strategy'='num_commits',
     'compaction.delta_commits'='1',
     'compaction.delta_seconds'='60',
     'changelog.enabled'='true',
     'read.streaming.enabled'='true',
     'read.streaming.check-interval'='3',
     'hive_sync.enable'='true',
     'hive_sync.mode'='hms',
     'hive_sync.metastore.uris' = 'thrift://192.168.20.77:9083',
     'hive_sync.table'='role_sync_hive',
     'hive_sync.db'='hudi',
     'hive_sync.username'='hive',
     'hive_sync.password'='',
     'hive_sync.support_timestamp'='true'
   );
   
   insert into role_sync_hive select channel_id, org_game_id, pf, org_server_id, org_user_id, org_role_id, role_name, role_lvl, vip_lvl, role_strength, role_create_time, first_pay_lvl, first_pay_date, pay_money, pay_num, last_pay_date, last_login_date, dt from role_v;
   ```
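   To sanity-check that rows actually reach the Hudi table before involving Hive, the sink can be read back in the same sql-client session (a minimal sketch; with 'read.streaming.enabled'='true' this select runs as a continuous streaming read):
   ```
   -- read the Hudi table back through Flink; tableau mode prints rows in the terminal
   select channel_id, org_role_id, role_create_time, dt from role_sync_hive;
   ```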
   Hive query statements:
   ```
   add jar hdfs://mycluster/hudi/jars/hudi-hadoop-mr-bundle-0.10.0.jar;
   set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
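   -- incremental consume: read only commits after consume.start.timestamp,
   -- pulling at most max.commits delta commits per query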
   set hoodie.role_sync_hive.consume.mode=INCREMENTAL;
   set hoodie.role_sync_hive.consume.max.commits=3;
   set mapreduce.input.fileinputformat.split.maxsize=128;
   set hive.fetch.task.conversion=none;
   set hoodie.role_sync_hive.consume.start.timestamp=20220420143200507;
   
   select count(*) from role_sync_hive_rt where `_hoodie_commit_time` > '20220420143200507';
   ```
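   The ClassCastException is thrown on the combine-split path. A possible workaround (my assumption, not verified on CDH 6.3.2) is to fall back to Hive's non-combining input format, which never wraps splits in HadoopShimsSecure$InputSplitShim:
   ```
   -- hypothetical workaround: HiveInputFormat does not combine splits, so the
   -- HoodieCombineRealtimeFileSplit -> InputSplitShim cast is never attempted
   set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
   select count(*) from role_sync_hive_rt where `_hoodie_commit_time` > '20220420143200507';
   ```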

