Posted to commits@hudi.apache.org by "liwei (Jira)" <ji...@apache.org> on 2020/10/09 16:03:00 UTC
[jira] [Closed] (HUDI-974) Fields out of order in MOR mode when using Hive
[ https://issues.apache.org/jira/browse/HUDI-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liwei closed HUDI-974.
----------------------
> Fields out of order in MOR mode when using Hive
> -----------------------------------------------
>
> Key: HUDI-974
> URL: https://issues.apache.org/jira/browse/HUDI-974
> Project: Apache Hudi
> Issue Type: Bug
> Components: Hive Integration
> Reporter: leesf
> Assignee: liwei
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> Attachments: image-2020-05-28-21-06-02-396.png, image-2020-05-28-21-07-30-803.png
>
>
> Fields come back out of order when querying a MOR Hudi dataset via Hive.
> Hive table:
> CREATE EXTERNAL TABLE `unknown_rt`(
> `_hoodie_commit_time` string,
> `_hoodie_commit_seqno` string,
> `_hoodie_record_key` string,
> `_hoodie_partition_path` string,
> `_hoodie_file_name` string,
> `age` bigint,
> `name` string,
> `sex` string,
> `ts` bigint)
> PARTITIONED BY (
> `location` string)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
> 'file:/Users/sflee/personal/backup_demo'
> TBLPROPERTIES (
> 'last_commit_time_sync'='20200528153331',
> 'transient_lastDdlTime'='1590650733')
>
> sql:
> set hoodie.realtime.merge.skip = true;
> select sex, name, age from unknown_rt;
> result:
> !image-2020-05-28-21-06-02-396.png!
> With hoodie.realtime.merge.skip = true, the fields are out of order.
> sql:
> set hoodie.realtime.merge.skip = false;
> select sex, name, age from unknown_rt;
> !image-2020-05-28-21-07-30-803.png!
> With hoodie.realtime.merge.skip = false, the query result is correct.
> After debugging, I found that Hudi uses getWriterSchema in RealtimeUnmergedRecordReader instead of getHiveSchema; we need to fix it.
>
> cc [~vbalaji]
>
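The description above attributes the problem to the unmerged reader using the writer schema where the Hive schema was expected. As a rough sketch of how that kind of mismatch can scramble columns, the Python below lays a record out positionally in one field order and then labels it with another. It is not Hudi code: the field lists mirror the table and query in the report, but the sample record and the helpers to_row and label are hypothetical, and the real reader may differ in detail.

```python
# Hypothetical illustration only; this is not Hudi code.
# Field order as stored by the writer vs. the order the Hive query projects.
WRITER_FIELDS = ["age", "name", "sex", "ts"]
HIVE_PROJECTION = ["sex", "name", "age"]

# Made-up sample record, keyed by field name.
record = {"age": 30, "name": "alice", "sex": "F", "ts": 1590650733}


def to_row(rec, field_order):
    """Lay out a record's values positionally in the given field order."""
    return [rec[f] for f in field_order]


def label(row, column_names):
    """Pair positional values with the column names the query engine expects."""
    return dict(zip(column_names, row))


# Row built in writer-schema order, but read by position as the Hive projection:
wrong = label(to_row(record, WRITER_FIELDS), HIVE_PROJECTION)

# Row built in the order Hive expects:
right = label(to_row(record, HIVE_PROJECTION), HIVE_PROJECTION)

print(wrong)
print(right)
```

In this sketch the first layout puts the age value under sex and the sex value under age, while the second layout lines up with the projection. That is the general shape of the symptom reported when hoodie.realtime.merge.skip = true.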
--
This message was sent by Atlassian Jira
(v8.3.4#803005)