Posted to dev@hive.apache.org by Szehon Ho <sz...@cloudera.com> on 2014/02/21 02:41:48 UTC
Re: Review Request 18254: HIVE-6375 Fix CTAS for parquet
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18254/
-----------------------------------------------------------
(Updated Feb. 21, 2014, 1:41 a.m.)
Review request for hive.
Changes
-------
Changing the review name to match the JIRA name change.
Summary (updated)
-----------------
HIVE-6375 Fix CTAS for parquet
Bugs: HIVE-6375
https://issues.apache.org/jira/browse/HIVE-6375
Repository: hive-git
Description
-------
There is a Hive bug in SemanticAnalyzer that chooses different names for columns in the CreateTable task and the FileSink task: columnInfo.getInternalName() was used in one place, while the fieldSchema still used columnInfo.getAlias() when it was available. This change makes the two consistent, favoring columnInfo.getAlias() when it is available.
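A minimal Java sketch of the naming rule described above. `Col`, `outputName`, and the list builders are hypothetical stand-ins, not Hive's actual `ColumnInfo`/`FieldSchema` code. The sketch also assumes, as the description suggests, that the FileSink side was the one using the internal name before the fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnNamingSketch {

    // Hypothetical stand-in for Hive's ColumnInfo; only the two names
    // mentioned in the review are modeled.
    static final class Col {
        final String internalName; // planner-assigned, e.g. "_col0"
        final String alias;        // user-visible name; null if none

        Col(String internalName, String alias) {
            this.internalName = internalName;
            this.alias = alias;
        }
    }

    // Shared rule: favor the alias when available, else the internal name.
    static String outputName(Col c) {
        return c.alias != null ? c.alias : c.internalName;
    }

    // Names recorded in the table schema (CreateTable side).
    static List<String> createTableNames(List<Col> cols) {
        List<String> names = new ArrayList<>();
        for (Col c : cols) {
            names.add(outputName(c));
        }
        return names;
    }

    // Assumed pre-fix FileSink behavior: always the internal name.
    static List<String> fileSinkNamesBefore(List<Col> cols) {
        List<String> names = new ArrayList<>();
        for (Col c : cols) {
            names.add(c.internalName);
        }
        return names;
    }

    // Post-fix FileSink behavior: same rule as the CreateTable side.
    static List<String> fileSinkNamesAfter(List<Col> cols) {
        List<String> names = new ArrayList<>();
        for (Col c : cols) {
            names.add(outputName(c));
        }
        return names;
    }

    public static void main(String[] args) {
        List<Col> cols = Arrays.asList(new Col("_col0", "key"), new Col("_col1", "value"));
        System.out.println(createTableNames(cols));    // [key, value]
        System.out.println(fileSinkNamesBefore(cols)); // [_col0, _col1]
        System.out.println(fileSinkNamesAfter(cols));  // [key, value]
    }
}
```

Routing both consumers through one helper such as `outputName` is what keeps the two name lists from drifting apart again.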
This was not revealed earlier because other file formats, such as RCFile, seem to use column ordinal position, and Avro stores the schema separately altogether.
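The following toy model illustrates why a name mismatch only surfaces for a format that resolves columns by name. It assumes, as hedged above, that RCFile reads by ordinal position while Parquet matches by column name; neither method reflects the real RCFile or Parquet readers, and `LookupSketch` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only, not the real RCFile/Parquet readers.
public class LookupSketch {

    // Positional storage: values kept in column order, writer names discarded.
    static Object[] writePositional(Object[] values) {
        return values.clone();
    }

    // Positional read: locate the column's index in the table schema.
    static Object readPositional(Object[] stored, List<String> tableNames, String column) {
        int idx = tableNames.indexOf(column);
        return idx < 0 ? null : stored[idx];
    }

    // Name-based storage: each value is keyed by the writer's column name.
    static Map<String, Object> writeByName(List<String> writerNames, Object[] values) {
        Map<String, Object> stored = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            stored.put(writerNames.get(i), values[i]);
        }
        return stored;
    }

    // Name-based read: look the column up by the table schema's name.
    static Object readByName(Map<String, Object> stored, String column) {
        return stored.get(column);
    }

    public static void main(String[] args) {
        List<String> writerNames = Arrays.asList("_col0", "_col1"); // mismatched writer names
        List<String> tableNames = Arrays.asList("key", "value");    // table schema names
        Object[] row = {238, "val_238"};
        System.out.println(readPositional(writePositional(row), tableNames, "key")); // 238
        System.out.println(readByName(writeByName(writerNames, row), "key"));        // null
    }
}
```

With positional storage the writer's names never matter, so the bug stays hidden; with name-based storage the table's column `key` finds nothing under `_col0`.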
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01aa0e
ql/src/test/queries/clientpositive/parquet_ctas.q PRE-CREATION
ql/src/test/results/clientpositive/ctas.q.out 9668855
ql/src/test/results/clientpositive/ctas_hadoop20.q.out 2c0059d
ql/src/test/results/clientpositive/merge3.q.out ae7dc71
ql/src/test/results/clientpositive/parquet_ctas.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/18254/diff/
Testing
-------
Added parquet_ctas.q. It covers cases where the column name is taken directly from the input table (implied alias), where the name is auto-generated, where the name is specified as an alias, and a mix of the three.
Thanks,
Szehon Ho