Posted to dev@hive.apache.org by Szehon Ho <sz...@cloudera.com> on 2014/02/21 02:41:48 UTC

Re: Review Request 18254: HIVE-6375 Fix CTAS for parquet

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18254/
-----------------------------------------------------------

(Updated Feb. 21, 2014, 1:41 a.m.)


Review request for hive.


Changes
-------

Changing the review name to match the JIRA name change.


Summary (updated)
-----------------

HIVE-6375 Fix CTAS for parquet


Bugs: HIVE-6375
    https://issues.apache.org/jira/browse/HIVE-6375


Repository: hive-git


Description
-----------

SemanticAnalyzer has a bug: it chooses different column names for the CreateTable task and the FileSink task.  One place used columnInfo.getInternalName(), while the fieldSchema still used columnInfo.getAlias() when it was available.  This change makes the two consistent, favoring columnInfo.getAlias() when it is available.

The bug was not exposed earlier because other file formats such as RCFile seem to resolve columns by ordinal position, and Avro stores the schema separately altogether.  Parquet appears to be the first format affected by the name mismatch.
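
The naming rule described above can be sketched in isolation as follows.  This is a minimal illustration, not the actual patch: the ColumnInfo class below is a hypothetical stand-in that models only the two getters named in the description, the helper names chooseName and columnNames are invented for the example, and the sample names ("_col0", "key") are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnNameSketch {

    // Hypothetical stand-in for Hive's ColumnInfo; only the two getters
    // mentioned in the description are modeled here.
    static final class ColumnInfo {
        private final String internalName;
        private final String alias; // null when no alias is available

        ColumnInfo(String internalName, String alias) {
            this.internalName = internalName;
            this.alias = alias;
        }

        String getInternalName() { return internalName; }
        String getAlias() { return alias; }
    }

    // One shared rule (hypothetical helper): prefer the alias, otherwise
    // fall back to the internal name.
    static String chooseName(ColumnInfo col) {
        String alias = col.getAlias();
        return alias != null ? alias : col.getInternalName();
    }

    // Applies the shared rule to every column, preserving order.
    static List<String> columnNames(List<ColumnInfo> cols) {
        List<String> names = new ArrayList<>();
        for (ColumnInfo col : cols) {
            names.add(chooseName(col));
        }
        return names;
    }

    public static void main(String[] args) {
        List<ColumnInfo> cols = Arrays.asList(
            new ColumnInfo("_col0", "key"),  // alias available
            new ColumnInfo("_col1", null));  // no alias, internal name used

        // Both consumers derive their names from the same rule, so the
        // table schema and the written file cannot disagree.
        List<String> createTableNames = columnNames(cols);
        List<String> fileSinkNames = columnNames(cols);

        System.out.println(createTableNames);                       // [key, _col1]
        System.out.println(createTableNames.equals(fileSinkNames)); // true
    }
}
```

Routing both call sites through a single helper is one way to keep the two name lists from drifting apart again; whether the real change is structured this way is not stated in this request.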


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01aa0e 
  ql/src/test/queries/clientpositive/parquet_ctas.q PRE-CREATION 
  ql/src/test/results/clientpositive/ctas.q.out 9668855 
  ql/src/test/results/clientpositive/ctas_hadoop20.q.out 2c0059d 
  ql/src/test/results/clientpositive/merge3.q.out ae7dc71 
  ql/src/test/results/clientpositive/parquet_ctas.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/18254/diff/


Testing
-------

Added parquet_ctas.q.  It covers the cases where the column name is taken directly from the input table (implied alias), where the name is auto-generated, where the name is specified as an alias, and a mix of the three.


Thanks,

Szehon Ho