Posted to dev@hive.apache.org by Mohit Sabharwal <mo...@cloudera.com> on 2014/09/20 04:34:42 UTC

Review Request 25871: HIVE-8205 : Using strings in group type fails in ParquetSerDe

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25871/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-8205
    https://issues.apache.org/jira/browse/HIVE-8205


Repository: hive-git


Description
-------

HIVE-8205 : Using strings in group type fails in ParquetSerDe

In HIVE-7735, schema info was plumbed to ETypeConverter to disambiguate between hive Char,
Varchar and String types, which are all represented as PrimitiveType "binary" and OriginalType
"utf8" in parquet.

However, this does not work for parquet nested types (that map to hive Array, Map, etc.)
containing these values, because schema lookup for nested values was not implemented.
It's also non-trivial to do that in the current parquet serde implementation.

Instead of plumbing in the schema, we should convert these types to the same Text writable
and let the object inspectors handle the final conversion.
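
A minimal sketch of this idea, not the actual patch: every name below (NestedStringSketch,
Inspector, toText, bounded, readList) is a hypothetical stand-in, and a plain Java String
stands in for the Text writable. It only illustrates that the converter can emit one common
text form with no per-element schema lookup, leaving the type-specific step to the inspector.
Truncation to a maximum length is assumed here as the only type-specific behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: none of these names are Hive or Parquet classes.
final class NestedStringSketch {

    // Stand-in for an object inspector that yields the final typed value.
    interface Inspector {
        String inspect(String text);
    }

    // Parquet binary/utf8 value -> one common text form (a String stands in for Text).
    static String toText(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // String-like type: pass the text through unchanged.
    static Inspector string() {
        return text -> text;
    }

    // Bounded type (char/varchar-like): keep at most maxLength code points.
    // Assumption: truncation only; any CHAR padding is deliberately left out.
    static Inspector bounded(int maxLength) {
        return text -> {
            if (text.codePointCount(0, text.length()) <= maxLength) {
                return text;
            }
            return text.substring(0, text.offsetByCodePoints(0, maxLength));
        };
    }

    // Nested list: conversion is identical for every element, so no schema
    // lookup is needed here; the element inspector does the final step.
    static List<String> readList(List<byte[]> rawElements, Inspector elementInspector) {
        List<String> out = new ArrayList<>();
        for (byte[] raw : rawElements) {
            out.add(elementInspector.inspect(toText(raw)));
        }
        return out;
    }
}
```

In this sketch the same raw list can be read with string() or bounded(n) without changing the
conversion code, which is the property the description relies on for nested types.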

Also, added Map, List and Struct types to parquet_types q-test. Currently, no q-test is
testing these hive types for parquet.


Diffs
-----

  data/files/parquet_types.txt 750626e1d4e3a010f9d231fb01d754c88a12289a 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java c5d80f22b82e57c5acf8286d879a248a233aa051 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableGroupConverter.java 48e4a133d1b30ef43a53e1a6c19b68682e86835f 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java 0971a68e151cb1a0469671f119b479719f36fa6a 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java e6fb5ae137a1c91953c2458897d98d109586e9d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java a364729505eaa7b0b0c9b0c326a8a6398b8b3dbe 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 5e5df57216d453c643925d3eb0abf593c6d32e2e 
  ql/src/test/queries/clientpositive/parquet_types.q 86af5af40bbb95472d7ef5df6519469cba9a129d 
  ql/src/test/results/clientpositive/parquet_types.q.out 803a826ba0c386af784dd24c0455ac1939af380b 
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java d16e313b43999c5a67e5f30a75d6401058bdd993 
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveVarcharObjectInspector.java 28c9080660b9d4c19789ece1754ef4ecec27f2e7 

Diff: https://reviews.apache.org/r/25871/diff/


Testing
-------


Thanks,

Mohit Sabharwal


Re: Review Request 25871: HIVE-8205 : Using strings in group type fails in ParquetSerDe

Posted by Mohit Sabharwal <mo...@cloudera.com>.

(Updated Sept. 23, 2014, 1:16 a.m.)


Review request for hive.


Changes
-------

Enhanced the q-test by adding the length of columns.
Fixed an issue where HiveChar/VarcharOI should use maxlength.


Bugs: HIVE-8205
    https://issues.apache.org/jira/browse/HIVE-8205


Repository: hive-git


Description
-------

HIVE-8205 : Using strings in group type fails in ParquetSerDe

In HIVE-7735, schema info was plumbed to ETypeConverter to disambiguate between hive Char,
Varchar and String types, which are all represented as PrimitiveType "binary" and OriginalType
"utf8" in parquet.

However, this does not work for parquet nested types (that map to hive Array, Map, etc.)
containing these values, because schema lookup for nested values was not implemented.
It's also non-trivial to do that in the current parquet serde implementation.

Instead of plumbing in the schema, we should convert these types to the same Text writable
and let the object inspectors handle the final conversion.

Also, added Map, List and Struct types to parquet_types q-test. Currently, no q-test is
testing these hive types for parquet.


Diffs (updated)
-----

  data/files/parquet_types.txt 750626e1d4e3a010f9d231fb01d754c88a12289a 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java c5d80f22b82e57c5acf8286d879a248a233aa051 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableGroupConverter.java 48e4a133d1b30ef43a53e1a6c19b68682e86835f 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java 0971a68e151cb1a0469671f119b479719f36fa6a 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java e6fb5ae137a1c91953c2458897d98d109586e9d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java a364729505eaa7b0b0c9b0c326a8a6398b8b3dbe 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 5e5df57216d453c643925d3eb0abf593c6d32e2e 
  ql/src/test/queries/clientpositive/parquet_types.q 86af5af40bbb95472d7ef5df6519469cba9a129d 
  ql/src/test/results/clientpositive/parquet_types.q.out 803a826ba0c386af784dd24c0455ac1939af380b 
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java d16e313b43999c5a67e5f30a75d6401058bdd993 
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveVarcharObjectInspector.java 28c9080660b9d4c19789ece1754ef4ecec27f2e7 

Diff: https://reviews.apache.org/r/25871/diff/


Testing
-------


Thanks,

Mohit Sabharwal