You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Haifeng Chen (JIRA)" <ji...@apache.org> on 2018/05/10 02:08:00 UTC

[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException

    [ https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469821#comment-16469821 ] 

Haifeng Chen commented on HIVE-19015:
-------------------------------------

[~vihangk1] This is the same nested complex type problem as it is not yet implement in Parquet vectorized reader. I will get this done together with HIVE-19016. 

The nested complex type handling will be much complex than primitives but fewer cases. My current thought is for root columns which is primitive or List, Struct and Map with primitives, we will go with the current implementation as fast path.  When we found a root column with nested complex types, we will go with a tree reader which can handling the definition level and repetition level properly.

> Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19015
>                 URL: https://issues.apache.org/jira/browse/HIVE-19015
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Matt McCline
>            Assignee: Vihang Karajgaonkar
>            Priority: Critical
>
> Adding "SET hive.vectorized.execution.enabled=true;"  to parquet_map_of_arrays_of_ints.q triggers this call stack:
> {noformat}
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}
> FYI: [~vihangk1]
> Adding parquet_map_of_maps.q, too.  Stack trace seems related.
> {noformat}
> Caused by: java.lang.ClassCastException: optional group value (MAP) {
>   repeated group key_value {
>     optional binary key (UTF8);
>     required int32 value;
>   }
> } is not primitive
> 	at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.<init>(BaseVectorizedColumnReader.java:130) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.<init>(VectorizedListColumnReader.java:52) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)