Posted to issues@hive.apache.org by "Matt McCline (JIRA)" <ji...@apache.org> on 2018/03/21 23:27:00 UTC

[jira] [Updated] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException

     [ https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-19015:
--------------------------------
    Description: 
Adding "SET hive.vectorized.execution.enabled=true;" to parquet_map_of_arrays_of_ints.q triggers this stack trace:

{noformat}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}

FYI: [~vihangk1]

Adding parquet_map_of_maps.q to this issue, too. Its stack trace seems related:

{noformat}
Caused by: java.lang.ClassCastException: optional group value (MAP) {
  repeated group key_value {
    optional binary key (UTF8);
    required int32 value;
  }
} is not primitive
	at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0]
	at org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.<init>(BaseVectorizedColumnReader.java:130) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.<init>(VectorizedListColumnReader.java:52) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}
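
For illustration only, here is a minimal sketch of the kind of cast that produces the first exception message above. The classes below are hypothetical stand-ins that borrow the names from the exception message; they are not Hive's real TypeInfo classes, and the sketch does not claim to reproduce what VectorizedListColumnReader actually does at line 67. It only assumes that some code path treats a nested type (the array<int> value of a map) as if it were primitive. The second trace looks analogous on the Parquet side, where Type.asPrimitiveType() rejects a group type.

```java
public class NestedTypeCastSketch {
    // Hypothetical stand-ins for Hive's TypeInfo classes; not the real API.
    static abstract class TypeInfo {}

    static final class PrimitiveTypeInfo extends TypeInfo {
        final String typeName;
        PrimitiveTypeInfo(String typeName) { this.typeName = typeName; }
    }

    static final class ListTypeInfo extends TypeInfo {
        final TypeInfo elementType;
        ListTypeInfo(TypeInfo elementType) { this.elementType = elementType; }
    }

    // Assumes every type handed in is primitive, so a nested list makes the cast fail.
    static String primitiveNameUnchecked(TypeInfo type) {
        return ((PrimitiveTypeInfo) type).typeName;
    }

    // Checks the kind of type first and recurses into list element types.
    static String describe(TypeInfo type) {
        if (type instanceof PrimitiveTypeInfo) {
            return ((PrimitiveTypeInfo) type).typeName;
        }
        if (type instanceof ListTypeInfo) {
            return "array<" + describe(((ListTypeInfo) type).elementType) + ">";
        }
        throw new IllegalArgumentException("Unsupported type: " + type.getClass().getName());
    }

    public static void main(String[] args) {
        // The value side of a map<string, array<int>> column.
        TypeInfo mapValueType = new ListTypeInfo(new PrimitiveTypeInfo("int"));
        try {
            primitiveNameUnchecked(mapValueType);
        } catch (ClassCastException e) {
            System.out.println("Unchecked cast failed: " + e.getMessage());
        }
        System.out.println("Guarded walk: " + describe(mapValueType));
    }
}
```

Whether the right fix is to recurse into nested types or to decline vectorization for such columns is not stated in this issue; the guarded variant only shows that the check has to happen before the cast.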

  was:
Adding "SET hive.vectorized.execution.enabled=true;"  to parquet_map_of_arrays_of_ints.q triggers this call stack:

{noformat}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}

FYI: [~vihangk1]


> Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19015
>                 URL: https://issues.apache.org/jira/browse/HIVE-19015
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Matt McCline
>            Priority: Critical


