Posted to dev@hive.apache.org by "Sindhu Subhas (Jira)" <ji...@apache.org> on 2022/05/27 18:12:00 UTC

[jira] [Created] (HIVE-26271) ParquetDecodingException: Can not read value at 1 in block 0 when reading Parquet file generated from ADF sink from Hive

Sindhu Subhas created HIVE-26271:
------------------------------------

             Summary: ParquetDecodingException: Can not read value at 1 in block 0 when reading Parquet file generated from ADF sink from Hive
                 Key: HIVE-26271
                 URL: https://issues.apache.org/jira/browse/HIVE-26271
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 3.1.1
         Environment: ADF pipeline to create parquet table.

HDInsight 4.1
            Reporter: Sindhu Subhas


Steps to replicate:
 1. Create a Parquet file with a decimal column, using an ADF sink from any source.
 2. Move the file to the Hive external table's ABFS location.
 3. Create an external table on top of the file.
 4. Create an ORC table with the column as a string, via CTAS on the Parquet external table (see the HiveQL sketch below).
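
A minimal HiveQL sketch of steps 2-4, assuming hypothetical names (the table names, column, precision, and ABFS path below are illustrative placeholders, not taken from the actual report):

-- Step 3: external table over the ADF-generated Parquet file
CREATE EXTERNAL TABLE adf_parquet_src (
  id     INT,
  amount DECIMAL(10,2)  -- the decimal column written by the ADF sink
)
STORED AS PARQUET
LOCATION 'abfs://container@account.dfs.core.windows.net/hive/adf_output/';

-- Step 4: CTAS into ORC, casting the column to a string; scanning the
-- Parquet file here is what raises the ParquetDecodingException below
CREATE TABLE orc_copy STORED AS ORC AS
SELECT id, CAST(amount AS STRING) AS amount
FROM adf_parquet_src;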


Error stack:

Caused by: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file wasb://vvv-2022-05-23t07-57-56-345z@xx.blob.core.windows.net/hive/xx/part-00000-c94f8032-a16b-4314-8868-9fc63a47422e-c000.snappy.parquet
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:422)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
... 27 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file wasb://vvv-2022-05-23t07-57-56-345z@xx.blob.core.windows.net/hive/xx/part-00000-c94f8032-a16b-4314-8868-9fc63a47422e-c000.snappy.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:419)
... 28 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo
at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$7.convert(ETypeConverter.java:587)
at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$7.convert(ETypeConverter.java:583)
at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$BinaryConverter.addBinary(ETypeConverter.java:792)
at org.apache.parquet.column.impl.ColumnReaderImpl$2$6.writeValue(ColumnReaderImpl.java:317)
at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
... 33 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1653293742194_0008_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)

The issue is with the new Parquet reader: Hive's decimal support in the new reader doesn't properly implement the Parquet spec (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md) -- Hive only handles the fixed_len_byte_array case, even though the spec also allows decimals backed by int32, int64, and binary. Some work needs to be done to add support for the rest.
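
For context, the cited spec allows a DECIMAL logical type to annotate four physical types, shown below in the Parquet schema notation the spec uses (the column names and precisions are made up for illustration, and the // notes are added commentary, not part of the notation). Hive's new reader covers only the last of the four:

message decimal_encodings {
  required int32                    d1 (DECIMAL(9,2));   // precision 1-9
  required int64                    d2 (DECIMAL(18,2));  // precision 1-18
  required binary                   d3 (DECIMAL(38,9));  // variable-length bytes
  required fixed_len_byte_array(16) d4 (DECIMAL(38,9));  // the case Hive handles
}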



--
This message was sent by Atlassian Jira
(v8.20.7#820007)