Posted to issues@hive.apache.org by "Sindhu Subhas (Jira)" <ji...@apache.org> on 2022/05/27 18:15:00 UTC

[jira] [Updated] (HIVE-26271) ParquetDecodingException: Can not read value at 1 in block 0 when reading Parquet file generated from ADF sink from Hive

     [ https://issues.apache.org/jira/browse/HIVE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sindhu Subhas updated HIVE-26271:
---------------------------------
    Description: 
Steps to replicate:
 # Create a Parquet file with a decimal column using an ADF sink from any source.
 # Move the file to the Hive external table's ABFS location.
 # Create an external table on top of the file.
 # Create an ORC table with a string column via CTAS from the Parquet external table (a hypothetical SQL sketch of these steps follows below).
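
For reference, the repro roughly corresponds to DDL along these lines. All table names, column names, the decimal precision/scale, and the ABFS path are hypothetical placeholders; the actual ADF output schema is not given in this report.

{code:sql}
-- Hypothetical names and path; the ADF-written Parquet file is assumed to
-- carry a decimal-annotated column (here "amount").
CREATE EXTERNAL TABLE ext_adf_parquet (
  id     INT,
  amount DECIMAL(18,2)
)
STORED AS PARQUET
LOCATION 'abfs://container@account.dfs.core.windows.net/hive/adf-output/';

-- Step 4: CTAS into an ORC table, reading the decimal column back as a string.
-- Scanning the Parquet file during this statement is what raises the
-- ParquetDecodingException shown in the stack below.
CREATE TABLE orc_copy STORED AS ORC AS
SELECT id, CAST(amount AS STRING) AS amount
FROM ext_adf_parquet;
{code}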

 

Error stack:

Caused by: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file wasb://vvv-2022-05-23t07-57-56-345z@xx.blob.core.windows.net/hive/xx/part-00000-c94f8032-a16b-4314-8868-9fc63a47422e-c000.snappy.parquet
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:422)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
... 27 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file wasb://vvv-2022-05-23t07-57-56-345z@xx.blob.core.windows.net/hive/xx/part-00000-c94f8032-a16b-4314-8868-9fc63a47422e-c000.snappy.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:419)
... 28 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo
at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$7.convert(ETypeConverter.java:587)
at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$7.convert(ETypeConverter.java:583)
at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$BinaryConverter.addBinary(ETypeConverter.java:792)
at org.apache.parquet.column.impl.ColumnReaderImpl$2$6.writeValue(ColumnReaderImpl.java:317)
at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
... 33 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1653293742194_0008_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)

The issue is with the new Parquet reader: Hive decimal support in the new reader doesn't properly implement the Parquet [spec|https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md]; Hive only handles the {{fixed_len_byte_array}} case in this spec. Some work needs to be done to add support for the rest.
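
For context (not taken from the report itself): the Parquet format spec allows a DECIMAL logical type to annotate several physical types, of which, per the statement above, Hive's converter only handles the fixed-length form. A schematic Parquet schema fragment with a hypothetical column name, informal notes added for readability:

{code}
optional int32                    amount (DECIMAL(9,2));    // precision up to 9
optional int64                    amount (DECIMAL(18,2));   // precision up to 18
optional binary                   amount (DECIMAL(38,2));   // variable length
optional fixed_len_byte_array(16) amount (DECIMAL(38,2));   // the case Hive handles, per this report
{code}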


> ParquetDecodingException: Can not read value at 1 in block 0 when reading Parquet file generated from ADF sink from Hive
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26271
>                 URL: https://issues.apache.org/jira/browse/HIVE-26271
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.1.1
>         Environment: ADF pipeline to create parquet table.
> HDInsight 4.1
>            Reporter: Sindhu Subhas
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.7#820007)