Posted to issues@hive.apache.org by "Marta Kuczora (Jira)" <ji...@apache.org> on 2019/09/04 12:39:00 UTC

[jira] [Updated] (HIVE-21987) Hive is unable to read Parquet int32 annotated with decimal

     [ https://issues.apache.org/jira/browse/HIVE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marta Kuczora updated HIVE-21987:
---------------------------------
    Attachment: HIVE-21987.3.patch

> Hive is unable to read Parquet int32 annotated with decimal
> -----------------------------------------------------------
>
>                 Key: HIVE-21987
>                 URL: https://issues.apache.org/jira/browse/HIVE-21987
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Nandor Kollar
>            Assignee: Marta Kuczora
>            Priority: Major
>         Attachments: HIVE-21987.1.patch, HIVE-21987.2.patch, HIVE-21987.3.patch, part-00000-e5287735-8dcf-4dda-9c6e-4d5c98dc15f2-c000.snappy.parquet
>
>
> When I tried to read a Parquet file through a Hive table (Tez execution engine) that has a small decimal column, I got the following exception:
> {code}
> Caused by: java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$1
> 	at org.apache.parquet.io.api.PrimitiveConverter.addInt(PrimitiveConverter.java:98)
> 	at org.apache.parquet.column.impl.ColumnReaderImpl$2$3.writeValue(ColumnReaderImpl.java:248)
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
> 	at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
> 	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
> 	... 28 more
> {code}
> Steps to reproduce:
> - Create a Hive table with a single decimal(4, 2) column
> - Create a Parquet file with an int32 column annotated with the decimal(4, 2) logical type and put it into the location of the table created in the previous step (or use the attached Parquet file; in that case the Hive column should be named 'd' so that the Hive schema matches the Parquet schema in the file)
> - Execute a {{select *}} on this table
> I'm afraid that similar problems can happen with int64 decimals too. The [Parquet specification | https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] allows both of these cases.
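
For readers unfamiliar with integer-backed decimals, here is a minimal sketch of how such a value is interpreted according to the Parquet logical type definition: the column stores the unscaled integer, and the decimal value is unscaled * 10^(-scale). This is illustration only, not Hive's converter code; the helper name unscaled_to_decimal and the sample values are hypothetical.

```python
from decimal import Decimal


def unscaled_to_decimal(unscaled: int, scale: int) -> Decimal:
    """Interpret an integer-backed decimal: value = unscaled * 10**(-scale).

    Hypothetical helper for illustration; not part of Hive or Parquet.
    """
    # scaleb shifts the exponent, so no binary floating point is involved.
    return Decimal(unscaled).scaleb(-scale)


# Sample decimal(4, 2) values as they would sit in an int32 (or int64) column.
stored = [1234, -5, 0, 9999]
decoded = [unscaled_to_decimal(v, scale=2) for v in stored]

for raw, value in zip(stored, decoded):
    print(raw, "->", value)
```

The same arithmetic applies to int64-backed decimals; only the width of the stored integer differs.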



--
This message was sent by Atlassian Jira
(v8.3.2#803003)