Posted to issues@hive.apache.org by "Dmitry Romanenko (Jira)" <ji...@apache.org> on 2019/10/05 00:52:00 UTC

[jira] [Comment Edited] (HIVE-21987) Hive is unable to read Parquet int32 annotated with decimal

    [ https://issues.apache.org/jira/browse/HIVE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944940#comment-16944940 ] 

Dmitry Romanenko edited comment on HIVE-21987 at 10/5/19 12:51 AM:
-------------------------------------------------------------------

Any chance this will be backported to the 3.x tree? This seems like quite a major problem affecting multiple release lines. The fact that the fix is not available in the current release, and may not be available anytime soon (since 4.x is a question of roadmap), is not that satisfying.


was (Author: dimon222):
Any chance this will be backported to the 3.x tree? This seems like quite a major problem affecting multiple release lines.

> Hive is unable to read Parquet int32 annotated with decimal
> -----------------------------------------------------------
>
>                 Key: HIVE-21987
>                 URL: https://issues.apache.org/jira/browse/HIVE-21987
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Nándor Kollár
>            Assignee: Marta Kuczora
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21987.1.patch, HIVE-21987.2.patch, HIVE-21987.3.patch, HIVE-21987.4.patch, HIVE-21987.5.patch, part-00000-e5287735-8dcf-4dda-9c6e-4d5c98dc15f2-c000.snappy.parquet
>
>
> When I tried to read a Parquet file with a small decimal column from a Hive table (with the Tez execution engine), I got the following exception:
> {code}
> Caused by: java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$1
> 	at org.apache.parquet.io.api.PrimitiveConverter.addInt(PrimitiveConverter.java:98)
> 	at org.apache.parquet.column.impl.ColumnReaderImpl$2$3.writeValue(ColumnReaderImpl.java:248)
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
> 	at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
> 	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
> 	... 28 more
> {code}
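> For illustration only (a minimal sketch, not Hive's actual {{ETypeConverter}} code): Parquet's {{PrimitiveConverter}} base class throws {{UnsupportedOperationException}} from every {{add*}} method a subclass does not override, so a decimal converter that only handles BINARY-backed values fails the moment the reader delivers an int32-backed decimal through {{addInt}}:
> {code}
> import org.apache.parquet.io.api.Binary;
> import org.apache.parquet.io.api.PrimitiveConverter;
> 
> // Illustrative only: a converter that decodes decimals solely from BINARY values.
> class BinaryOnlyDecimalConverter extends PrimitiveConverter {
>   @Override
>   public void addBinary(Binary value) {
>     // decode the unscaled bytes into a decimal value here
>   }
>   // addInt(int) is not overridden, so reading an int32 column annotated with
>   // decimal(4, 2) falls through to PrimitiveConverter.addInt, which throws
>   // UnsupportedOperationException, matching the stack trace above.
> }
> {code}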
> Steps to reproduce:
> - Create a Hive table with a single decimal(4, 2) column
> - Create a Parquet file with an int32 column annotated with the decimal(4, 2) logical type and put it into the previously created table's location (or use the attached Parquet file; in that case the column should be named 'd' so that the Hive schema matches the Parquet schema in the file)
> - Execute a {{select *}} on this table
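> A minimal sketch of the second step using parquet-mr's example writer (the class name, table name, path and value are made up for illustration; any writer that produces an int32 column annotated with decimal(4, 2) should do):
> {code}
> // Hive side: CREATE TABLE decimal_repro (d decimal(4,2)) STORED AS PARQUET;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.parquet.example.data.Group;
> import org.apache.parquet.example.data.simple.SimpleGroupFactory;
> import org.apache.parquet.hadoop.ParquetWriter;
> import org.apache.parquet.hadoop.example.ExampleParquetWriter;
> import org.apache.parquet.hadoop.example.GroupWriteSupport;
> import org.apache.parquet.schema.MessageType;
> import org.apache.parquet.schema.MessageTypeParser;
> 
> public class WriteInt32DecimalRepro {
>   public static void main(String[] args) throws Exception {
>     // int32 column annotated with decimal(4, 2); unscaled value 1234 means 12.34
>     MessageType schema = MessageTypeParser.parseMessageType(
>         "message repro { required int32 d (DECIMAL(4,2)); }");
>     Configuration conf = new Configuration();
>     GroupWriteSupport.setSchema(schema, conf);
>     try (ParquetWriter<Group> writer = ExampleParquetWriter
>         .builder(new Path("/tmp/decimal_repro/part-00000.parquet"))
>         .withConf(conf)
>         .withType(schema)
>         .build()) {
>       writer.write(new SimpleGroupFactory(schema).newGroup().append("d", 1234));
>     }
>   }
> }
> {code}
> Copying the resulting file into the table's location and running {{select *}} should reproduce the exception above.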
> Also, I'm afraid that similar problems can happen with int64 decimals too. The [Parquet specification|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] allows both of these cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)