Posted to issues@hive.apache.org by "Marta Kuczora (Jira)" <ji...@apache.org> on 2020/04/30 14:33:00 UTC

[jira] [Commented] (HIVE-23345) INT64 Parquet timestamps cannot be read into bigint Hive type

    [ https://issues.apache.org/jira/browse/HIVE-23345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096589#comment-17096589 ] 

Marta Kuczora commented on HIVE-23345:
--------------------------------------

When reading a Parquet file, the following logic in ETypeConverter selects the right converter:
{noformat}
    if (type.getLogicalTypeAnnotation() != null) {
      Optional<PrimitiveConverter> converter = type.getLogicalTypeAnnotation()
          .accept(new LogicalTypeAnnotationVisitor<PrimitiveConverter>() {
            @Override
            public Optional<PrimitiveConverter> visit(DecimalLogicalTypeAnnotation logicalTypeAnnotation) {
              return Optional.of(EDECIMAL_CONVERTER.getConverter(type, index, parent, hiveTypeInfo));
            }

            @Override
            public Optional<PrimitiveConverter> visit(StringLogicalTypeAnnotation logicalTypeAnnotation) {
              return Optional.of(ESTRING_CONVERTER.getConverter(type, index, parent, hiveTypeInfo));
            }

            @Override
            public Optional<PrimitiveConverter> visit(DateLogicalTypeAnnotation logicalTypeAnnotation) {
              return Optional.of(EDATE_CONVERTER.getConverter(type, index, parent, hiveTypeInfo));
            }

            @Override
            public Optional<PrimitiveConverter> visit(TimestampLogicalTypeAnnotation logicalTypeAnnotation) {
              return Optional.of(EINT64_TIMESTAMP_CONVERTER.getConverter(type, index, parent, hiveTypeInfo));
            }
          });

      if (converter.isPresent()) {
        return converter.get();
      }
    }
{noformat}
So if the field has the timestamp annotation, it will be handled as a timestamp, even though the underlying physical type is INT64 and the declared Hive type is bigint. We should extend this logic to consider the Hive type when selecting which converter to use.
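The proposed fix could be sketched roughly as follows. This is a simplified, self-contained illustration, not the real Hive code: the enum and method names here are stand-ins for the actual ETypeConverter converters and Hive's TypeInfo classes. The idea is that when an INT64 field carries a timestamp annotation but the declared Hive column type is bigint, the plain INT64 converter should be chosen so the raw long value is returned instead of a TimestampWritableV2.

```java
// Illustrative stand-in for the converter-selection logic in ETypeConverter.
// None of these names are the real Hive classes; they just show the dispatch.
public class ConverterSelection {

    enum Converter { EINT64_CONVERTER, EINT64_TIMESTAMP_CONVERTER }

    // Pick a converter for an INT64 Parquet field annotated as a timestamp,
    // taking the declared Hive type into account: a bigint column should get
    // the raw long value rather than a converted timestamp.
    static Converter selectForInt64Timestamp(String hiveTypeName) {
        if ("bigint".equalsIgnoreCase(hiveTypeName)) {
            return Converter.EINT64_CONVERTER;
        }
        return Converter.EINT64_TIMESTAMP_CONVERTER;
    }

    public static void main(String[] args) {
        // A bigint column reads the INT64 value as-is.
        System.out.println(selectForInt64Timestamp("bigint"));
        // A timestamp column keeps the timestamp conversion.
        System.out.println(selectForInt64Timestamp("timestamp"));
    }
}
```

With this kind of check in the visit(TimestampLogicalTypeAnnotation) branch, the ClassCastException in the reproduction case would be avoided because no TimestampWritableV2 would be produced for a bigint column.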

> INT64 Parquet timestamps cannot be read into bigint Hive type
> -------------------------------------------------------------
>
>                 Key: HIVE-23345
>                 URL: https://issues.apache.org/jira/browse/HIVE-23345
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>
> How to reproduce:
> - create external table ts_pq (ts timestamp) stored as parquet;
> - insert into ts_pq values ('1998-10-03 09:58:31.231');
> - create external table ts_pq_2 (ts bigint) stored as parquet location '<location of ts_pq>';
> - select * from ts_pq_2;
> The following exception occurs during the select:
> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritableV2 cannot be cast to org.apache.hadoop.io.LongWritable



--
This message was sent by Atlassian Jira
(v8.3.4#803005)