Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/24 09:59:00 UTC

[jira] [Work logged] (HIVE-26658) INT64 Parquet timestamps cannot be mapped to most Hive numeric types

     [ https://issues.apache.org/jira/browse/HIVE-26658?focusedWorklogId=819571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-819571 ]

ASF GitHub Bot logged work on HIVE-26658:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Oct/22 09:58
            Start Date: 24/Oct/22 09:58
    Worklog Time Spent: 10m 
      Work Description: zabetak opened a new pull request, #3698:
URL: https://github.com/apache/hive/pull/3698

   ### What changes were proposed in this pull request?
   1. Unify converters from Parquet INT64 to Hive types.
   2. Add tests that read Parquet INT64 timestamps into various Hive numeric types.
   
   ### Why are the changes needed?
   Restore backward compatibility; allow mapping Parquet INT64 columns with a timestamp annotation to the following Hive numeric types:
   * TINYINT
   * SMALLINT
   * INT
   * DOUBLE
   * FLOAT
   * DECIMAL
   
   For more details see HIVE-26658.
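
To make "mapping INT64 to a Hive numeric type" concrete, the sketch below shows one plausible way a raw 64-bit value could be narrowed to each of the listed types. It is only an illustration: the class `Int64NarrowingSketch` and its methods are hypothetical names, not Hive code, and the choice to return `null` for out-of-range integer values is an assumption. The actual behavior is defined by the converters in the pull request.

```java
import java.math.BigDecimal;

// Hypothetical helper, NOT part of Hive. It only illustrates how a raw
// INT64 value (e.g. epoch milliseconds) might be mapped to narrower types.
class Int64NarrowingSketch {

    // TINYINT-like target: assumed to yield null when the value does not fit.
    public static Byte toTinyInt(long raw) {
        if (raw < Byte.MIN_VALUE || raw > Byte.MAX_VALUE) {
            return null;
        }
        return (byte) raw;
    }

    // SMALLINT-like target: same assumed null-on-overflow rule.
    public static Short toSmallInt(long raw) {
        if (raw < Short.MIN_VALUE || raw > Short.MAX_VALUE) {
            return null;
        }
        return (short) raw;
    }

    // INT-like target: same assumed null-on-overflow rule.
    public static Integer toInt(long raw) {
        if (raw < Integer.MIN_VALUE || raw > Integer.MAX_VALUE) {
            return null;
        }
        return (int) raw;
    }

    // FLOAT-like target: widening conversion, may lose precision.
    public static float toFloat(long raw) {
        return (float) raw;
    }

    // DOUBLE-like target: widening conversion, may lose precision
    // for magnitudes above 2^53.
    public static double toDouble(long raw) {
        return (double) raw;
    }

    // DECIMAL-like target: exact, ignoring any precision/scale limits
    // that a real column definition would impose.
    public static BigDecimal toDecimal(long raw) {
        return BigDecimal.valueOf(raw);
    }

    public static void main(String[] args) {
        long sampleMillis = 1_666_605_480_000L; // arbitrary sample value
        System.out.println("INT:     " + toInt(sampleMillis));
        System.out.println("DOUBLE:  " + toDouble(sampleMillis));
        System.out.println("DECIMAL: " + toDecimal(sampleMillis));
    }
}
```

Under these assumptions, a typical epoch-millisecond value does not fit into TINYINT, SMALLINT, or INT, so only small raw values survive narrowing to those types, while FLOAT, DOUBLE, and DECIMAL can represent the full range, possibly with reduced precision.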
   
   ### Does this PR introduce _any_ user-facing change?
   Avoids errors/exceptions when mapping a Parquet INT64 column with a timestamp annotation to a Hive type other than TIMESTAMP and BIGINT.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest=TestETypeConverter
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=parquet_int64_timestamp_to_numeric.q
   ```




Issue Time Tracking
-------------------

            Worklog Id:     (was: 819571)
    Remaining Estimate: 0h
            Time Spent: 10m

> INT64 Parquet timestamps cannot be mapped to most Hive numeric types
> --------------------------------------------------------------------
>
>                 Key: HIVE-26658
>                 URL: https://issues.apache.org/jira/browse/HIVE-26658
>             Project: Hive
>          Issue Type: Bug
>          Components: Parquet, Serializers/Deserializers
>    Affects Versions: 4.0.0-alpha-1
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Minor
>              Labels: backwards-compatibility
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When attempting to read a Parquet file with a column of primitive type INT64 and logical type [TIMESTAMP|https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/LogicalTypes.md?plain=1#L337], an error is raised if the Hive type is anything other than TIMESTAMP or BIGINT.
> Consider a Parquet file (e.g., ts_file.parquet) with the following schema:
> {code:json}
> {
>   "name": "eventtime",
>   "type": ["null", {
>     "type": "long",
>     "logicalType": "timestamp-millis"
>   }],
>   "default": null
> }
> {code}
>  
> Mapping the column to any of the Hive numeric types TINYINT, SMALLINT, INT, FLOAT, DOUBLE, or DECIMAL and then running a SELECT raises an error.
> The following snippet can be used to reproduce the problem.
> {code:sql}
> CREATE TABLE ts_table (eventtime INT) STORED AS PARQUET;
> LOAD DATA LOCAL INPATH 'ts_file.parquet' INTO TABLE ts_table;
> SELECT * FROM ts_table;
> {code}
> This is a regression caused by HIVE-21215. Although HIVE-21215 made it possible to read INT64 timestamps as Hive TIMESTAMP, which was not possible before, it also broke the mapping to every other Hive numeric type. The problem was addressed very recently for the BIGINT type only (HIVE-26612).
> The primary goal of this ticket is to restore backward compatibility, since these use cases worked before HIVE-21215.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)