Posted to issues@impala.apache.org by "Attila Jeges (JIRA)" <ji...@apache.org> on 2017/05/04 08:18:04 UTC

[jira] [Resolved] (IMPALA-2716) Hive/Impala incompatibility for timestamp data in Parquet

     [ https://issues.apache.org/jira/browse/IMPALA-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Jeges resolved IMPALA-2716.
----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.9.0

https://github.com/apache/incubator-impala/commit/5803a0b0744ddaee6830d4a1bc8dba8d3f2caa26

commit 5803a0b0744ddaee6830d4a1bc8dba8d3f2caa26
Author: Attila Jeges <at...@cloudera.com>
Date:   Wed Feb 8 19:44:16 2017 +0100

    IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
    
    Before this change:
    Hive adjusts timestamps by subtracting the local time zone's offset
    from all values when writing data to Parquet files. Hive is internally
    inconsistent because it behaves differently for other file formats. As
    a result of this adjustment, Impala may read "incorrect" timestamp
    values from Parquet files written by Hive.
    
    After this change:
    Impala reads Parquet MR timestamp data and adjusts values using a time
    zone from a table property (parquet.mr.int96.write.zone), if set, and
    will not adjust it if the property is absent. No adjustment will be
    applied to data written by Impala.
    
    New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
    LIKE <file> will set the table property to UTC if the global flag
    --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.
    
    HDFS tables created by Impala using CREATE TABLE LIKE <other table>
    will copy the property of the table that is copied.
    
    This change also affects the way Impala deals with
    --convert_legacy_hive_parquet_utc_timestamps global flag (introduced
    in IMPALA-1658). The flag will be taken into account only if
    parquet.mr.int96.write.zone table property is not set and ignored
    otherwise.
    
    Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
    Reviewed-on: http://gerrit.cloudera.org:8080/5939
    Reviewed-by: Dan Hecht <dh...@cloudera.com>
    Tested-by: Impala Public Jenkins
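
The precedence rules in the commit message above can be sketched as a small decision function. This is only an illustration in Python, not Impala's implementation: the function name, its parameters, and the use of None to mean "no adjustment" are all assumptions made for this example. Only the table property name and the flag name come from the commit message.

```python
from typing import Optional

# Table property named in the commit message.
WRITE_ZONE_PROPERTY = "parquet.mr.int96.write.zone"


def read_adjustment_zone(written_by_parquet_mr: bool,
                         table_properties: dict,
                         convert_legacy_flag: bool,
                         local_zone: str) -> Optional[str]:
    """Hypothetical helper: pick the zone used to adjust timestamps on read.

    Returns a zone name, or None when values should be read unadjusted.
    """
    # Data written by Impala is never adjusted.
    if not written_by_parquet_mr:
        return None
    # If the table property is set, it wins and the global flag is ignored.
    zone = table_properties.get(WRITE_ZONE_PROPERTY)
    if zone is not None:
        return zone
    # Property absent: fall back to --convert_legacy_hive_parquet_utc_timestamps,
    # which converts from UTC to the local time zone (IMPALA-1658).
    if convert_legacy_flag:
        return local_zone
    # Property absent and flag off: no adjustment.
    return None
```

For example, a Parquet-MR file in a table with the property set to "UTC" would be adjusted using "UTC" regardless of the flag, while the same file in a table without the property would be adjusted only when the flag is true.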

> Hive/Impala incompatibility for timestamp data in Parquet
> ---------------------------------------------------------
>
>                 Key: IMPALA-2716
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2716
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0
>            Reporter: Alexander Behm
>            Assignee: Attila Jeges
>            Priority: Critical
>              Labels: incompatibility, parquet
>             Fix For: Impala 2.9.0
>
>
> *Problem*
> Hive adjusts timestamps by subtracting the local time zone’s offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive, and vice versa.
> *Workaround*
> Enable the following Impala compatibility flag, which is false by default:
> --convert_legacy_hive_parquet_utc_timestamps
> When true, TIMESTAMPs read from files written by Parquet-MR (used by Hive) will be converted from UTC to local time. Writes are unaffected.
> For more details, please see IMPALA-1658.
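
The effect described in the problem statement can be illustrated with simple arithmetic. The sketch below is in Python and makes simplifying assumptions: the function names are hypothetical, and the time zone is modeled as a fixed UTC offset (real zones have daylight saving rules that a fixed offset ignores).

```python
from datetime import datetime, timedelta


def hive_parquet_write(wall_clock: datetime, local_offset: timedelta) -> datetime:
    """Model of Hive's write path: subtract the local zone's UTC offset."""
    return wall_clock - local_offset


def read_with_zone(stored: datetime, zone_offset: timedelta) -> datetime:
    """Model of a reader that converts the stored value from UTC to a zone."""
    return stored + zone_offset


# Assumed example: a writer in a zone eight hours behind UTC.
offset = timedelta(hours=-8)
original = datetime(2017, 2, 8, 12, 0)

stored = hive_parquet_write(original, offset)   # 2017-02-08 20:00
unadjusted = stored                             # what a reader sees with no conversion
adjusted = read_with_zone(stored, offset)       # 2017-02-08 12:00, matches the original
```

A reader that applies no conversion sees a value shifted by the writer's offset, which is the "incorrect" timestamp the issue describes; converting with the writer's zone recovers the original wall-clock value.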


