Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2021/02/09 04:36:00 UTC

[jira] [Created] (IMPALA-10491) Impala parquet scanner should use writer.time.zone when converting Hive timestamps

Tim Armstrong created IMPALA-10491:
--------------------------------------

             Summary: Impala parquet scanner should use writer.time.zone when converting Hive timestamps
                 Key: IMPALA-10491
                 URL: https://issues.apache.org/jira/browse/IMPALA-10491
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 3.4.0
            Reporter: Tim Armstrong


IMPALA-8721 reports some issues with Hive 3 and timezone conversion.

HIVE-21290 fixed some of those issues and also sets writer.time.zone in the Parquet file metadata, which gives a more reliable way to determine which time zone the writer used. E.g.

{noformat}
tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/asdfgh/000000_0
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: reading another 1 footers
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file:        hdfs://localhost:20500/test-warehouse/asdfgh/000000_0
creator:     parquet-mr version 1.10.99.7.2.7.0-44 (build 27344fd5fdaa371e364c604f471b340f8bcf8936)
extra:       writer.date.proleptic = false
extra:       writer.time.zone = America/Los_Angeles
extra:       writer.model.name = 3.1.3000.7.2.7.0-44
{noformat}

I think we should use this time zone when converting timestamps, either always or only when convert_legacy_hive_parquet_utc_timestamps=true.
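
A minimal sketch of the idea, written in Python for brevity rather than in the C++ backend. It assumes the file's key/value metadata is already available as a plain dict and that the stored timestamp is a UTC-normalized wall-clock value. The names choose_zone and utc_to_writer_local, and the fallback_zone parameter, are hypothetical and not part of Impala.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, relies on the system tz database

# Metadata key written by Hive after HIVE-21290 (see the parquet-tools output above).
WRITER_TZ_KEY = "writer.time.zone"


def choose_zone(file_metadata, fallback_zone):
    """Prefer the writer's zone from the file metadata, else use the fallback.

    fallback_zone is a hypothetical stand-in for whatever zone the reader
    would otherwise use (for example the session or server time zone).
    """
    return file_metadata.get(WRITER_TZ_KEY) or fallback_zone


def utc_to_writer_local(utc_naive, file_metadata, fallback_zone="UTC"):
    """Convert a naive UTC timestamp to wall-clock time in the chosen zone."""
    zone = ZoneInfo(choose_zone(file_metadata, fallback_zone))
    aware = utc_naive.replace(tzinfo=timezone.utc).astimezone(zone)
    # Drop the zone again so the result is a plain wall-clock timestamp.
    return aware.replace(tzinfo=None)


if __name__ == "__main__":
    meta = {"writer.time.zone": "America/Los_Angeles"}
    print(utc_to_writer_local(datetime(2021, 2, 9, 4, 36), meta))
```

Whether this lookup should apply to every Hive-written file or only when convert_legacy_hive_parquet_utc_timestamps=true is the open question above; the sketch only illustrates the zone selection and the conversion itself.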

CC [~boroknagyz] [~csringhofer]


