Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2021/02/09 04:36:00 UTC
[jira] [Created] (IMPALA-10491) Impala parquet scanner should use
writer.time.zone when converting Hive timestamps
Tim Armstrong created IMPALA-10491:
--------------------------------------
Summary: Impala parquet scanner should use writer.time.zone when converting Hive timestamps
Key: IMPALA-10491
URL: https://issues.apache.org/jira/browse/IMPALA-10491
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Tim Armstrong
IMPALA-8721 reports some issues with Hive 3 and timezone conversion.
HIVE-21290 fixed some of those issues and also sets writer.time.zone in the Parquet file metadata, which gives readers a more reliable way to determine which time zone the writer used. For example:
{noformat}
tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/asdfgh/000000_0
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: reading another 1 footers
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file: hdfs://localhost:20500/test-warehouse/asdfgh/000000_0
creator: parquet-mr version 1.10.99.7.2.7.0-44 (build 27344fd5fdaa371e364c604f471b340f8bcf8936)
extra: writer.date.proleptic = false
extra: writer.time.zone = America/Los_Angeles
extra: writer.model.name = 3.1.3000.7.2.7.0-44
{noformat}
I think we should use this time zone when converting timestamps, either always or only when convert_legacy_hive_parquet_utc_timestamps=true.
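To make the proposal concrete, here is a minimal Python sketch of the intended behavior. It is not Impala code (the scanner lives in the C++ backend), and every function name in it is hypothetical. It assumes the Parquet key/value metadata is available as a simple mapping, that the stored value is a UTC-normalized timestamp, and that the reader has some fallback zone it would otherwise use for the conversion.

```python
from datetime import datetime, timezone, tzinfo
from typing import Mapping
from zoneinfo import ZoneInfo  # Python 3.9+, needs the system tz database

# Metadata key that Hive writes after HIVE-21290 (see the parquet-tools output above).
WRITER_TZ_KEY = "writer.time.zone"


def choose_conversion_zone(file_metadata: Mapping[str, str], fallback_zone: str) -> str:
    """Hypothetical helper: prefer the zone recorded by the writer.

    Falls back to `fallback_zone` when the key is missing or empty,
    e.g. for files written before HIVE-21290.
    """
    writer_zone = file_metadata.get(WRITER_TZ_KEY)
    return writer_zone if writer_zone else fallback_zone


def utc_to_local_wall_clock(utc_value: datetime, zone: tzinfo) -> datetime:
    """Treat a naive datetime as UTC and return the naive wall-clock time in `zone`."""
    aware_utc = utc_value.replace(tzinfo=timezone.utc)
    return aware_utc.astimezone(zone).replace(tzinfo=None)


def convert_hive_timestamp(
    utc_value: datetime, file_metadata: Mapping[str, str], fallback_zone: str
) -> datetime:
    """Hypothetical end-to-end conversion using the writer's zone when present."""
    zone_name = choose_conversion_zone(file_metadata, fallback_zone)
    return utc_to_local_wall_clock(utc_value, ZoneInfo(zone_name))
```

With the metadata from the file above, convert_hive_timestamp(value, {"writer.time.zone": "America/Los_Angeles"}, "UTC") would convert using America/Los_Angeles rather than the fallback. Whether this should happen always or only under convert_legacy_hive_parquet_utc_timestamps=true is the open question in this issue; the sketch does not decide it.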
CC [~boroknagyz] [~csringhofer]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)