Posted to dev@impala.apache.org by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org> on 2016/04/08 23:45:28 UTC
[Impala-CR](cdh5-trunk) Fix Parquet timestamp behavior for Hive data
Taras Bobrovytsky has uploaded a new patch set (#10).
Change subject: Fix Parquet timestamp behavior for Hive data
......................................................................
Fix Parquet timestamp behavior for Hive data
- Data written by parquet-mr is adjusted using the timezone from the
  table property (parquet.mr.int96.write.zone).
- New tables created by Impala set the table property to UTC if the
  global option prevent_parquet_mr_zone_adjustment is enabled.
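
As a rough illustration of the first point, the sketch below shows one way a reader could adjust a stored timestamp using a zone name taken from the parquet.mr.int96.write.zone table property. It is not the code in this patch: the helper name is hypothetical, and it assumes the stored value is a UTC-normalized wall-clock time and that a missing property or a value of UTC means no adjustment.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def adjust_parquet_mr_timestamp(stored: datetime,
                                write_zone: Optional[str]) -> datetime:
    """Hypothetical helper, not part of Impala.

    `stored` is a naive datetime assumed to hold a UTC-normalized value.
    `write_zone` stands in for the parquet.mr.int96.write.zone property.
    """
    # Assumption: no property, or UTC, means the value is returned as-is.
    if write_zone is None or write_zone == "UTC":
        return stored
    # Interpret the stored value as UTC, convert it to the table's zone,
    # then drop the zone info to get a plain wall-clock timestamp back.
    as_utc = stored.replace(tzinfo=timezone.utc)
    local = as_utc.astimezone(ZoneInfo(write_zone))
    return local.replace(tzinfo=None)
```

With a named zone such as America/Los_Angeles the conversion has to consult the time zone rules for every value, while the UTC case returns immediately.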
I ran a benchmark query against a release build on my machine, using a
dataset that I generated locally. The query looks similar to this:
  SELECT timestamp_col FROM table WHERE timestamp_col IS NULL
Before this change:
  Without -convert_legacy_hive_parquet_utc_timestamps: 19.56s
  With -convert_legacy_hive_parquet_utc_timestamps: 293.36s
After this change:
  No timezone set: 17.56s
  No timezone with -convert_legacy_hive_parquet_utc_timestamps: 384.44s
  UTC timezone: 19.07s
  PST timezone: 414.51s
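
To put the timings in relative terms, the snippet below computes slowdown ratios from the figures reported in this message. Which runs to compare against each other is my own assumption; the numbers themselves are copied from the results. Under that pairing, the flag costs about 15x before the change, and the PST table is about 21.7x slower than the UTC table after it.

```python
# Timings in seconds, copied from the benchmark results in this message.
before_without_flag = 19.56
before_with_flag = 293.36
after_no_zone = 17.56
after_no_zone_with_flag = 384.44
after_utc_zone = 19.07
after_pst_zone = 414.51


def slowdown(slow_s: float, baseline_s: float) -> float:
    """Return how many times longer the slow run took than the baseline."""
    return slow_s / baseline_s


# Assumed pairings: each converting run against its non-converting peer.
print(f"before, flag on vs off: {slowdown(before_with_flag, before_without_flag):.1f}x")
print(f"after, flag on vs off:  {slowdown(after_no_zone_with_flag, after_no_zone):.1f}x")
print(f"after, PST vs UTC:      {slowdown(after_pst_zone, after_utc_zone):.1f}x")
```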
Change-Id: I81e8e14d3ec9d399c26756914a54c552757dfbd2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M common/thrift/Descriptors.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/common/RuntimeEnv.java
M fe/src/main/java/com/cloudera/impala/service/FeSupport.java
M fe/src/main/java/com/cloudera/impala/util/MetaStoreUtil.java
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_parquet_timestamp_compatibility.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hms_integration.py
22 files changed, 580 insertions(+), 126 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/1681/10
--
To view, visit http://gerrit.cloudera.org:8080/1681
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81e8e14d3ec9d399c26756914a54c552757dfbd2
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>