Posted to dev@impala.apache.org by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org> on 2016/04/08 23:45:28 UTC

[Impala-CR](cdh5-trunk) Fix Parquet timestamp behavior for Hive data

Taras Bobrovytsky has uploaded a new patch set (#10).

Change subject: Fix Parquet timestamp behavior for Hive data
......................................................................

Fix Parquet timestamp behavior for Hive data

- Timestamps written by parquet-mr are adjusted using the time zone
  given in the table property parquet.mr.int96.write.zone.
- New tables created by Impala set this table property to UTC if the
  global option prevent_parquet_mr_zone_adjustment is enabled.
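
As a rough illustration of the first point (not the code in this patch),
the read-side adjustment can be sketched as below. The function name and
the fixed-offset table are hypothetical stand-ins, DST is ignored, and
leaving values untouched when the property is unset is an assumption of
this sketch:

```python
from datetime import datetime, timedelta

# Table property name taken from the commit message.
WRITE_ZONE_PROP = "parquet.mr.int96.write.zone"

# Hypothetical fixed offsets standing in for a real time zone database.
# A real implementation would need proper zone rules, including DST.
FIXED_OFFSETS = {
    "UTC": timedelta(0),
    "PST": timedelta(hours=-8),
}


def adjust_parquet_mr_timestamp(stored, tbl_properties):
    """Shift a UTC-normalized timestamp into the table's write zone.

    Assumption: when the property is absent, the value is returned as is.
    """
    zone = tbl_properties.get(WRITE_ZONE_PROP)
    if zone is None:
        return stored
    return stored + FIXED_OFFSETS[zone]


if __name__ == "__main__":
    ts = datetime(2016, 4, 8, 23, 45, 28)
    print(adjust_parquet_mr_timestamp(ts, {WRITE_ZONE_PROP: "PST"}))
    print(adjust_parquet_mr_timestamp(ts, {WRITE_ZONE_PROP: "UTC"}))
```

With the property set to UTC the offset is zero, so the stored value
comes back unchanged.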

I ran a benchmark query against a release build on my machine, using a
dataset that I generated locally. The query looks similar to this:
  SELECT timestamp_col FROM table WHERE timestamp_col IS NULL

Before this change:
  Without -convert_legacy_hive_parquet_utc_timestamps
    19.56s
  With -convert_legacy_hive_parquet_utc_timestamps
    293.36s

After this change:
  No timezone set
    17.56s
  No timezone with -convert_legacy_hive_parquet_utc_timestamps
    384.44s
  UTC timezone
    19.07s
  PST timezone
    414.51s

Change-Id: I81e8e14d3ec9d399c26756914a54c552757dfbd2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M common/thrift/Descriptors.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/common/RuntimeEnv.java
M fe/src/main/java/com/cloudera/impala/service/FeSupport.java
M fe/src/main/java/com/cloudera/impala/util/MetaStoreUtil.java
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_parquet_timestamp_compatibility.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hms_integration.py
22 files changed, 580 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/1681/10
-- 
To view, visit http://gerrit.cloudera.org:8080/1681

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81e8e14d3ec9d399c26756914a54c552757dfbd2
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>