Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2018/10/09 19:06:25 UTC

[Impala-ASF-CR] WIP IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet

Hello Zoltan Borok-Nagy, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11057

to look at the new patch set (#8).

Change subject: WIP IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet
......................................................................

WIP IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet

Changes:
- parquet.thrift is updated to a newer version which contains the timestamp
  logical type.
- INT64 columns with converted types TIMESTAMP_MILLIS and TIMESTAMP_MICROS
  can be read as TIMESTAMP.
- If the logical type is timestamp, then the type itself states
  whether the UTC->local conversion is necessary. This feature is
  only supported for the new timestamp types, so INT96 timestamps
  must still use the flag convert_legacy_hive_parquet_utc_timestamps.
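
To illustrate what reading an INT64 timestamp involves, here is a
minimal sketch, assuming the raw value counts milliseconds or
microseconds since the Unix epoch. The names TimeUnit, EpochTime and
SplitEpochValue are hypothetical and are not taken from the patch; the
actual reader in parquet-column-readers.cc may be structured differently.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not Impala's actual API.
enum class TimeUnit { kMillis, kMicros };

struct EpochTime {
  int64_t seconds;  // whole seconds since 1970-01-01 00:00:00
  int32_t nanos;    // sub-second part, always in [0, 999999999]
};

// Splits a raw INT64 timestamp value into whole seconds and a
// non-negative nanosecond remainder. C++ integer division truncates
// toward zero, so negative (pre-1970) values are adjusted to get
// floor semantics.
inline EpochTime SplitEpochValue(int64_t value, TimeUnit unit) {
  const int64_t units_per_second =
      (unit == TimeUnit::kMillis) ? 1000 : 1000000;
  const int64_t nanos_per_unit = 1000000000 / units_per_second;
  int64_t seconds = value / units_per_second;
  int64_t remainder = value % units_per_second;
  if (remainder < 0) {
    remainder += units_per_second;
    --seconds;
  }
  return EpochTime{seconds, static_cast<int32_t>(remainder * nanos_per_unit)};
}
```

For example, SplitEpochValue(-1, TimeUnit::kMicros) describes the
instant one microsecond before the epoch as second -1 plus 999999000
nanoseconds, which keeps the sub-second part non-negative.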

TODOs:
- I plan to rebase once https://gerrit.cloudera.org/#/c/11431/ and
  https://gerrit.cloudera.org/#/c/11183/ are merged, then integrate
  UtcFromUnixTimeMillis() and also disable stat filtering for
  UTC-normalized INT64 timestamps
- it would be nice to implement stat filtering for UTC-normalized
  timestamps (I have created IMPALA-7568 to track this)
- tests and a correct implementation were added for dictionary
  encoding, but the changes in dict-encoding.h are quite ugly, so I am
  thinking about a better way to do this
- CREATE TABLE LIKE PARQUET should also be tested with the new
  types
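
The conversion and stat-filtering rules described above can be sketched
as two small predicates. This is only an illustration under stated
assumptions: the struct and function names are hypothetical, the
is_adjusted_to_utc field is assumed to mirror the isAdjustedToUTC field
of the Parquet timestamp logical type, and stat filtering for INT96
columns is deliberately left out because this message does not describe
it.

```cpp
// Hypothetical description of a Parquet timestamp column; these are
// not Impala's real types.
enum class PhysicalType { kInt96, kInt64 };

struct TimestampColumnInfo {
  PhysicalType physical_type;
  // Only meaningful for INT64 columns. Assumed to mirror
  // isAdjustedToUTC from the Parquet timestamp logical type.
  bool is_adjusted_to_utc;
};

// Returns true if values should be converted from UTC to local time
// while reading. INT96 columns carry no such information, so they
// rely on the convert_legacy_hive_parquet_utc_timestamps flag, passed
// here as convert_legacy_flag.
inline bool NeedsUtcToLocalConversion(const TimestampColumnInfo& col,
                                      bool convert_legacy_flag) {
  if (col.physical_type == PhysicalType::kInt96) return convert_legacy_flag;
  return col.is_adjusted_to_utc;
}

// Returns true if min/max statistics may be used to skip data for an
// INT64 timestamp column. Per the TODO above, filtering is disabled
// for UTC-normalized columns until IMPALA-7568 is addressed.
inline bool CanUseStatsFilteringForInt64(bool is_adjusted_to_utc) {
  return !is_adjusted_to_utc;
}
```

Keeping the two decisions in separate functions reflects that one
concerns how values are read while the other concerns whether data can
be skipped, even though both depend on the same UTC-normalization bit.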

Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/util/dict-encoding.h
M common/thrift/parquet.thrift
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M testdata/data/README
A testdata/data/int64_timestamps_dict.parq
A testdata/data/int64_timestamps_plain.parq
A testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test
M tests/query_test/test_scanners.py
13 files changed, 477 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/11057/8
-- 
To view, visit http://gerrit.cloudera.org:8080/11057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3
Gerrit-Change-Number: 11057
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>