You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2020/02/17 14:44:20 UTC

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Hello Quanlong Huang, Norbert Luksa, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#4).

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there is was conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess that it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks.

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
22 files changed, 248 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>