You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2020/02/14 17:57:24 UTC

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15222


Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................

WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos is affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect Orc. By default there is no conversion, but if the
flag is 1, then timestamps are interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of meking To/From
timestamp functions less confusing.

Changes so far:
- Fixed the bug by passing UTC as timezone in the Orc scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess that it is easier to opimize
  the inlined functions this way.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ doesn't is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than timezones with DST/historical rule changes.

Further planned changes:
- Remove the UTC versions of the functions, as they shouldn't be
  faster than calling the converting functions with null after
  inlineing.
- Move out the checking of use_local_tz_for_unix_timestamp_conversions
  to RuntimeState. The plan is to create a function like
  Timezone* local_timezone_for_unix_time_conversions() and return
  null or the local timezone based on the flag. I think that this
  would make the intention of the callsites of From/To functions
  much clearer.
- Add/update comments.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
18 files changed, 111 insertions(+), 113 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc@352
PS7, Line 352:         // TODO The timezone depends on flag use_local_tz_for_unix_timestamp_conversions.
             :         //      Check if this is the intended behaviour.
             :         RETURN_IF_ERROR(MaterializeNextRow(
             :             state->time_zone_for_unix_time_conversions(), tuple_pool, tuple));
> Yes, you're correct, it should be UTCPTR.
I didn't want to change our default behavior. External data sources are used by very few people (if any), but I do not want to break their existing workloads.


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
File testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test:

http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test@18
PS6, Line 18: ====
            : ---- QUERY
> Done
Oops. I didn't upload the .test file, which lead to the errors during the jenkins test. Uploaded it in PS11



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 20 Feb 2020 13:02:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 13: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5386/


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 19:23:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5240/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 15:24:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#3).

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................

WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos is affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect Orc. By default there is no conversion, but if the
flag is 1, then timestamps are interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes so far:
- Fixed the bug by passing UTC as timezone in the Orc scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess that it is easier to optimize
  the inlined functions this way.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Further planned changes:
- Remove the UTC versions of the functions, as they shouldn't be
  faster than calling the converting functions with null after
  inlining.
- Move out the checking of use_local_tz_for_unix_timestamp_conversions
  to RuntimeState. The plan is to create a function like
  Timezone* local_timezone_for_unix_time_conversions() and return
  null or the local timezone based on the flag. I think that this
  would make the intention of the callsites of From/To functions
  much clearer.
- Add/update comments.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
18 files changed, 111 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 11: Code-Review+1

(3 comments)

Left some comments, otherwise it LGTM.
Will you open a Jira for the postponed changes/TODOs?

http://gerrit.cloudera.org:8080/#/c/15222/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15222/11//COMMIT_MSG@9
PS11, Line 9: Orc
nit: ORC


http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h
File be/src/common/global-types.h:

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h@33
PS11, Line 33: UTCPTR
Since it's not a macro anymore, maybe UtcPtr would be a better name?


http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/runtime/timestamp-value.h@a35
PS11, Line 35: 
             : 
Shouldn't we keep this comment?



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 20 Feb 2020 16:28:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#4).

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there is was conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess that it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks.

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
22 files changed, 248 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5237/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 12:18:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5258/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 21:11:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#11).

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there was no conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks.

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
  when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
  Also did some cleanup there to use query option timezone instead of
  env var TZ.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
A testdata/workloads/functional-query/queries/QueryTest/file-formats-with-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/local-timestamp-functions.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/custom_cluster/test_local_tz_conversion.py
27 files changed, 288 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/11
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 6:

(10 comments)

Thanks for the review!
I fixed the ones I were sure about, will return to some of the more complex ones.

http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG@13
PS6, Line 13: is was
> was no ?
Done


http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG@26
PS6, Line 26: that
> nit: not necessary
Done


http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h
File be/src/common/global-types.h:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h@31
PS6, Line 31: #define UTCPTR nullptr
> Why no define a const Timezone* instead?
It led to compile error for some reason. I will try it again, probably it should be const Timezone * const


http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc@352
PS7, Line 352:         // TODO The timezone depends on flag use_local_tz_for_unix_timestamp_conversions.
             :         //      Check if this is the intended behaviour.
             :         RETURN_IF_ERROR(MaterializeNextRow(
             :             state->time_zone_for_unix_time_conversions(), tuple_pool, tuple));
> You're raising a good point in the comment. I think here we should just pas
I was thinking about UTCPTR instead. Using local_time_zone() would mean that we will do a timezone conversion by default, which is not how Impala used to work.

Note that I am not sure whether anyone cares about the correctness of DataSource.


http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/exprs/expr-test.cc@170
PS6, Line 170: const Timezone&
> Returning const Timezone* here might simplify things a bit.
This cannot return nullptr, as it is also used in UtcToLocal() which expects a Timezone reference, so I think that it is clearer to keep it this way.


http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/runtime/timestamp-value.h@102
PS6, Line 102: 'unix_time' is assumed to be UTC
> nit: The comment is a bit confusing. 'unix_time' is always in UTC (by defin
I didn't find a good name for this offset when used in timezone agnostic context. E.g. both Parquet and Orc store such offsets that behave exactly like Unix time, while assume the resulting timestamp to be in local time. We consider this the "normal" case and pass a pointer to a Timezone when the unix_time is real UTC Unix time and we need to convert it to local time.


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
File testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test:

http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test@3
PS6, Line 3: This test is also called with convert_legacy_hive_parquet_utc_timestamps=true.
> Comment is confusing: I think the test is only called with convert_legacy_h
You are right, I confused this test with another one.


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
File testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test:

http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test@18
PS6, Line 18: ====
            : ---- QUERY
> Move the new sections to a separate .test file or rename this file to somet
Done


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test@45
PS6, Line 45: ---- QUERY
            : SET timezone=CET;
            : select min(timestamp_col) from functional_avro.alltypestiny;
            : ---- TYPES
            : STRING
            : ---- RESULTS
            : '2009-01-01 00:00:00'
> Since functional_avro.alltypestiny.timestamp_col is a string so probably yo
I kept the test - if we add timestamp support for avro one day, it will be easy to forget about checking this.


http://gerrit.cloudera.org:8080/#/c/15222/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py:

http://gerrit.cloudera.org:8080/#/c/15222/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py@74
PS6, Line 74:     self.check_sanity(True)
            :     # Test with UTC too to check the optimizations added in IMPALA-9385.
            :     for tz_name in ["PST8PDT", "UTC"]:
            :       # The value read from the Hive table should be the same as reading a UTC converted
            :       # value from the Impala table.
            :       data = self.execute_query_expect_success(self.client, """
            :           SELECT h.id, h.day, h.timestamp_col, i.timestamp_col
            :           FROM functional_parquet.alltypesagg_hive_13_1 h
            :           JOIN functional_parquet.alltypesagg
            :             i ON i.id = h.id AND i.day = h.day  -- serves as a unique key
            :           WHERE
            :             (h.timestamp_col IS NULL AND i.timestamp_col IS NOT NULL)
            :             OR (h.timestamp_col IS NOT NULL AND i.timestamp_col IS NULL)
            :             OR h.timestamp_col != FROM_UTC_TIMESTAMP(i.timestamp_col, '%s')
            :           """ % tz_name, query_options={"timezone": tz_name})\
            :           .get_data()
            :       assert len(data) == 0
> Any reason you moved this block after L73?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 20:43:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................


Patch Set 2:

(5 comments)

Hi, left some minor comments.

http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG@7
PS2, Line 7: Orc
nit: ORC (here and below)


http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG@36
PS2, Line 36: Further planned changes
Is it possible to gain some performance with this change? If yes, adding some measurements would be nice, I think.


http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/exec/parquet/parquet-common.h
File be/src/exec/parquet/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/exec/parquet/parquet-common.h@743
PS2, Line 743: is
nit: typo


http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc@150
PS2, Line 150: nullptr
nit: maybe it would be worth to mention here as a side note that UTCPTR is in fact nullptr


http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc@152
PS2, Line 152:       local_time_zone_ = tz == &TimezoneDatabase::GetUtcTimezone() ? UTCPTR : tz;
Not related to this point, but as I see, the only other place (outside of some tests) you left GetUtcTimezone() in the code is in ImpalaServer::PrepareQueryContext. Wouldn't it make sense to refactor the code there as welll?



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 13:06:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 12: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 13:45:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5365/


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 20:28:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 7:

PS 7 fixed the clang tidy compile issue. What broke the last code review check is something unrelated:
00:22:31.237 [ERROR] Failed to execute goal on project impala-minimal-hive-exec: Could not resolve dependencies for project org.apache.impala:impala-minimal-hive-exec:jar:0.1-SNAPSHOT: Failed to collect dependencies at org.apache.hive:hive-exec:jar:2.1.1-cdh6.x-SNAPSHOT -> org.apache.calcite:calcite-core:jar:1.12.0 -> org.slf4j:slf4j-api:jar:1.7.13: Failed to read artifact descriptor for org.slf4j:slf4j-api:jar:1.7.13: Could not transfer artifact org.slf4j:slf4j-parent:pom:1.7.13 from/to impala.cdp.repo (https://native-toolchain.s3.amazonaws.com/build/cdp_components/1617729/maven): Access denied to: https://native-toolchain.s3.amazonaws.com/build/cdp_components/1617729/maven/org/slf4j/slf4j-parent/1.7.13/slf4j-parent-1.7.13.pom , ReasonPhrase:Forbidden. -> [Help 1]
00:22:31.237 [ERROR]


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 15:56:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5365/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 16:06:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 1:

(7 comments)

Thanks for looking at the patch!
Aside from fixing some of the comments, I also did several other changes (they are also listed in the commit message).

http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG@7
PS2, Line 7: Orc
> nit: ORC (here and below)
Done


http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG@36
PS2, Line 36: Further planned changes
> Is it possible to gain some performance with this change? If yes, adding so
The performance impact of this change is that if use_local_tz_for_unix_timestamp_conversions or convert_legacy_hive_parquet_utc_timestamps are true and the query option timezone is UTC, then affected functions should be much faster. convert-timestamp-benchmark.cc can give some hints about the difference between different conversions' UTC and non-UTC versions (it is in the range of 3x-20x). The difference during Parquet scanning can be even larger.

I plan to update the benchmark when I remove the UTC versions of the From/To functions in TimestampValue.


http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/exec/parquet/parquet-common.h
File be/src/exec/parquet/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/exec/parquet/parquet-common.h@743
PS2, Line 743: is
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/15222/4/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/15222/4/be/src/exprs/timezone_db.h@77
PS4, Line 77: 
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/15222/4/be/src/exprs/timezone_db.h@79
PS4, Line 79:   static Status LoadZoneInfoBeTestOnly(
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc@150
PS2, Line 150: nullptr
> nit: maybe it would be worth to mention here as a side note that UTCPTR is 
Done


http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc@152
PS2, Line 152:       local_time_zone_ = tz == &TimezoneDatabase::GetUtcTimezone() ? UTCPTR : tz;
> Not related to this point, but as I see, the only other place (outside of s
Using nullptr instead of a real Timezone object is only useful in performance critical parts - mainly those function that can run for every row of a query. I don't want to remove TimezoneDatabase::GetUtcTimezone(), as it can be useful to have the Timezone object, e.g. to get it's name.



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 15:19:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h
File be/src/common/global-types.h:

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h@33
PS11, Line 33: UTCPTR
> I prefer it all capital for some reason, not sure why :)
I'm okay with all capital, too, just thought the style guide requires it in CamelCase.



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 09:51:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 6:

I realized that Parquet timezone conversion was untested when the timezone is UTC. PS6 adds coverage for that.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 16:50:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5270/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 16:42:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5226/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 18:57:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#5).

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there is was conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess that it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks.

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
22 files changed, 248 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#6).

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there is was conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess that it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks.

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
  when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
  Also did some cleanup there to use query option timezone instead of
  env var TZ.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
24 files changed, 279 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/6
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5341/


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 23:03:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5243/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 17:19:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h
File be/src/common/global-types.h:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h@31
PS6, Line 31: #define UTCPTR nullptr
> It led to compile error for some reason. I will try it again, probably it s
Thanks for the suggestion, finally it compiled with constexpr. It is much better with type safety.



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 16:01:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 7:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG@13
PS6, Line 13: is was
was no ?


http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG@26
PS6, Line 26: that
nit: not necessary


http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h
File be/src/common/global-types.h:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h@31
PS6, Line 31: #define UTCPTR nullptr
Why no define a const Timezone* instead?


http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc@352
PS7, Line 352:         // TODO The timezone depends on flag use_local_tz_for_unix_timestamp_conversions.
             :         //      Check if this is the intended behaviour.
             :         RETURN_IF_ERROR(MaterializeNextRow(
             :             state->time_zone_for_unix_time_conversions(), tuple_pool, tuple));
You're raising a good point in the comment. I think here we should just pass state->local_time_zone().


http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/exprs/expr-test.cc@170
PS6, Line 170: 
Returning const Timezone* here might simplify things a bit.


http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/runtime/timestamp-value.h@102
PS6, Line 102: 'unix_time' is assumed to be UTC
nit: The comment is a bit confusing. 'unix_time' is always in UTC (by definition) not just when 'local_tz' is set to non-UTC.


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
File testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test:

http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test@3
PS6, Line 3: This test is also called with convert_legacy_hive_parquet_utc_timestamps=true.
Comment is confusing: I think the test is only called with convert_legacy_hive_parquet_utc_timestamps=true.


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
File testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test:

http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test@18
PS6, Line 18: ====
            : ---- QUERY
Move the new sections to a separate .test file or rename this file to something more appropriate.
Originally this test file was meant for testing UTC timestamp functions only.


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test@45
PS6, Line 45: ---- QUERY
            : SET timezone=CET;
            : select min(timestamp_col) from functional_avro.alltypestiny;
            : ---- TYPES
            : STRING
            : ---- RESULTS
            : '2009-01-01 00:00:00'
Since functional_avro.alltypestiny.timestamp_col is a string so probably you can remove this section.


http://gerrit.cloudera.org:8080/#/c/15222/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py:

http://gerrit.cloudera.org:8080/#/c/15222/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py@74
PS6, Line 74:     self.check_sanity(True)
            :     # Test with UTC too to check the optimizations added in IMPALA-9385.
            :     for tz_name in ["PST8PDT", "UTC"]:
            :       # The value read from the Hive table should be the same as reading a UTC converted
            :       # value from the Impala table.
            :       data = self.execute_query_expect_success(self.client, """
            :           SELECT h.id, h.day, h.timestamp_col, i.timestamp_col
            :           FROM functional_parquet.alltypesagg_hive_13_1 h
            :           JOIN functional_parquet.alltypesagg
            :             i ON i.id = h.id AND i.day = h.day  -- serves as a unique key
            :           WHERE
            :             (h.timestamp_col IS NULL AND i.timestamp_col IS NOT NULL)
            :             OR (h.timestamp_col IS NOT NULL AND i.timestamp_col IS NULL)
            :             OR h.timestamp_col != FROM_UTC_TIMESTAMP(i.timestamp_col, '%s')
            :           """ % tz_name, query_options={"timezone": tz_name})\
            :           .get_data()
            :       assert len(data) == 0
Any reason you moved this block after L73?
I think, check_sanity() should be called at the beginning of the test.



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 18:33:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 14: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 22 Feb 2020 02:02:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5388/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 21:26:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#12).

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

ORC scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there was no conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks (IMPALA-9409).

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
  when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
  Also did some cleanup there to use query option timezone instead of
  env var TZ.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
A testdata/workloads/functional-query/queries/QueryTest/file-formats-with-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/local-timestamp-functions.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/custom_cluster/test_local_tz_conversion.py
27 files changed, 288 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/12
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5255/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 17:10:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5341/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 18:11:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5241/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 15:55:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15222/4/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/15222/4/be/src/exprs/timezone_db.h@77
PS4, Line 77:   // Note that in performance critical parts of the code it is recommended 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/15222/4/be/src/exprs/timezone_db.h@79
PS4, Line 79:   // TimestampValue functions have optimized path for it. 
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 14:44:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 13: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 14:52:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15222/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15222/11//COMMIT_MSG@9
PS11, Line 9: Orc
> nit: ORC
Done


http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h
File be/src/common/global-types.h:

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h@33
PS11, Line 33: UTCPTR
> Since it's not a macro anymore, maybe UtcPtr would be a better name?
I prefer it all capital for some reason, not sure why :)
We use all capital consts in Impala in several places.


http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/runtime/timestamp-value.h@a35
PS11, Line 35: 
             : 
> Shouldn't we keep this comment?
There is no agreement about this - turning it on by default could make some queries much slower. I think that this comment is just confusing.



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 20 Feb 2020 19:31:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

ORC scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there was no conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks (IMPALA-9409).

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
  when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
  Also did some cleanup there to use query option timezone instead of
  env var TZ.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Reviewed-on: http://gerrit.cloudera.org:8080/15222
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
A testdata/workloads/functional-query/queries/QueryTest/file-formats-with-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/local-timestamp-functions.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/custom_cluster/test_local_tz_conversion.py
27 files changed, 288 insertions(+), 209 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 15
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc@352
PS7, Line 352:         // TODO The timezone depends on flag use_local_tz_for_unix_timestamp_conversions.
             :         //      Check if this is the intended behaviour.
             :         RETURN_IF_ERROR(MaterializeNextRow(
             :             state->time_zone_for_unix_time_conversions(), tuple_pool, tuple));
> I was thinking about UTCPTR instead. Using local_time_zone() would mean tha
Yes, you're correct, it should be UTCPTR.



-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 13:05:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#8).

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there was no conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks.

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
  when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
  Also did some cleanup there to use query option timezone instead of
  env var TZ.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M testdata/workloads/functional-query/queries/QueryTest/local-timestamp-functions.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/custom_cluster/test_local_tz_conversion.py
26 files changed, 218 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/8
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 14: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 21:26:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5347/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 12:49:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#2).

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................

WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos is affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect Orc. By default there is no conversion, but if the
flag is 1, then timestamps are interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes so far:
- Fixed the bug by passing UTC as timezone in the Orc scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess that it is easier to optimize
  the inlined functions this way.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Further planned changes:
- Remove the UTC versions of the functions, as they shouldn't be
  faster than calling the converting functions with null after
  inlining.
- Move out the checking of use_local_tz_for_unix_timestamp_conversions
  to RuntimeState. The plan is to create a function like
  Timezone* local_timezone_for_unix_time_conversions() and return
  null or the local timezone based on the flag. I think that this
  would make the intention of the callsites of From/To functions
  much clearer.
- Add/update comments.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
18 files changed, 111 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 3: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 17 Feb 2020 17:10:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#7).

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there is was conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess that it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks.

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
  when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
  Also did some cleanup there to use query option timezone instead of
  env var TZ.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
24 files changed, 280 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/7
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 7:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5254/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 14:45:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5286/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 20 Feb 2020 13:10:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5386/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 14:52:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15222

to look at the new patch set (#9).

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................

IMPALA-9385: Unix time conversion cleanup + ORC fix

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there was no conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks.

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
  when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
  Also did some cleanup there to use query option timezone instead of
  env var TZ.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M testdata/workloads/functional-query/queries/QueryTest/local-timestamp-functions.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
M testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/custom_cluster/test_local_tz_conversion.py
26 files changed, 220 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/9
-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5251/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 14:18:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5225/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 18:43:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
......................................................................


Patch Set 13:

The failed test seems unrelated: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1828/testReport/junit/query_test.test_result_spooling/TestResultSpoolingFetchSize/test_fetch_fetch_size__4321___protocol__beeswax___exec_option____batch_size___1024___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0____table_format__parquet_none___wait_for_finished__True_/


-- 
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 21:25:41 +0000
Gerrit-HasComments: No