Posted to reviews@impala.apache.org by "Attila Jeges (Code Review)" <ge...@cloudera.org> on 2018/04/11 12:25:21 UTC

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Attila Jeges has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9986 )


Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.
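
To illustrate the "flat database" limitation listed above: a flat table keeps a single rule per zone, whereas IANA-style data keeps a list of transitions, so the offset in effect depends on the instant being converted. The following is a minimal sketch of that lookup, not Impala or CCTZ code; the `Transition` struct and the `OffsetAt` function are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical model: from 'start_unix' (seconds since the epoch) onward,
// the zone's offset from UTC is 'utc_offset_sec'.
struct Transition {
  int64_t start_unix;
  int32_t utc_offset_sec;
};

// Returns the offset in effect at 'unix_time'. 'transitions' must be non-empty
// and sorted by start_unix; times before the first entry use the first offset.
inline int32_t OffsetAt(const std::vector<Transition>& transitions, int64_t unix_time) {
  assert(!transitions.empty());
  // Find the first transition that starts strictly after 'unix_time'.
  auto it = std::upper_bound(transitions.begin(), transitions.end(), unix_time,
      [](int64_t t, const Transition& tr) { return t < tr.start_unix; });
  if (it == transitions.begin()) return transitions.front().utc_offset_sec;
  // The transition just before it is the one currently in effect.
  return std::prev(it)->utc_offset_sec;
}
```

A flat table corresponds to a vector with a single entry, which is why it cannot represent a zone whose rules changed from one year to the next.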

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zoneinfo_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zoneabbrev_config) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.
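
The flag fallback and the startup-time in-memory map described in the list above could be sketched roughly as follows. This is an illustration only: `ChooseZoneinfoSource` and `ZoneRegistry` are hypothetical names, and the string payload stands in for the CCTZ time-zone objects the patch loads from the database.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// If the startup flag value is set (non-empty), use it; otherwise fall back
// to the default system location named in the commit message.
inline std::string ChooseZoneinfoSource(const std::string& flag_value) {
  if (!flag_value.empty()) return flag_value;
  return "/usr/share/zoneinfo";
}

// Hypothetical registry filled once at startup, so that lookups during query
// execution are plain hash-map reads with no file access.
class ZoneRegistry {
 public:
  void Add(const std::string& name, std::string rules) {
    zones_[name] = std::move(rules);
  }

  // Returns nullptr when 'name' is not in the loaded database.
  const std::string* Find(const std::string& name) const {
    auto it = zones_.find(name);
    return it == zones_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, std::string> zones_;
};
```

Loading everything up front trades some startup time and memory for lookups that never touch the file system while queries are running.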

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/data/timezoneverification.csv
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
40 files changed, 1,919 insertions(+), 1,045 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 10:

> > > > Uploaded patch set 9.
 > > >
 > > > Patch set 9 contains the following changes:
 > > > - Added a full timezone db to testdata/tzdb.
 > > > - End-to-end tests and BE-tests were changed to use this
 > timezone
 > > > db. This was necessary because some timezone-tests were failing
 > > on
 > > > older jenkins workers that had an older tzdata package
 > installed.
 > >
 > > It might be a good idea to store the timezone-db files in one
 > .tar
 > > file and extract them before running the tests. What do you
 > think?
 > 
 > I agree, .taring or compressing the tz db would be much better, if
 > it does not make the code too complicated. Having fewer files would
 > make the review more readable, and would also make the tz db
 > consume much less space on hdfs, as the many small files will be
 > rounded up to hdfs block size.

Extracting files from a .tar file can be tricky. We would probably have to add the libtar library to the native-toolchain to handle .tar files.

Alternatively, we can store the timezone files in a JAR archive instead. The BE can call into the Java FE to extract files from it.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 30 May 2018 15:25:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Zoltan Borok-Nagy, Csaba Ringhofer, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9986

to look at the new patch set (#5).

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.
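
The query-context change in the list above (the coordinator records its local time-zone name once, and executors use that name rather than their own host setting) might look roughly like this. The `QueryCtx` struct, its `local_time_zone` field and both functions are hypothetical stand-ins, not the Thrift definitions from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-query context that is shipped to every execution node.
struct QueryCtx {
  std::string local_time_zone;  // Set once by the coordinator.
};

// Coordinator side: capture the coordinator's local time-zone name while
// preparing query execution.
inline QueryCtx PrepareQueryCtx(const std::string& coordinator_local_tz) {
  QueryCtx ctx;
  ctx.local_time_zone = coordinator_local_tz;
  return ctx;
}

// Executor side: the executor's own host setting is deliberately ignored so
// that all nodes agree on what "the current time-zone" means for this query.
inline const std::string& CurrentTimeZone(
    const QueryCtx& ctx, const std::string& executor_host_tz) {
  (void)executor_host_tz;
  return ctx.local_time_zone;
}
```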

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/abbrev.conf
A testdata/tzdb/zoneinfo/AmerICA/ArgeNTINA/MendOZA
A testdata/tzdb/zoneinfo/AmerICA/CancUN
A testdata/tzdb/zoneinfo/UTC
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
A tests/custom_cluster/test_custom_tzdb.py
53 files changed, 2,556 insertions(+), 1,096 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/5
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 9:

> > Uploaded patch set 9.
 > 
 > Patch set 9 contains the following changes:
 > - Added a full timezone db to testdata/tzdb.
 > - End-to-end tests and BE-tests were changed to use this timezone
 > db. This was necessary because some timezone-tests were failing on
 > older jenkins workers that had an older tzdata package installed.

It might be a good idea to store the timezone-db files in one .tar file and extract them before running the tests. What do you think?


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 25 May 2018 16:23:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 7: Code-Review+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 May 2018 17:35:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 22: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 22
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Jun 2018 09:50:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.
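
One way to picture the abbreviation config from the last item above: once the file is parsed, each non-standard abbreviation becomes an extra key that resolves to an entry already present in the time-zone map. The sketch below assumes that model; `AliasMap` and `ResolveZoneName` are hypothetical names, and the actual file format is not shown in this thread.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical result of parsing the abbreviation config:
// abbreviation -> canonical time-zone name.
using AliasMap = std::unordered_map<std::string, std::string>;

// Returns the canonical name for 'name': the alias target if 'name' is a
// configured abbreviation, otherwise 'name' itself unchanged.
inline std::string ResolveZoneName(const AliasMap& aliases, const std::string& name) {
  auto it = aliases.find(name);
  return it == aliases.end() ? name : it->second;
}
```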

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c.zip
A testdata/tzdb/abbrev.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
70 files changed, 2,995 insertions(+), 1,168 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/11
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 10:

I do not see any more low-hanging fruit for performance improvement. Some overhead could be removed by modifying CCTZ, but this is out of the scope of this change, so I created a follow-up Jira:
IMPALA-7085:
Consider patching Google/CCTZ for Impala's need


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 28 May 2018 16:20:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc@95
PS3, Line 95:   time_t unix_time;
            :   if (UNLIKELY(!ts_value.UtcToUnixTime(&unix_time))) return TimestampVal::null();
            :   cctz::time_point<cctz::sys_seconds> from_tp = UnixSecondsToTimePoint(unix_time);
            : 
            :   // Convert 'from_tp' time_point to civil_second assuming 'timezone' time-zone.
            :   cctz::civil_second to_cs = cctz::convert(from_tp, *timezone);
            : 
            :   if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) {
            :     const string& msg = Substitute(
            :         "Timestamp '$0' did not convert to a valid local time in timezone '$1'",
            :         ts_value.ToString(), tz_string_value.DebugString());
            :     context->AddWarning(msg.c_str());
            :     return TimestampVal::null();
            :   }
            : 
            :   // Note that 'to_cs' has second granularity. Since time-zone rules do not affect
            :   // fractional seconds, the fractional second part of the returned TimestampVal should be
            :   // equal to ts_value.time().fractional_seconds().
            :   return CivilSecondToTimestampVal(to_cs, ts_value.time().fractional_seconds());
This logic is the same as TimestampValue::UtcToLocal(), plus warning if the resulting timestamp is not valid. As local time and generic timezone conversions are the same now, it would make sense to keep them at one place.
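
The refactoring suggested here, one shared conversion routine with the SQL function layer adding only the warning, could be structured roughly as below. This is a sketch under assumptions: `ConvertUtcToLocal`, `FromUtcWithWarning`, the fixed-offset arithmetic and the range limits are hypothetical simplifications and do not reproduce the TimestampValue or CCTZ APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Arbitrary illustrative bounds for a "valid" result, in seconds.
constexpr int64_t kMinValidSec = 0;
constexpr int64_t kMaxValidSec = 1000000;

// Shared core that would live in one place. Returns false and leaves
// '*local_sec' untouched if the result falls outside the valid range.
inline bool ConvertUtcToLocal(int64_t utc_sec, int32_t utc_offset_sec, int64_t* local_sec) {
  assert(local_sec != nullptr);
  const int64_t result = utc_sec + utc_offset_sec;
  if (result < kMinValidSec || result > kMaxValidSec) return false;
  *local_sec = result;
  return true;
}

// Function-layer wrapper: reuses the shared core and only adds the warning,
// so the conversion logic itself is not duplicated.
inline bool FromUtcWithWarning(int64_t utc_sec, int32_t utc_offset_sec,
    int64_t* local_sec, std::vector<std::string>* warnings) {
  if (ConvertUtcToLocal(utc_sec, utc_offset_sec, local_sec)) return true;
  warnings->push_back("Timestamp did not convert to a valid local time");
  return false;
}
```

With this split, a local-time conversion and a generic time-zone conversion can both call the core routine, and only callers that have a function context need the wrapper.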


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc@144
PS3, Line 144:   cctz::time_point<cctz::sys_seconds> from_tp = from_cl.pre;
             : 
             :   // Convert 'from_tp' time_point to civil_second assuming 'UTC' time-zone.
             :   cctz::civil_second to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone());
             : 
             :   if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) {
             :     const string& msg =
             :         Substitute("Timestamp '$0' in timezone '$1' could not be converted to UTC",
             :             ts_value.ToString(), tz_string_value.DebugString());
             :     context->AddWarning(msg.c_str());
             :     return TimestampVal::null();
             :   }
             : 
             :   // Note that 'to_cs' has second granularity. Since time-zone rules do not affect
             :   // fractional seconds, the fractional second part of the returned TimestampVal should be
             :   // equal to ts_value.time().fractional_seconds().
             :   return CivilSecondToTimestampVal(to_cs, ts_value.time().fractional_seconds());
Similarly to my comment at line 113, this logic could be moved to a TimestampValue function. Removing these calculations from this file would mean that the helper functions (lines 47-74) could be removed too.


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/runtime/timestamp-value.cc@117
PS3, Line 117:       time_ = boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second(),
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@66
PS3, Line 66: iff
typo: if


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@70
PS3, Line 70: iff
typo: if


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@71
PS3, Line 71: path
Maybe writing "string" instead of "path" would better express that no file system is accessed.


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@79
PS3, Line 79: path
Same as line 71.


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc
File be/src/util/time.cc:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@165
PS3, Line 165:  
nit: extra space


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@168
PS3, Line 168:  
nit: extra space


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@171
PS3, Line 171:  
nit: extra space



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 04 May 2018 12:19:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 20:

clang-tidy job failed. Investigating.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 20
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jun 2018 16:35:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 9:

> > > Uploaded patch set 9.
 > >
 > > Patch set 9 contains the following changes:
 > > - Added a full timezone db to testdata/tzdb.
 > > - End-to-end tests and BE-tests were changed to use this timezone
 > > db. This was necessary because some timezone-tests were failing
 > on
 > > older jenkins workers that had an older tzdata package installed.
 > 
 > It might be a good idea to store the timezone-db files in one .tar
 > file and extract them before running the tests. What do you think?

I agree, .taring or compressing the tz db would be much better, if it does not make the code too complicated. Having fewer files would make the review more readable, and would also make the tz db consume much less space on hdfs, as the many small files will be rounded up to hdfs block size.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 25 May 2018 16:33:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2721/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 20
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jun 2018 12:35:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2725/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 22
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Jun 2018 09:50:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 10:

(1 comment)

Yeah, I agree with Phil that calling into Java via JNI to unzip a file may be the least of all evils - we wouldn't be adding another dependency and there's no risk of DOSing the NameNode. I doubt it would take the NameNode down, but it could disrupt other things happening on the cluster unnecessarily.

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan
File testdata/tzdb/2017c/Africa/Abidjan:

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan@1
PS10, Line 1: ../Atlantic/St_Helena
> We're adding a ton of files. Do we need such a big database for our testing
We'll also need to exclude these files from RAT checks (and document how they are licensed).



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 31 May 2018 00:30:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.
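
The Zip Slip protection mentioned in the last bullet comes down to checking that every archive entry resolves to a path inside the extraction directory. What follows is a minimal C++17 sketch of that check only. It is not the ZipUtil code from this patch (that class is ZipUtil.java), and the helper name IsInsideDir is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of the patch): returns true if 'entry_name',
// a path stored inside an archive, still points strictly below 'dest_dir'
// after lexical normalization. Pass 'dest_dir' without a trailing slash.
bool IsInsideDir(const fs::path& dest_dir, const std::string& entry_name) {
  const fs::path entry(entry_name);
  // Absolute entry names would replace the destination directory entirely.
  if (entry.is_absolute()) return false;

  const fs::path base = dest_dir.lexically_normal();
  const fs::path target = (base / entry).lexically_normal();

  // 'target' is safe only if all components of 'base' are a prefix of it
  // and at least one real (non-empty) component follows that prefix.
  auto [b, t] = std::mismatch(base.begin(), base.end(),
                              target.begin(), target.end());
  return b == base.end() && t != target.end() && !(*t).empty();
}
```

With this sketch, IsInsideDir("/tmp/tzdb", "Africa/Abidjan") is accepted, while IsInsideDir("/tmp/tzdb", "../etc/passwd") is rejected. The check is purely lexical, so it does not resolve symbolic links that already exist on disk.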

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/alias.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,086 insertions(+), 1,176 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/21
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 21
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.
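
The bullet about the coordinator's local time-zone can be pictured roughly as follows. This is only a sketch of the idea: the struct and function names are hypothetical stand-ins, not the Thrift types or functions touched by this patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the per-query context that the coordinator
// sends to every execution node.
struct QueryCtx {
  std::string local_time_zone;  // name of the coordinator's local time-zone
};

// Coordinator side: record the local time-zone name once, while preparing
// the query.
QueryCtx PrepareQueryCtx(const std::string& coordinator_tz) {
  return QueryCtx{coordinator_tz};
}

// Executor side: the "current" time-zone is taken from the query context.
// The executor host's own time-zone setting is deliberately ignored.
const std::string& CurrentTimeZone(const QueryCtx& ctx,
                                   const std::string& /*executor_host_tz*/) {
  return ctx.local_time_zone;
}
```

Because every node reads the name from the same context, a query prepared on a coordinator set to "America/New_York" evaluates with that zone even on an executor whose host is configured differently.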

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/abbrev.conf
A testdata/tzdb/zoneinfo/AmerICA/ArgeNTINA/MendOZA
A testdata/tzdb/zoneinfo/AmerICA/CancUN
A testdata/tzdb/zoneinfo/UTC
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
A tests/custom_cluster/test_custom_tzdb.py
53 files changed, 2,575 insertions(+), 1,092 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/7
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Zoltan Borok-Nagy, Csaba Ringhofer, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9986

to look at the new patch set (#3).

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.
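
The "in-memory map on startup" bullet can be sketched as below, assuming a map that is filled once before any query runs and is only read afterwards. ZoneInfo and ZoneMap are hypothetical placeholders; the patch itself relies on CCTZ for the parsed time-zone objects.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Placeholder for a parsed time-zone (an assumption made for this sketch).
struct ZoneInfo {
  std::string raw_tzfile;  // contents of the compiled zoneinfo file
};

class ZoneMap {
 public:
  // Intended to be called only during startup, before query threads exist.
  void Add(const std::string& name, ZoneInfo zone) {
    zones_.insert_or_assign(name, std::move(zone));
  }

  // Read-only lookup. Returns nullptr if 'name' is not in the database.
  const ZoneInfo* Find(const std::string& name) const {
    auto it = zones_.find(name);
    return it == zones_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, ZoneInfo> zones_;
};
```

As long as the map is never modified after startup, concurrent calls to Find() need no lock, which contrasts with the global lock taken by the glibc conversion functions listed above.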

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/abbrev.conf
A testdata/tzdb/zoneinfo/AmerICA/ArgeNTINA/MendOZA
A testdata/tzdb/zoneinfo/AmerICA/CancUN
A testdata/tzdb/zoneinfo/UTC
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
A tests/custom_cluster/custom_tzdb.py
53 files changed, 2,542 insertions(+), 1,093 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/3
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 19:

Are we ready to go ahead and merge? Would be good to run exhaustive tests + ASAN before merging just to be sure we aren't going to break anything.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Jun 2018 22:35:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.
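
To illustrate why a flat database cannot track year-to-year changes: a compiled IANA zone carries a list of transitions, and the UTC offset for an instant is whichever transition most recently took effect. The sketch below shows that lookup with hypothetical types; it is not the boost or CCTZ representation, and any transition values fed to it here are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// One rule change: from 'from_unix_time' (UTC seconds) onwards the zone
// uses 'utc_offset_sec'. Hypothetical type for illustration only.
struct Transition {
  int64_t from_unix_time;
  int32_t utc_offset_sec;
};

// Returns the offset in effect at 'unix_time'. 'transitions' must be
// non-empty and sorted by from_unix_time. Instants before the first
// transition fall back to the first entry's offset.
int32_t OffsetAt(const std::vector<Transition>& transitions,
                 int64_t unix_time) {
  auto it = std::upper_bound(
      transitions.begin(), transitions.end(), unix_time,
      [](int64_t t, const Transition& tr) { return t < tr.from_unix_time; });
  if (it == transitions.begin()) return transitions.front().utc_offset_sec;
  return std::prev(it)->utc_offset_sec;
}
```

A flat database amounts to a single entry per zone, so every instant maps to the same rule set regardless of the year.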

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/alias.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,089 insertions(+), 1,167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/17
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 17
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 1:

(2 comments)

Just skimmed through. Will do several further passes.

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@34
PS1, Line 34: --hdfs_zoneinfo_dir
Why call it --hdfs_zoneinfo_dir when it can also refer to S3 and ADLS?
Maybe it could be just --zoneinfo_dir and the given URI would specify the storage.


http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@45
PS1, Line 45: --hdfs_zoneabbrev_config
Same as above.
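
The suggestion above, letting the URI itself identify the storage, could look roughly like this. It is a sketch under assumptions: UriScheme is a hypothetical helper, and the example URIs in the usage note are made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the scheme of 'uri' (for example "hdfs",
// "s3a" or "adl"), or an empty string if 'uri' has no "scheme://" prefix,
// as is the case for a plain local path.
std::string UriScheme(const std::string& uri) {
  const std::string::size_type pos = uri.find("://");
  if (pos == std::string::npos || pos == 0) return "";
  return uri.substr(0, pos);
}
```

With a helper like this, a single flag value such as "s3a://bucket/tzdb" or "hdfs://namenode:8020/tzdb" would be enough to pick the filesystem, so the flag name would not need an hdfs_ prefix.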



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Apr 2018 16:23:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan
File testdata/tzdb/2017c/Africa/Abidjan:

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan@1
PS10, Line 1: ../Atlantic/St_Helena
> I've updated rat_exclude_files.txt in patch-set #11 and #13. I ran the rat 
Hmm, not sure how I missed that - sorry!



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Jun 2018 16:43:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2723/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 21
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jun 2018 16:46:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/abbrev.conf
A testdata/tzdb/zoneinfo/AmerICA/ArgeNTINA/MendOZA
A testdata/tzdb/zoneinfo/AmerICA/CancUN
A testdata/tzdb/zoneinfo/UTC
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
A tests/custom_cluster/test_custom_tzdb.py
53 files changed, 2,603 insertions(+), 1,095 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/8
-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 14: Code-Review+2

(5 comments)

Thanks for thinking about Zip-Slip!
I have left a few optional comments about the usability of the interfaces.

http://gerrit.cloudera.org:8080/#/c/9986/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/14//COMMIT_MSG@34
PS14, Line 34: - Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
The Zip Slip-safe zip-util could also be mentioned in the commit message.


http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/filesystem-util.h@92
PS14, Line 92:     Directory(const string& path, bool skip_hidden_entries = true);
I thought a bit about usability and I vote for removing this parameter and skipping only "." and ".." - I can't imagine any use case where I would be interested in those.


http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/filesystem-util.h@109
PS14, Line 109:     static Status GetEntryNames(const string& path,
I would prefer max_result_size to be the last parameter, and give it a default value of 0.


http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/zip-util-test.cc
File be/src/util/zip-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/zip-util-test.cc@69
PS14, Line 69:   EXPECT_FALSE(filesystem::exists(dest_dir3));
I guess that this is only true if zip decoding failed at the start, and that some files may already be decompressed before an error is reached in the zip. I am not sure what to do about this, probably nothing. It would be possible to add some kind of cleanup logic to the Java util, but I am not sure this is worth the effort.


http://gerrit.cloudera.org:8080/#/c/9986/14/fe/src/main/java/org/apache/impala/util/ZipUtil.java
File fe/src/main/java/org/apache/impala/util/ZipUtil.java:

http://gerrit.cloudera.org:8080/#/c/9986/14/fe/src/main/java/org/apache/impala/util/ZipUtil.java@45
PS14, Line 45:     try (ZipFile zip = new ZipFile(params.archive_file)) {
I would move this block to a similar function with (String archiveFile, String destDir) parameters to make this util usable from Java too. This would be minimal extra effort and I think that it can be handy to have an easily usable Zip-Slip safe extract function.
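
For readers unfamiliar with the term, the core of a Zip-Slip-safe extraction is a check that every archive entry resolves to a path inside the destination directory. The following is a minimal C++ sketch of that check only. It is not the ZipUtil.java implementation from this change; StaysInsideDestDir and CheckExamples are hypothetical names, and the check is purely lexical (it assumes no symlinks inside the destination directory).

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns true if extracting 'entry_name' under
// 'dest_dir' would produce a path that stays inside 'dest_dir'.
// Purely lexical: symlinks are not resolved (assumption of this sketch).
bool StaysInsideDestDir(const fs::path& dest_dir, const std::string& entry_name) {
  const fs::path base = dest_dir.lexically_normal();
  // operator/ replaces 'base' entirely when 'entry_name' is absolute,
  // so absolute entries are caught by the relative-path check below.
  const fs::path target = (base / entry_name).lexically_normal();
  const fs::path rel = target.lexically_relative(base);
  if (rel.empty()) return false;
  // Any path that escapes 'base' starts with a ".." component.
  return *rel.begin() != fs::path("..");
}

// Small self-check with made-up paths.
void CheckExamples() {
  assert(StaysInsideDestDir("/tmp/tzdb", "America/New_York"));
  assert(!StaysInsideDestDir("/tmp/tzdb", "../etc/passwd"));
  assert(!StaysInsideDestDir("/tmp/tzdb", "/etc/passwd"));
  // A sibling directory sharing a name prefix must also be rejected.
  assert(!StaysInsideDestDir("/tmp/tzdb", "../tzdb2/UTC"));
}
```

Comparing path components rather than string prefixes is a deliberate choice here: a plain prefix test would accept "/tmp/tzdb2/UTC" as being inside "/tmp/tzdb".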




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Jun 2018 16:15:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 1:

(31 comments)

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@34
PS1, Line 34: statup
> typo: startup
Done


http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@34
PS1, Line 34: --hdfs_zoneinfo_dir
> Why call it hdfs_zone_dir when it can also refer to S3 and ADLS?
The name reflects that this flag should be used to specify a location in a "shared" filesystem that can be accessed through the HDFS API.

I think naming the flag "--zoneinfo_dir" might be misleading, as it should not be used to specify a directory in a local filesystem.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@1
PS1, Line 1: #include <chrono>
> Missing Apache header
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@38
PS1, Line 38: UtcToUnixTime:             Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
> Do you think we should force the 90col limit on the following comment as we
Probably it's better to leave it like this. Some of the other benchmark programs have longer lines as well (e.g. ./be/src/benchmarks/int-hash-benchmark.cc)


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@134
PS1, Line 134: val
> I don't know what RAND_MAX is here, but I think that it can be 32K, which w
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@136
PS1, Line 136:     ss << to_simple_string(start);
> to_simple_string() return a string already, no need to use a stringstream f
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@166
PS1, Line 166:  d
> I am a bit concerned about writing to the same buffer from every thread - m
Fixed it. It required some major refactoring too.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@709
PS1, Line 709:     m1 = measure_multithreaded_elapsed_time(glibc_test_utc_to_unix, num_of_threads,BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@711
PS1, Line 711:     m2 = measure_multithreaded_elapsed_time(cctz_test_utc_to_unix, num_of_threads, BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@725
PS1, Line 725:     m1 = measure_multithreaded_elapsed_time(boost_test_from_utc, num_of_threads, BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@727
PS1, Line 727:     m2 = measure_multithreaded_elapsed_time(cctz_test_from_utc, num_of_threads, BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@743
PS1, Line 743:     m2 = measure_multithreaded_elapsed_time(cctz_test_utc_to_local, num_of_threads, BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/decimal-operators.h
File be/src/exprs/decimal-operators.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/decimal-operators.h@168
PS1, Line 168:   /// instead of truncating if 'round' is true.
> Should we mention the new parameter in the comment?
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions-ir.cc@523
PS1, Line 523: /
> nit: missing spaces
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@90
PS1, Line 90: t.fractional_seconds()
> You have explained in person that 't' is used instead of 'to_cs'for sub-sec
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@105
PS1, Line 105:   if (timezone == nullptr) {
> This could be UNLIKELY.
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@142
PS1, Line 142: t.fractional_seconds()
> Same as in line 90.
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@28
PS1, Line 28: /// Functions to load and access the time-zone database.
> Please add some comments about thread-safety (e.g. "Initialize() should be 
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@160
PS1, Line 160: bool IsSymbolicLink(const string& path, string* real_path) {
> Maybe this could be moved to class FileSystemUtil.
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@412
PS1, Line 412:   char buffer[64*1024];
> I am a bit concerned about this - is it ok to keep buffers of this size on 
Switched to using vector<char> instead.
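
As an illustration of the change described in this reply, the sketch below reads a stream in fixed-size chunks through a heap-allocated std::vector<char> instead of a local array such as char buffer[64*1024]. What the buffer is used for in timezone_db.cc is not shown in this thread, so the chunked std::istream read is an assumption, and ReadAll and CheckReadAll are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads everything from 'is' in chunks of 'chunk_size'
// bytes. The buffer lives on the heap, so its size does not count against
// the calling thread's stack.
std::string ReadAll(std::istream& is, std::size_t chunk_size = 64 * 1024) {
  if (chunk_size == 0) chunk_size = 1;  // avoid a loop that never makes progress
  std::vector<char> buffer(chunk_size);
  std::string out;
  while (is) {
    is.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    // gcount() reports how many bytes the last read() actually produced,
    // including the short read that hits end-of-stream.
    out.append(buffer.data(), static_cast<std::size_t>(is.gcount()));
  }
  return out;
}

// Small self-check with made-up input.
void CheckReadAll() {
  std::istringstream in("UTC\nZulu\n");
  assert(ReadAll(in, 4) == "UTC\nZulu\n");
}
```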


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@438
PS1, Line 438: // Load custom time-zone abbreviations from 'is' and add them to 'tz_name_map_'.
> In most of Impala, the comments for private functions are in the .h file an
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@439
PS1, Line 439: void TimezoneDatabase::LoadZoneAbbreviations(istream &is,
> Are you sure that a corrupt abbreviation file is not an error? Maybe duplic
Agree, done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@440
PS1, Line 440: const char *path /* = nullptr */
> I did not find any caller that fills this argument. Please check if it can 
'path' should be used when calling LoadZoneAbbreviations() in L434. Fixed it.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@460
PS1, Line 460: Skippng
> typo: Skippng is used consistently instead of Skipping - is this intentiona
It was a copy/paste typo.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@484
PS1, Line 484:       if (tz_name_map_.find(abbrev) != tz_name_map_.end()) {
> This could be checked before processing value and merged with the abbreviat
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h@321
PS1, Line 321: -
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h@321
PS1, Line 321: global local
> I see the intent but "global local timezone" sounds strange :)
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.cc@172
PS1, Line 172:   auto from_tp = FromUnixSeconds(unix_time);
             :   auto to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone());
             :   // boost::gregorian::date() throws boost::gregorian::bad_year if year is not in the
             :   // 1400..9999 range. Need to check validity before creating the date object.
             :   if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) {
             :     return ptime(not_a_date_time);
             :   } else {
             :     return ptime(
             :         boost::gregorian::date(to_cs.year(), to_cs.month(), to_cs.day()),
             :         boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second()));
> This could be replaced by calling TimestampValue::UnixTimeToLocalPtime(unix
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.inline.h@55
PS1, Line 55: inline bool TimestampValue::UtcToUnixTime(time_t* unix_time) const {
> Two opposing ideas:
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc
File be/src/util/time-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc@60
PS1, Line 60:   EXPECT_EQ("1677-09-21 00:12:43.146",
            :       ToUtcStringFromUnixMillis(INT64_MIN / NANOS_PER_MICRO / MICROS_PER_MILLI));
            :   EXPECT_EQ("1677-09-21 00:12:43.145225",
            :       ToUtcStringFromUnixMicros(INT64_MIN / NANOS_PER_MICRO));
> Why have these times changed? Are these results "more correct" than the old
Actually, the old expected values were incorrect due to a bug in time.cc. Explanation:

1. In L58, INT64_MIN/NANOS_PER_SEC == -9223372036.
ToUtcStringFromUnix(-9223372036) == "1677-09-21 00:12:44"

2. In L60, INT64_MIN/NANOS_PER_MICRO/MICROS_PER_MILLI == -9223372036854.
-9223372036854 millisec == -9223372036*1000 millisec - 854 millisec.

Therefore ToUtcStringFromUnixMillis(-9223372036854) must correspond to a timestamp that is 854 millisec before "1677-09-21 00:12:44", that is "1677-09-21 00:12:43.146".

3. In L62, INT64_MIN/NANOS_PER_MICRO == -9223372036854775.
-9223372036854775 microsec == -9223372036*1000000 microsec - 854775 microsec.

Therefore ToUtcStringFromUnixMicros(-9223372036854775) must convert to a timestamp that is 854775 microsec before "1677-09-21 00:12:44", that is "1677-09-21 00:12:43.145225".
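
The arithmetic in steps 1-3 can be checked mechanically. The following is a minimal C++ sketch, not the code in time.cc: the constants are stand-ins for the ones used in the test (their values are assumed), and FloorSplit is a hypothetical helper that splits a count of sub-second units into whole seconds, rounded toward negative infinity, plus a non-negative fraction.

```cpp
#include <cstdint>

// Stand-ins for the constants used in time-test.cc (values assumed).
constexpr int64_t kNanosPerMicro = 1000;
constexpr int64_t kMicrosPerMilli = 1000;
constexpr int64_t kMillisPerSec = 1000;
constexpr int64_t kMicrosPerSec = 1000000;
constexpr int64_t kNanosPerSec = 1000000000;

struct SplitTime {
  int64_t seconds;   // whole seconds, rounded toward negative infinity
  int64_t fraction;  // remaining units, always in [0, units_per_sec)
};

// Hypothetical helper. C++ integer division truncates toward zero, so a
// negative remainder is folded back into the previous whole second.
constexpr SplitTime FloorSplit(int64_t value, int64_t units_per_sec) {
  int64_t seconds = value / units_per_sec;
  int64_t fraction = value % units_per_sec;
  if (fraction < 0) {
    --seconds;
    fraction += units_per_sec;
  }
  return SplitTime{seconds, fraction};
}

// Step 1: whole seconds, i.e. "1677-09-21 00:12:44".
static_assert(INT64_MIN / kNanosPerSec == -9223372036, "step 1");

// Step 2: 854 ms before 00:12:44 is 146 ms after 00:12:43.
constexpr SplitTime kMillis =
    FloorSplit(INT64_MIN / kNanosPerMicro / kMicrosPerMilli, kMillisPerSec);
static_assert(kMillis.seconds == -9223372037 && kMillis.fraction == 146, "step 2");

// Step 3: 854775 us before 00:12:44 is 145225 us after 00:12:43.
constexpr SplitTime kMicros = FloorSplit(INT64_MIN / kNanosPerMicro, kMicrosPerSec);
static_assert(kMicros.seconds == -9223372037 && kMicros.fraction == 145225, "step 3");
```

All checks are static_asserts, so the sketch verifies itself at compile time.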


http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@267
PS1, Line 267:     queryCtx.setLocal_time_zone("PST8PDT");
> Can you explain to what was changed here? Some tests ran differently depend
This commit adds a 'local_time_zone' field to the query context (ImpalaInternalService.thrift). The BE expects this field to be set to a non-null value for each query.

We set the value here to a hard-coded value to make sure that it is non-null when the FE tests are running. The actual value doesn't matter as far as the FE tests are concerned.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Apr 2018 09:12:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 11:

> (1 comment)
 > 
 > Yeah I agree with Phil that doing JNI to java to unzip a file may
 > be the least of all evils - we wouldn't be adding another
 > dependency and there's no risk of DOSing the NameNode. I doubt it
 > would take the NameNode down, but it could disrupt other things
 > happening on the cluster unnecessarily.

Changed the patch to take the shared timezone db in a zip archive.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Jun 2018 16:12:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 19: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Jun 2018 17:46:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 14:

> Uploaded patch set 14.

Added one more BE test for extracting files from a zip archive to a non-writable destination directory.

Fixed zip-slip vulnerability in ZipUtil.java.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Jun 2018 13:43:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 11:

(2 comments)

Added some tests for extracting files from a non-existent zip archive and from a corrupt zip archive.

http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc@198
PS11, Line 198: GetNextDirectoryEntry
> This is subjective, but I do not like this interface too much. I would pref
Done


http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc@213
PS11, Line 213: readdir_r
> There was a discussion about readdir_r() vs readdir() in https://gerrit.clo
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Jun 2018 19:49:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.h@31
PS2, Line 31: class TimezoneDatabase {
My general feeling about this class is that a bit more transparency would add great value. What I mean is that some additional class-level comments would really help, e.g. what the exact format of the inputs to this class is, what sources it can have, in a nutshell what transformation is done on that data, and how the processed data is used later on.

What do you think?


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@420
PS2, Line 420: /* = nullptr */
drop this comment


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@495
PS2, Line 495: ZONEINFO_DIR
nit: ZONE_INFO_DIR?


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.h@101
PS2, Line 101:   static TimestampValue FromUnixTime(time_t unix_time, const cctz::time_zone* local_tz) {
Do you think that mentioning the new param for these functions would add extra value?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Apr 2018 12:00:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/abbrev.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,119 insertions(+), 1,167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 21: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2723/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 21
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jun 2018 20:14:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9986/10/tests/custom_cluster/test_shared_tzdb.py
File tests/custom_cluster/test_shared_tzdb.py:

http://gerrit.cloudera.org:8080/#/c/9986/10/tests/custom_cluster/test_shared_tzdb.py@59
PS10, Line 59:     for abbrev in ['PST', 'JST', 'ACT', 'VST']:
I have copied and extended this test in https://gerrit.cloudera.org/#/c/10486/3..4/tests/query_test/test_timezones.py to also check the warning and the to_utc_timestamp function; maybe these checks could be added to this test too.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Jun 2018 17:59:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 19:

Thanks for running those tests!


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Jun 2018 17:47:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2726/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 22
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Jun 2018 09:53:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc@95
PS3, Line 95:   time_t unix_time;
            :   if (UNLIKELY(!ts_value.UtcToUnixTime(&unix_time))) return TimestampVal::null();
            :   cctz::time_point<cctz::sys_seconds> from_tp = UnixSecondsToTimePoint(unix_time);
            : 
            :   // Convert 'from_tp' time_point to civil_second assuming 'timezone' time-zone.
            :   cctz::civil_second to_cs = cctz::convert(from_tp, *timezone);
            : 
            :   if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) {
            :     const string& msg = Substitute(
            :         "Timestamp '$0' did not convert to a valid local time in timezone '$1'",
            :         ts_value.ToString(), tz_string_value.DebugString());
            :     context->AddWarning(msg.c_str());
            :     return TimestampVal::null();
            :   }
            : 
            :   // Note that 'to_cs' has second granularity. Since time-zone rules do not affect
            :   // fractional seconds, the fractional second part of the returned TimestampVal should be
            :   // equal to ts_value.time().fractional_seconds().
            :   return CivilSecondToTimestampVal(to_cs, ts_value.time().fractional_seconds());
> This logic is the same as TimestampValue::UtcToLocal(), plus warning if the
Done


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc@144
PS3, Line 144:   cctz::time_point<cctz::sys_seconds> from_tp = from_cl.pre;
             : 
             :   // Convert 'from_tp' time_point to civil_second assuming 'UTC' time-zone.
             :   cctz::civil_second to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone());
             : 
             :   if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) {
             :     const string& msg =
             :         Substitute("Timestamp '$0' in timezone '$1' could not be converted to UTC",
             :             ts_value.ToString(), tz_string_value.DebugString());
             :     context->AddWarning(msg.c_str());
             :     return TimestampVal::null();
             :   }
             : 
             :   // Note that 'to_cs' has second granularity. Since time-zone rules do not affect
             :   // fractional seconds, the fractional second part of the returned TimestampVal should be
             :   // equal to ts_value.time().fractional_seconds().
             :   return CivilSecondToTimestampVal(to_cs, ts_value.time().fractional_seconds());
> Similarly to my comment at line 113, this logic could be moved to a Timesta
Done


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/runtime/timestamp-value.cc@117
PS3, Line 117:       time_ = boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second(),
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@66
PS3, Line 66: iff
> typo: if
"iff" stands for "if and only if".


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@70
PS3, Line 70: iff
> typo: if
Same as above


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@71
PS3, Line 71: path
> Maybe writing "string" instead of "path" express better that no file system
Done


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@79
PS3, Line 79: path
> Same as line 71.
Done


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc
File be/src/util/time.cc:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@165
PS3, Line 165:  
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@168
PS3, Line 168:  
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@171
PS3, Line 171:  
> nit: extra space
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 May 2018 19:35:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 21: Code-Review+2

Carry +2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 21
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jun 2018 17:34:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 19:

> Are we ready to go ahead and merge? Would be good to run exhaustive
 > tests + ASAN before merging just to be sure we aren't going to
 > break anything.

Exhaustive and ASAN failed because of a flaky test (IMPALA-7181).
Exhaustive: https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/2336/
ASAN: https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/2337/

Since IMPALA-7181 was resolved yesterday, I've rebased the patch-set and restarted the tests this morning.
Exhaustive: https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/2345/
ASAN: https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/2346/

Hopefully they will pass this time.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Jun 2018 14:00:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/abbrev.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,098 insertions(+), 1,167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/13
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 20: Code-Review+2

Exhaustive, ASAN tests passed. Rebased the patch-set. Carry +2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 20
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jun 2018 12:31:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 22: Code-Review+2

Carry +2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 22
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Jun 2018 09:52:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/alias.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,088 insertions(+), 1,167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/19
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/timestamp-value.cc@139
PS5, Line 139:   // In case the resulting 'time_point' is ambiguous, we have to invalidate
             :   // TimestampValue.
             :   // 'civil_lookup' members and the details of handling ambiguity are described at:
             :   // https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/
             :   // include/cctz/time_zone.h#L106
             :   if (UNLIKELY(from_cl.kind != cctz::time_zone::civil_lookup::UNIQUE)
I have investigated this a bit:

- there is a Jira that complains about this behavior: https://issues.apache.org/jira/browse/IMPALA-3169

- Hive does not work like this, it returns a "valid" timestamp for repeated/skipped hours:

select
 to_utc_timestamp(cast("2011-03-13 02:15:00" as timestamp), "America/Los_Angeles"), 
 to_utc_timestamp(cast("2011-11-06 01:15:00" as timestamp), "America/Los_Angeles")
result: 2011-03-13 10:15:00.0	2011-11-06 09:15:00.0

I think that we should do the same, at least for repeated values. I can imagine several valid queries where this would be the correct behavior, for example when we filter for a time interval.

So I vote for solving IMPALA-3169 in this patch by choosing the pre or post time in non-UNIQUE cases too. If there are no test cases yet for skipped/repeated hours, then we should create some and expect the same results that Hive returns.
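
A minimal sketch of what "choosing pre or post" could look like, modelled with the standard library only so that it runs without cctz. All names here (Transition, Lookup, LookupLocal, LocalToUtcLenient) are hypothetical and are not part of cctz or Impala. The choice of pre for skipped times and post for repeated times is an assumption derived only from the two Hive results quoted above; times are plain second counts relative to an arbitrary reference.

```cpp
#include <cstdint>

// Hypothetical model of a single UTC-offset transition; not a cctz type.
struct Transition {
  std::int64_t utc_instant;    // UTC second at which the offset changes
  std::int64_t offset_before;  // seconds east of UTC before the change
  std::int64_t offset_after;   // seconds east of UTC after the change
};

enum class Kind { kUnique, kSkipped, kRepeated };

struct Lookup {
  Kind kind;
  std::int64_t pre;   // local time interpreted with offset_before
  std::int64_t post;  // local time interpreted with offset_after
};

// Classifies a local wall-clock time (in seconds) around one transition.
inline Lookup LookupLocal(std::int64_t local, const Transition& t) {
  const std::int64_t pre = local - t.offset_before;
  const std::int64_t post = local - t.offset_after;
  const bool pre_valid = pre < t.utc_instant;     // really before the change
  const bool post_valid = post >= t.utc_instant;  // really after the change
  Kind kind = Kind::kUnique;
  if (pre_valid && post_valid) kind = Kind::kRepeated;
  if (!pre_valid && !post_valid) kind = Kind::kSkipped;
  return {kind, pre, post};
}

// Returns a UTC instant for every kind instead of invalidating the value.
// Assumption: skipped -> pre, repeated -> post, which reproduces the two
// Hive results quoted in this comment.
inline std::int64_t LocalToUtcLenient(std::int64_t local, const Transition& t) {
  const Lookup l = LookupLocal(local, t);
  switch (l.kind) {
    case Kind::kSkipped:  return l.pre;
    case Kind::kRepeated: return l.post;
    case Kind::kUnique:   break;
  }
  // Unique: exactly one interpretation is valid.
  return (l.pre < t.utc_instant) ? l.pre : l.post;
}
```

With a spring transition of {10h, -8h, -7h} and a fall transition of {9h, -7h, -8h}, each measured in seconds from 00:00 UTC of the respective day, local 02:15 maps to 10:15 UTC and local 01:15 maps to 09:15 UTC, matching the Hive output above.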



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 May 2018 14:04:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 4: Code-Review+1

(6 comments)

A few nits, otherwise looks good to me. The LocalToUtc performance part is optional: as it does not affect other parts of the code, it can easily be done later, once the general structure has been accepted by the other reviewers.

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@526
PS4, Line 526:   const string& tz_name = (start_lookup.abbr != nullptr) ? start_lookup.abbr :
             :       context->impl()->state()->local_time_zone()->name();
What is the goal of this logic? To print timezone abbreviations instead of the full names, or to distinguish between summer/winter time, or both? A comment would be nice, and maybe the logic could be moved to a TimestampValue member function like string GetTimezoneName(Timezone*).


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@93
PS4, Line 93: inline bool CheckIfDateOutOfRange(const cctz::civil_day& date) {
            :   static const cctz::civil_day max_date(TimestampFunctions::MAX_YEAR, 12, 31);
            :   static const cctz::civil_day min_date(TimestampFunctions::MIN_YEAR, 1, 1);
            :   return date < min_date || date > max_date;
            : }
This could be simpler and possibly faster by taking a cctz::civil_second argument and checking whether 1400 <= year < 10000.
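
A minimal sketch of the year-only check, assuming the bounds are exactly the 1400..9999 range mentioned in this review. kMinYear, kMaxYear and IsYearOutOfRange are illustrative names; in the patch the limits would presumably come from TimestampFunctions::MIN_YEAR and MAX_YEAR.

```cpp
#include <cstdint>

// Assumed bounds, taken from the 1400..9999 range quoted in the review.
constexpr int64_t kMinYear = 1400;
constexpr int64_t kMaxYear = 9999;

// Only the year is inspected: every month/day combination inside
// [kMinYear, kMaxYear] is in range, so no date objects need to be compared.
inline bool IsYearOutOfRange(int64_t year) {
  return year < kMinYear || year > kMaxYear;
}
```

The caller would pass the year() of the cctz::civil_second it already has, which avoids building a civil_day first.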


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@128
PS4, Line 128: 
cctz explains the handling of DST boundaries pretty well; maybe we could add a link to it, for example to https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/include/cctz/time_zone.h#L147


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@129
PS4, Line 129:   // In case of ambiguity invalidate TimestampValue.
             :   const cctz::time_zone::civil_lookup from_cl = local_tz->lookup(from_cs);
             :   if (UNLIKELY(from_cl.kind != cctz::time_zone::civil_lookup::UNIQUE)) {
             :     SetToInvalidDateTime();
             :   } else {
             :     cctz::time_point<cctz::sys_seconds> from_tp = from_cl.pre;
             : 
             :     // Convert 'from_tp' time_point to civil_second assuming 'UTC' time-zone.
             :     cctz::civil_second to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone());
             : 
             :     // boost::gregorian::date() throws boost::gregorian::bad_year if year is not in the
             :     // 1400..9999 range. Need to check validity before creating the date object.
             :     if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) {
It may be possible to get TimestampValue from cctz::time_zone::civil_lookup in a faster way: splitting from_tp into a day part (days since a constant date) and a remainder seconds part is enough for us and should be faster than getting a cctz::civil_second (which contains the year/month/day split).

The code could look something like this:
int64 secs_since_base = from_tp - BASETIME_AS_CCTZ_SYS_SEC;
time_ = secs_since_base % (24*60*60) + time_.fractional_seconds();
int32 days_since_base = secs_since_base / (24*60*60);
if (out_of_range(days_since_base)) SetToInvalidDateTime();
date_ = BASEDATE_AS_BOOST_GREG_DATE + days_since_base;
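
One detail the pseudocode above glosses over: C++ '/' and '%' truncate toward zero, so a time point before the base date would yield a negative seconds-of-day. A self-contained sketch of the split with floor semantics follows; DayAndSeconds and SplitSeconds are hypothetical names, and plain integers stand in for the cctz and boost types.

```cpp
#include <cstdint>

constexpr int64_t kSecondsPerDay = 24 * 60 * 60;

struct DayAndSeconds {
  int64_t days;            // whole days since the base date, may be negative
  int64_t seconds_of_day;  // always in [0, kSecondsPerDay)
};

// Splits a signed second count into days and seconds-of-day using floor
// semantics, adjusting the truncated quotient for negative inputs.
inline DayAndSeconds SplitSeconds(int64_t secs_since_base) {
  int64_t days = secs_since_base / kSecondsPerDay;
  int64_t rem = secs_since_base % kSecondsPerDay;
  if (rem < 0) {
    rem += kSecondsPerDay;
    --days;
  }
  return {days, rem};
}
```

The day count can then be range-checked and added to the base date, and the remainder becomes the time-of-day part, without any year/month/day decomposition.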


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@146
PS4, Line 146:       time_ = boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second(),
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@66
PS3, Line 66: iff
> "iff" stands for "if and only if".
Thanks for the explanation!




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 09 May 2018 11:47:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c/Africa/Abidjan
A testdata/tzdb/2017c/Africa/Accra
A testdata/tzdb/2017c/Africa/Addis_Ababa
A testdata/tzdb/2017c/Africa/Algiers
A testdata/tzdb/2017c/Africa/Asmara
A testdata/tzdb/2017c/Africa/Asmera
A testdata/tzdb/2017c/Africa/Bamako
A testdata/tzdb/2017c/Africa/Bangui
A testdata/tzdb/2017c/Africa/Banjul
A testdata/tzdb/2017c/Africa/Bissau
A testdata/tzdb/2017c/Africa/Blantyre
A testdata/tzdb/2017c/Africa/Brazzaville
A testdata/tzdb/2017c/Africa/Bujumbura
A testdata/tzdb/2017c/Africa/Cairo
A testdata/tzdb/2017c/Africa/Casablanca
A testdata/tzdb/2017c/Africa/Ceuta
A testdata/tzdb/2017c/Africa/Conakry
A testdata/tzdb/2017c/Africa/Dakar
A testdata/tzdb/2017c/Africa/Dar_es_Salaam
A testdata/tzdb/2017c/Africa/Djibouti
A testdata/tzdb/2017c/Africa/Douala
A testdata/tzdb/2017c/Africa/El_Aaiun
A testdata/tzdb/2017c/Africa/Freetown
A testdata/tzdb/2017c/Africa/Gaborone
A testdata/tzdb/2017c/Africa/Harare
A testdata/tzdb/2017c/Africa/Johannesburg
A testdata/tzdb/2017c/Africa/Juba
A testdata/tzdb/2017c/Africa/Kampala
A testdata/tzdb/2017c/Africa/Khartoum
A testdata/tzdb/2017c/Africa/Kigali
A testdata/tzdb/2017c/Africa/Kinshasa
A testdata/tzdb/2017c/Africa/Lagos
A testdata/tzdb/2017c/Africa/Libreville
A testdata/tzdb/2017c/Africa/Lome
A testdata/tzdb/2017c/Africa/Luanda
A testdata/tzdb/2017c/Africa/Lubumbashi
A testdata/tzdb/2017c/Africa/Lusaka
A testdata/tzdb/2017c/Africa/Malabo
A testdata/tzdb/2017c/Africa/Maputo
A testdata/tzdb/2017c/Africa/Maseru
A testdata/tzdb/2017c/Africa/Mbabane
A testdata/tzdb/2017c/Africa/Mogadishu
A testdata/tzdb/2017c/Africa/Monrovia
A testdata/tzdb/2017c/Africa/Nairobi
A testdata/tzdb/2017c/Africa/Ndjamena
A testdata/tzdb/2017c/Africa/Niamey
A testdata/tzdb/2017c/Africa/Nouakchott
A testdata/tzdb/2017c/Africa/Ouagadougou
A testdata/tzdb/2017c/Africa/Porto-Novo
A testdata/tzdb/2017c/Africa/Sao_Tome
A testdata/tzdb/2017c/Africa/Timbuktu
A testdata/tzdb/2017c/Africa/Tripoli
A testdata/tzdb/2017c/Africa/Tunis
A testdata/tzdb/2017c/Africa/Windhoek
A testdata/tzdb/2017c/America/Adak
A testdata/tzdb/2017c/America/Anchorage
A testdata/tzdb/2017c/America/Anguilla
A testdata/tzdb/2017c/America/Antigua
A testdata/tzdb/2017c/America/Araguaina
A testdata/tzdb/2017c/America/Argentina/Buenos_Aires
A testdata/tzdb/2017c/America/Argentina/Catamarca
A testdata/tzdb/2017c/America/Argentina/ComodRivadavia
A testdata/tzdb/2017c/America/Argentina/Cordoba
A testdata/tzdb/2017c/America/Argentina/Jujuy
A testdata/tzdb/2017c/America/Argentina/La_Rioja
A testdata/tzdb/2017c/America/Argentina/Mendoza
A testdata/tzdb/2017c/America/Argentina/Rio_Gallegos
A testdata/tzdb/2017c/America/Argentina/Salta
A testdata/tzdb/2017c/America/Argentina/San_Juan
A testdata/tzdb/2017c/America/Argentina/San_Luis
A testdata/tzdb/2017c/America/Argentina/Tucuman
A testdata/tzdb/2017c/America/Argentina/Ushuaia
A testdata/tzdb/2017c/America/Aruba
A testdata/tzdb/2017c/America/Asuncion
A testdata/tzdb/2017c/America/Atikokan
A testdata/tzdb/2017c/America/Atka
A testdata/tzdb/2017c/America/Bahia
A testdata/tzdb/2017c/America/Bahia_Banderas
A testdata/tzdb/2017c/America/Barbados
A testdata/tzdb/2017c/America/Belem
A testdata/tzdb/2017c/America/Belize
A testdata/tzdb/2017c/America/Blanc-Sablon
A testdata/tzdb/2017c/America/Boa_Vista
A testdata/tzdb/2017c/America/Bogota
A testdata/tzdb/2017c/America/Boise
A testdata/tzdb/2017c/America/Buenos_Aires
A testdata/tzdb/2017c/America/Cambridge_Bay
A testdata/tzdb/2017c/America/Campo_Grande
A testdata/tzdb/2017c/America/Cancun
A testdata/tzdb/2017c/America/Caracas
A testdata/tzdb/2017c/America/Catamarca
A testdata/tzdb/2017c/America/Cayenne
A testdata/tzdb/2017c/America/Cayman
A testdata/tzdb/2017c/America/Chicago
A testdata/tzdb/2017c/America/Chihuahua
A testdata/tzdb/2017c/America/Coral_Harbour
A testdata/tzdb/2017c/America/Cordoba
A testdata/tzdb/2017c/America/Costa_Rica
A testdata/tzdb/2017c/America/Creston
A testdata/tzdb/2017c/America/Cuiaba
A testdata/tzdb/2017c/America/Curacao
A testdata/tzdb/2017c/America/Danmarkshavn
A testdata/tzdb/2017c/America/Dawson
A testdata/tzdb/2017c/America/Dawson_Creek
A testdata/tzdb/2017c/America/Denver
A testdata/tzdb/2017c/America/Detroit
A testdata/tzdb/2017c/America/Dominica
A testdata/tzdb/2017c/America/Edmonton
A testdata/tzdb/2017c/America/Eirunepe
A testdata/tzdb/2017c/America/El_Salvador
A testdata/tzdb/2017c/America/Ensenada
A testdata/tzdb/2017c/America/Fort_Nelson
A testdata/tzdb/2017c/America/Fort_Wayne
A testdata/tzdb/2017c/America/Fortaleza
A testdata/tzdb/2017c/America/Glace_Bay
A testdata/tzdb/2017c/America/Godthab
A testdata/tzdb/2017c/America/Goose_Bay
A testdata/tzdb/2017c/America/Grand_Turk
A testdata/tzdb/2017c/America/Grenada
A testdata/tzdb/2017c/America/Guadeloupe
A testdata/tzdb/2017c/America/Guatemala
A testdata/tzdb/2017c/America/Guayaquil
A testdata/tzdb/2017c/America/Guyana
A testdata/tzdb/2017c/America/Halifax
A testdata/tzdb/2017c/America/Havana
A testdata/tzdb/2017c/America/Hermosillo
A testdata/tzdb/2017c/America/Indiana/Indianapolis
A testdata/tzdb/2017c/America/Indiana/Knox
A testdata/tzdb/2017c/America/Indiana/Marengo
A testdata/tzdb/2017c/America/Indiana/Petersburg
A testdata/tzdb/2017c/America/Indiana/Tell_City
A testdata/tzdb/2017c/America/Indiana/Vevay
A testdata/tzdb/2017c/America/Indiana/Vincennes
A testdata/tzdb/2017c/America/Indiana/Winamac
A testdata/tzdb/2017c/America/Indianapolis
A testdata/tzdb/2017c/America/Inuvik
A testdata/tzdb/2017c/America/Iqaluit
A testdata/tzdb/2017c/America/Jamaica
A testdata/tzdb/2017c/America/Jujuy
A testdata/tzdb/2017c/America/Juneau
A testdata/tzdb/2017c/America/Kentucky/Louisville
A testdata/tzdb/2017c/America/Kentucky/Monticello
A testdata/tzdb/2017c/America/Knox_IN
A testdata/tzdb/2017c/America/Kralendijk
A testdata/tzdb/2017c/America/La_Paz
A testdata/tzdb/2017c/America/Lima
A testdata/tzdb/2017c/America/Los_Angeles
A testdata/tzdb/2017c/America/Louisville
A testdata/tzdb/2017c/America/Lower_Princes
A testdata/tzdb/2017c/America/Maceio
A testdata/tzdb/2017c/America/Managua
A testdata/tzdb/2017c/America/Manaus
A testdata/tzdb/2017c/America/Marigot
A testdata/tzdb/2017c/America/Martinique
A testdata/tzdb/2017c/America/Matamoros
A testdata/tzdb/2017c/America/Mazatlan
A testdata/tzdb/2017c/America/Mendoza
A testdata/tzdb/2017c/America/Menominee
A testdata/tzdb/2017c/America/Merida
A testdata/tzdb/2017c/America/Metlakatla
A testdata/tzdb/2017c/America/Mexico_City
A testdata/tzdb/2017c/America/Miquelon
A testdata/tzdb/2017c/America/Moncton
A testdata/tzdb/2017c/America/Monterrey
A testdata/tzdb/2017c/America/Montevideo
A testdata/tzdb/2017c/America/Montreal
A testdata/tzdb/2017c/America/Montserrat
A testdata/tzdb/2017c/America/Nassau
A testdata/tzdb/2017c/America/New_York
A testdata/tzdb/2017c/America/Nipigon
A testdata/tzdb/2017c/America/Nome
A testdata/tzdb/2017c/America/Noronha
A testdata/tzdb/2017c/America/North_Dakota/Beulah
A testdata/tzdb/2017c/America/North_Dakota/Center
A testdata/tzdb/2017c/America/North_Dakota/New_Salem
A testdata/tzdb/2017c/America/Ojinaga
A testdata/tzdb/2017c/America/Panama
A testdata/tzdb/2017c/America/Pangnirtung
A testdata/tzdb/2017c/America/Paramaribo
A testdata/tzdb/2017c/America/Phoenix
A testdata/tzdb/2017c/America/Port-au-Prince
A testdata/tzdb/2017c/America/Port_of_Spain
A testdata/tzdb/2017c/America/Porto_Acre
A testdata/tzdb/2017c/America/Porto_Velho
A testdata/tzdb/2017c/America/Puerto_Rico
A testdata/tzdb/2017c/America/Punta_Arenas
A testdata/tzdb/2017c/America/Rainy_River
A testdata/tzdb/2017c/America/Rankin_Inlet
A testdata/tzdb/2017c/America/Recife
A testdata/tzdb/2017c/America/Regina
A testdata/tzdb/2017c/America/Resolute
A testdata/tzdb/2017c/America/Rio_Branco
A testdata/tzdb/2017c/America/Rosario
A testdata/tzdb/2017c/America/Santa_Isabel
A testdata/tzdb/2017c/America/Santarem
A testdata/tzdb/2017c/America/Santiago
A testdata/tzdb/2017c/America/Santo_Domingo
A testdata/tzdb/2017c/America/Sao_Paulo
A testdata/tzdb/2017c/America/Scoresbysund
A testdata/tzdb/2017c/America/Shiprock
A testdata/tzdb/2017c/America/Sitka
A testdata/tzdb/2017c/America/St_Barthelemy
A testdata/tzdb/2017c/America/St_Johns
A testdata/tzdb/2017c/America/St_Kitts
A testdata/tzdb/2017c/America/St_Lucia
A testdata/tzdb/2017c/America/St_Thomas
A testdata/tzdb/2017c/America/St_Vincent
A testdata/tzdb/2017c/America/Swift_Current
A testdata/tzdb/2017c/America/Tegucigalpa
A testdata/tzdb/2017c/America/Thule
A testdata/tzdb/2017c/America/Thunder_Bay
A testdata/tzdb/2017c/America/Tijuana
A testdata/tzdb/2017c/America/Toronto
A testdata/tzdb/2017c/America/Tortola
A testdata/tzdb/2017c/America/Vancouver
A testdata/tzdb/2017c/America/Virgin
A testdata/tzdb/2017c/America/Whitehorse
A testdata/tzdb/2017c/America/Winnipeg
A testdata/tzdb/2017c/America/Yakutat
A testdata/tzdb/2017c/America/Yellowknife
A testdata/tzdb/2017c/Antarctica/Casey
A testdata/tzdb/2017c/Antarctica/Davis
A testdata/tzdb/2017c/Antarctica/DumontDUrville
A testdata/tzdb/2017c/Antarctica/Macquarie
A testdata/tzdb/2017c/Antarctica/Mawson
A testdata/tzdb/2017c/Antarctica/McMurdo
A testdata/tzdb/2017c/Antarctica/Palmer
A testdata/tzdb/2017c/Antarctica/Rothera
A testdata/tzdb/2017c/Antarctica/South_Pole
A testdata/tzdb/2017c/Antarctica/Syowa
A testdata/tzdb/2017c/Antarctica/Troll
A testdata/tzdb/2017c/Antarctica/Vostok
A testdata/tzdb/2017c/Arctic/Longyearbyen
A testdata/tzdb/2017c/Asia/Aden
A testdata/tzdb/2017c/Asia/Almaty
A testdata/tzdb/2017c/Asia/Amman
A testdata/tzdb/2017c/Asia/Anadyr
A testdata/tzdb/2017c/Asia/Aqtau
A testdata/tzdb/2017c/Asia/Aqtobe
A testdata/tzdb/2017c/Asia/Ashgabat
A testdata/tzdb/2017c/Asia/Ashkhabad
A testdata/tzdb/2017c/Asia/Atyrau
A testdata/tzdb/2017c/Asia/Baghdad
A testdata/tzdb/2017c/Asia/Bahrain
A testdata/tzdb/2017c/Asia/Baku
A testdata/tzdb/2017c/Asia/Bangkok
A testdata/tzdb/2017c/Asia/Barnaul
A testdata/tzdb/2017c/Asia/Beirut
A testdata/tzdb/2017c/Asia/Bishkek
A testdata/tzdb/2017c/Asia/Brunei
A testdata/tzdb/2017c/Asia/Calcutta
A testdata/tzdb/2017c/Asia/Chita
A testdata/tzdb/2017c/Asia/Choibalsan
A testdata/tzdb/2017c/Asia/Chongqing
A testdata/tzdb/2017c/Asia/Chungking
A testdata/tzdb/2017c/Asia/Colombo
A testdata/tzdb/2017c/Asia/Dacca
A testdata/tzdb/2017c/Asia/Damascus
A testdata/tzdb/2017c/Asia/Dhaka
A testdata/tzdb/2017c/Asia/Dili
A testdata/tzdb/2017c/Asia/Dubai
A testdata/tzdb/2017c/Asia/Dushanbe
A testdata/tzdb/2017c/Asia/Famagusta
A testdata/tzdb/2017c/Asia/Gaza
A testdata/tzdb/2017c/Asia/Harbin
A testdata/tzdb/2017c/Asia/Hebron
A testdata/tzdb/2017c/Asia/Ho_Chi_Minh
A testdata/tzdb/2017c/Asia/Hong_Kong
A testdata/tzdb/2017c/Asia/Hovd
A testdata/tzdb/2017c/Asia/Irkutsk
A testdata/tzdb/2017c/Asia/Istanbul
A testdata/tzdb/2017c/Asia/Jakarta
A testdata/tzdb/2017c/Asia/Jayapura
A testdata/tzdb/2017c/Asia/Jerusalem
A testdata/tzdb/2017c/Asia/Kabul
A testdata/tzdb/2017c/Asia/Kamchatka
A testdata/tzdb/2017c/Asia/Karachi
A testdata/tzdb/2017c/Asia/Kashgar
A testdata/tzdb/2017c/Asia/Kathmandu
A testdata/tzdb/2017c/Asia/Katmandu
A testdata/tzdb/2017c/Asia/Khandyga
A testdata/tzdb/2017c/Asia/Kolkata
A testdata/tzdb/2017c/Asia/Krasnoyarsk
A testdata/tzdb/2017c/Asia/Kuala_Lumpur
A testdata/tzdb/2017c/Asia/Kuching
A testdata/tzdb/2017c/Asia/Kuwait
A testdata/tzdb/2017c/Asia/Macao
A testdata/tzdb/2017c/Asia/Macau
A testdata/tzdb/2017c/Asia/Magadan
A testdata/tzdb/2017c/Asia/Makassar
A testdata/tzdb/2017c/Asia/Manila
A testdata/tzdb/2017c/Asia/Muscat
A testdata/tzdb/2017c/Asia/Nicosia
A testdata/tzdb/2017c/Asia/Novokuznetsk
A testdata/tzdb/2017c/Asia/Novosibirsk
A testdata/tzdb/2017c/Asia/Omsk
A testdata/tzdb/2017c/Asia/Oral
A testdata/tzdb/2017c/Asia/Phnom_Penh
A testdata/tzdb/2017c/Asia/Pontianak
A testdata/tzdb/2017c/Asia/Pyongyang
A testdata/tzdb/2017c/Asia/Qatar
A testdata/tzdb/2017c/Asia/Qyzylorda
A testdata/tzdb/2017c/Asia/Rangoon
A testdata/tzdb/2017c/Asia/Riyadh
A testdata/tzdb/2017c/Asia/Saigon
A testdata/tzdb/2017c/Asia/Sakhalin
A testdata/tzdb/2017c/Asia/Samarkand
A testdata/tzdb/2017c/Asia/Seoul
A testdata/tzdb/2017c/Asia/Shanghai
A testdata/tzdb/2017c/Asia/Singapore
A testdata/tzdb/2017c/Asia/Srednekolymsk
A testdata/tzdb/2017c/Asia/Taipei
A testdata/tzdb/2017c/Asia/Tashkent
A testdata/tzdb/2017c/Asia/Tbilisi
A testdata/tzdb/2017c/Asia/Tehran
A testdata/tzdb/2017c/Asia/Tel_Aviv
A testdata/tzdb/2017c/Asia/Thimbu
A testdata/tzdb/2017c/Asia/Thimphu
A testdata/tzdb/2017c/Asia/Tokyo
A testdata/tzdb/2017c/Asia/Tomsk
A testdata/tzdb/2017c/Asia/Ujung_Pandang
A testdata/tzdb/2017c/Asia/Ulaanbaatar
A testdata/tzdb/2017c/Asia/Ulan_Bator
A testdata/tzdb/2017c/Asia/Urumqi
A testdata/tzdb/2017c/Asia/Ust-Nera
A testdata/tzdb/2017c/Asia/Vientiane
A testdata/tzdb/2017c/Asia/Vladivostok
A testdata/tzdb/2017c/Asia/Yakutsk
A testdata/tzdb/2017c/Asia/Yangon
A testdata/tzdb/2017c/Asia/Yekaterinburg
A testdata/tzdb/2017c/Asia/Yerevan
A testdata/tzdb/2017c/Atlantic/Azores
A testdata/tzdb/2017c/Atlantic/Bermuda
A testdata/tzdb/2017c/Atlantic/Canary
A testdata/tzdb/2017c/Atlantic/Cape_Verde
A testdata/tzdb/2017c/Atlantic/Faeroe
A testdata/tzdb/2017c/Atlantic/Faroe
A testdata/tzdb/2017c/Atlantic/Jan_Mayen
A testdata/tzdb/2017c/Atlantic/Madeira
A testdata/tzdb/2017c/Atlantic/Reykjavik
A testdata/tzdb/2017c/Atlantic/South_Georgia
A testdata/tzdb/2017c/Atlantic/St_Helena
A testdata/tzdb/2017c/Atlantic/Stanley
A testdata/tzdb/2017c/Australia/ACT
A testdata/tzdb/2017c/Australia/Adelaide
A testdata/tzdb/2017c/Australia/Brisbane
A testdata/tzdb/2017c/Australia/Broken_Hill
A testdata/tzdb/2017c/Australia/Canberra
A testdata/tzdb/2017c/Australia/Currie
A testdata/tzdb/2017c/Australia/Darwin
A testdata/tzdb/2017c/Australia/Eucla
A testdata/tzdb/2017c/Australia/Hobart
A testdata/tzdb/2017c/Australia/LHI
A testdata/tzdb/2017c/Australia/Lindeman
A testdata/tzdb/2017c/Australia/Lord_Howe
A testdata/tzdb/2017c/Australia/Melbourne
A testdata/tzdb/2017c/Australia/NSW
A testdata/tzdb/2017c/Australia/North
A testdata/tzdb/2017c/Australia/Perth
A testdata/tzdb/2017c/Australia/Queensland
A testdata/tzdb/2017c/Australia/South
A testdata/tzdb/2017c/Australia/Sydney
A testdata/tzdb/2017c/Australia/Tasmania
A testdata/tzdb/2017c/Australia/Victoria
A testdata/tzdb/2017c/Australia/West
A testdata/tzdb/2017c/Australia/Yancowinna
A testdata/tzdb/2017c/Brazil/Acre
A testdata/tzdb/2017c/Brazil/DeNoronha
A testdata/tzdb/2017c/Brazil/East
A testdata/tzdb/2017c/Brazil/West
A testdata/tzdb/2017c/CET
A testdata/tzdb/2017c/CST6CDT
A testdata/tzdb/2017c/Canada/Atlantic
A testdata/tzdb/2017c/Canada/Central
A testdata/tzdb/2017c/Canada/Eastern
A testdata/tzdb/2017c/Canada/Mountain
A testdata/tzdb/2017c/Canada/Newfoundland
A testdata/tzdb/2017c/Canada/Pacific
A testdata/tzdb/2017c/Canada/Saskatchewan
A testdata/tzdb/2017c/Canada/Yukon
A testdata/tzdb/2017c/Chile/Continental
A testdata/tzdb/2017c/Chile/EasterIsland
A testdata/tzdb/2017c/Cuba
A testdata/tzdb/2017c/EET
A testdata/tzdb/2017c/EST
A testdata/tzdb/2017c/EST5EDT
A testdata/tzdb/2017c/Egypt
A testdata/tzdb/2017c/Eire
A testdata/tzdb/2017c/Etc/GMT
A testdata/tzdb/2017c/Etc/GMT+0
A testdata/tzdb/2017c/Etc/GMT+1
A testdata/tzdb/2017c/Etc/GMT+10
A testdata/tzdb/2017c/Etc/GMT+11
A testdata/tzdb/2017c/Etc/GMT+12
A testdata/tzdb/2017c/Etc/GMT+2
A testdata/tzdb/2017c/Etc/GMT+3
A testdata/tzdb/2017c/Etc/GMT+4
A testdata/tzdb/2017c/Etc/GMT+5
A testdata/tzdb/2017c/Etc/GMT+6
A testdata/tzdb/2017c/Etc/GMT+7
A testdata/tzdb/2017c/Etc/GMT+8
A testdata/tzdb/2017c/Etc/GMT+9
A testdata/tzdb/2017c/Etc/GMT-0
A testdata/tzdb/2017c/Etc/GMT-1
A testdata/tzdb/2017c/Etc/GMT-10
A testdata/tzdb/2017c/Etc/GMT-11
A testdata/tzdb/2017c/Etc/GMT-12
A testdata/tzdb/2017c/Etc/GMT-13
A testdata/tzdb/2017c/Etc/GMT-14
A testdata/tzdb/2017c/Etc/GMT-2
A testdata/tzdb/2017c/Etc/GMT-3
A testdata/tzdb/2017c/Etc/GMT-4
A testdata/tzdb/2017c/Etc/GMT-5
A testdata/tzdb/2017c/Etc/GMT-6
A testdata/tzdb/2017c/Etc/GMT-7
A testdata/tzdb/2017c/Etc/GMT-8
A testdata/tzdb/2017c/Etc/GMT-9
A testdata/tzdb/2017c/Etc/GMT0
A testdata/tzdb/2017c/Etc/Greenwich
A testdata/tzdb/2017c/Etc/UCT
A testdata/tzdb/2017c/Etc/UTC
A testdata/tzdb/2017c/Etc/Universal
A testdata/tzdb/2017c/Etc/Zulu
A testdata/tzdb/2017c/Europe/Amsterdam
A testdata/tzdb/2017c/Europe/Andorra
A testdata/tzdb/2017c/Europe/Astrakhan
A testdata/tzdb/2017c/Europe/Athens
A testdata/tzdb/2017c/Europe/Belfast
A testdata/tzdb/2017c/Europe/Belgrade
A testdata/tzdb/2017c/Europe/Berlin
A testdata/tzdb/2017c/Europe/Bratislava
A testdata/tzdb/2017c/Europe/Brussels
A testdata/tzdb/2017c/Europe/Bucharest
A testdata/tzdb/2017c/Europe/Budapest
A testdata/tzdb/2017c/Europe/Busingen
A testdata/tzdb/2017c/Europe/Chisinau
A testdata/tzdb/2017c/Europe/Copenhagen
A testdata/tzdb/2017c/Europe/Dublin
A testdata/tzdb/2017c/Europe/Gibraltar
A testdata/tzdb/2017c/Europe/Guernsey
A testdata/tzdb/2017c/Europe/Helsinki
A testdata/tzdb/2017c/Europe/Isle_of_Man
A testdata/tzdb/2017c/Europe/Istanbul
A testdata/tzdb/2017c/Europe/Jersey
A testdata/tzdb/2017c/Europe/Kaliningrad
A testdata/tzdb/2017c/Europe/Kiev
A testdata/tzdb/2017c/Europe/Kirov
A testdata/tzdb/2017c/Europe/Lisbon
A testdata/tzdb/2017c/Europe/Ljubljana
A testdata/tzdb/2017c/Europe/London
A testdata/tzdb/2017c/Europe/Luxembourg
A testdata/tzdb/2017c/Europe/Madrid
A testdata/tzdb/2017c/Europe/Malta
A testdata/tzdb/2017c/Europe/Mariehamn
A testdata/tzdb/2017c/Europe/Minsk
A testdata/tzdb/2017c/Europe/Monaco
A testdata/tzdb/2017c/Europe/Moscow
A testdata/tzdb/2017c/Europe/Nicosia
A testdata/tzdb/2017c/Europe/Oslo
A testdata/tzdb/2017c/Europe/Paris
A testdata/tzdb/2017c/Europe/Podgorica
A testdata/tzdb/2017c/Europe/Prague
A testdata/tzdb/2017c/Europe/Riga
A testdata/tzdb/2017c/Europe/Rome
A testdata/tzdb/2017c/Europe/Samara
A testdata/tzdb/2017c/Europe/San_Marino
A testdata/tzdb/2017c/Europe/Sarajevo
A testdata/tzdb/2017c/Europe/Saratov
A testdata/tzdb/2017c/Europe/Simferopol
A testdata/tzdb/2017c/Europe/Skopje
A testdata/tzdb/2017c/Europe/Sofia
A testdata/tzdb/2017c/Europe/Stockholm
A testdata/tzdb/2017c/Europe/Tallinn
A testdata/tzdb/2017c/Europe/Tirane
A testdata/tzdb/2017c/Europe/Tiraspol
A testdata/tzdb/2017c/Europe/Ulyanovsk
A testdata/tzdb/2017c/Europe/Uzhgorod
A testdata/tzdb/2017c/Europe/Vaduz
A testdata/tzdb/2017c/Europe/Vatican
A testdata/tzdb/2017c/Europe/Vienna
A testdata/tzdb/2017c/Europe/Vilnius
A testdata/tzdb/2017c/Europe/Volgograd
A testdata/tzdb/2017c/Europe/Warsaw
A testdata/tzdb/2017c/Europe/Zagreb
A testdata/tzdb/2017c/Europe/Zaporozhye
A testdata/tzdb/2017c/Europe/Zurich
A testdata/tzdb/2017c/Factory
A testdata/tzdb/2017c/GB
A testdata/tzdb/2017c/GB-Eire
A testdata/tzdb/2017c/GMT
A testdata/tzdb/2017c/GMT+0
A testdata/tzdb/2017c/GMT-0
A testdata/tzdb/2017c/GMT0
A testdata/tzdb/2017c/Greenwich
A testdata/tzdb/2017c/HST
A testdata/tzdb/2017c/Hongkong
A testdata/tzdb/2017c/Iceland
A testdata/tzdb/2017c/Indian/Antananarivo
A testdata/tzdb/2017c/Indian/Chagos
A testdata/tzdb/2017c/Indian/Christmas
A testdata/tzdb/2017c/Indian/Cocos
A testdata/tzdb/2017c/Indian/Comoro
A testdata/tzdb/2017c/Indian/Kerguelen
A testdata/tzdb/2017c/Indian/Mahe
A testdata/tzdb/2017c/Indian/Maldives
A testdata/tzdb/2017c/Indian/Mauritius
A testdata/tzdb/2017c/Indian/Mayotte
A testdata/tzdb/2017c/Indian/Reunion
A testdata/tzdb/2017c/Iran
A testdata/tzdb/2017c/Israel
A testdata/tzdb/2017c/Jamaica
A testdata/tzdb/2017c/Japan
A testdata/tzdb/2017c/Kwajalein
A testdata/tzdb/2017c/Libya
A testdata/tzdb/2017c/MET
A testdata/tzdb/2017c/MST
A testdata/tzdb/2017c/MST7MDT
A testdata/tzdb/2017c/Mexico/BajaNorte
A testdata/tzdb/2017c/Mexico/BajaSur
A testdata/tzdb/2017c/Mexico/General
A testdata/tzdb/2017c/NZ
A testdata/tzdb/2017c/NZ-CHAT
A testdata/tzdb/2017c/Navajo
A testdata/tzdb/2017c/PRC
A testdata/tzdb/2017c/PST8PDT
A testdata/tzdb/2017c/Pacific/Apia
A testdata/tzdb/2017c/Pacific/Auckland
A testdata/tzdb/2017c/Pacific/Bougainville
A testdata/tzdb/2017c/Pacific/Chatham
A testdata/tzdb/2017c/Pacific/Chuuk
A testdata/tzdb/2017c/Pacific/Easter
A testdata/tzdb/2017c/Pacific/Efate
A testdata/tzdb/2017c/Pacific/Enderbury
A testdata/tzdb/2017c/Pacific/Fakaofo
A testdata/tzdb/2017c/Pacific/Fiji
A testdata/tzdb/2017c/Pacific/Funafuti
A testdata/tzdb/2017c/Pacific/Galapagos
A testdata/tzdb/2017c/Pacific/Gambier
A testdata/tzdb/2017c/Pacific/Guadalcanal
A testdata/tzdb/2017c/Pacific/Guam
A testdata/tzdb/2017c/Pacific/Honolulu
A testdata/tzdb/2017c/Pacific/Johnston
A testdata/tzdb/2017c/Pacific/Kiritimati
A testdata/tzdb/2017c/Pacific/Kosrae
A testdata/tzdb/2017c/Pacific/Kwajalein
A testdata/tzdb/2017c/Pacific/Majuro
A testdata/tzdb/2017c/Pacific/Marquesas
A testdata/tzdb/2017c/Pacific/Midway
A testdata/tzdb/2017c/Pacific/Nauru
A testdata/tzdb/2017c/Pacific/Niue
A testdata/tzdb/2017c/Pacific/Norfolk
A testdata/tzdb/2017c/Pacific/Noumea
A testdata/tzdb/2017c/Pacific/Pago_Pago
A testdata/tzdb/2017c/Pacific/Palau
A testdata/tzdb/2017c/Pacific/Pitcairn
A testdata/tzdb/2017c/Pacific/Pohnpei
A testdata/tzdb/2017c/Pacific/Ponape
A testdata/tzdb/2017c/Pacific/Port_Moresby
A testdata/tzdb/2017c/Pacific/Rarotonga
A testdata/tzdb/2017c/Pacific/Saipan
A testdata/tzdb/2017c/Pacific/Samoa
A testdata/tzdb/2017c/Pacific/Tahiti
A testdata/tzdb/2017c/Pacific/Tarawa
A testdata/tzdb/2017c/Pacific/Tongatapu
A testdata/tzdb/2017c/Pacific/Truk
A testdata/tzdb/2017c/Pacific/Wake
A testdata/tzdb/2017c/Pacific/Wallis
A testdata/tzdb/2017c/Pacific/Yap
A testdata/tzdb/2017c/Poland
A testdata/tzdb/2017c/Portugal
A testdata/tzdb/2017c/ROC
A testdata/tzdb/2017c/ROK
A testdata/tzdb/2017c/Singapore
A testdata/tzdb/2017c/SystemV/AST4
A testdata/tzdb/2017c/SystemV/AST4ADT
A testdata/tzdb/2017c/SystemV/CST6
A testdata/tzdb/2017c/SystemV/CST6CDT
A testdata/tzdb/2017c/SystemV/EST5
A testdata/tzdb/2017c/SystemV/EST5EDT
A testdata/tzdb/2017c/SystemV/HST10
A testdata/tzdb/2017c/SystemV/MST7
A testdata/tzdb/2017c/SystemV/MST7MDT
A testdata/tzdb/2017c/SystemV/PST8
A testdata/tzdb/2017c/SystemV/PST8PDT
A testdata/tzdb/2017c/SystemV/YST9
A testdata/tzdb/2017c/SystemV/YST9YDT
A testdata/tzdb/2017c/Turkey
A testdata/tzdb/2017c/UCT
A testdata/tzdb/2017c/US/Alaska
A testdata/tzdb/2017c/US/Aleutian
A testdata/tzdb/2017c/US/Arizona
A testdata/tzdb/2017c/US/Central
A testdata/tzdb/2017c/US/East-Indiana
A testdata/tzdb/2017c/US/Eastern
A testdata/tzdb/2017c/US/Hawaii
A testdata/tzdb/2017c/US/Indiana-Starke
A testdata/tzdb/2017c/US/Michigan
A testdata/tzdb/2017c/US/Mountain
A testdata/tzdb/2017c/US/Pacific
A testdata/tzdb/2017c/US/Pacific-New
A testdata/tzdb/2017c/US/Samoa
A testdata/tzdb/2017c/UTC
A testdata/tzdb/2017c/Universal
A testdata/tzdb/2017c/W-SU
A testdata/tzdb/2017c/WET
A testdata/tzdb/2017c/Zulu
A testdata/tzdb/2017c/posixrules
A testdata/tzdb/abbrev.conf
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
659 files changed, 2,892 insertions(+), 1,144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/filesystem-util.h@57
PS2, Line 57:   static Status GetRealPath(
Shouldn't you mention that this should be called on symlinks? (Or do I misunderstand something?)
I think what a "real path" is needs some further explanation.
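
For reference, if "real path" follows the usual realpath(3) meaning (an assumption, the thread does not confirm it), it is the absolute path with symlinks, "." and ".." resolved. A small sketch using the POSIX call; ResolveRealPath is a hypothetical wrapper, not the GetRealPath from the patch.

```cpp
#include <cstdlib>
#include <string>
#include <stdlib.h>  // POSIX realpath()

// Resolves symlinks, "." and ".." components of 'path'.
// Returns false if the path cannot be resolved (for example, it does not exist).
inline bool ResolveRealPath(const std::string& path, std::string* real_path) {
  char* resolved = realpath(path.c_str(), nullptr);  // result buffer is malloc'ed
  if (resolved == nullptr) return false;
  *real_path = resolved;
  std::free(resolved);
  return true;
}
```

Under this reading the function is meaningful for any existing path, and a symlink is simply the case where the result differs most visibly from the input.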


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/filesystem-util.h@60
PS2, Line 60: Is it is
nit: if it is


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/hdfs-util.h
File be/src/util/hdfs-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/hdfs-util.h@59
PS2, Line 59: /// Returns basename of 'path'.
Could you add an example to the comment?


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/time.h
File be/src/util/time.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/time.h@123
PS2, Line 123: /// Converts input microseconds-since-epoch to date-time string in 'tz' time zone.
Could you mention 'p' as well?


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/time.cc
File be/src/util/time.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/time.cc@167
PS2, Line 167:   const char* fmt = (p == TimePrecision::Millisecond) ? fmt_millisec :
For me, these three levels of nested ternary operators aren't that readable. What about a switch?
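
A minimal sketch of the switch-based alternative suggested above. Only TimePrecision::Millisecond appears in the quoted line; the other enumerators, the helper name FractionFormat, and the format strings are assumptions made for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the TimePrecision enum in be/src/util/time.h.
enum class TimePrecision { Second, Millisecond, Microsecond, Nanosecond };

// Returns an illustrative fractional-second format suffix for 'p'.
// One case per precision replaces the chain of nested ternary operators.
inline const char* FractionFormat(TimePrecision p) {
  switch (p) {
    case TimePrecision::Second: return "";
    case TimePrecision::Millisecond: return ".%03d";
    case TimePrecision::Microsecond: return ".%06d";
    case TimePrecision::Nanosecond: return ".%09d";
  }
  // Only reachable if 'p' holds a value outside the enum.
  assert(false && "unhandled TimePrecision");
  return "";
}
```

Leaving out a default label lets most compilers warn when a new enumerator is added without a matching case.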




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Apr 2018 14:28:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/runtime-state.h@317
PS5, Line 317:   /// Query-global timezone used as local timezone when executing the query.
Who owns it? Let's at least say "Not owned."


http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/util/filesystem-util.h@58
PS5, Line 58: Real
Should we call it GetCanonicalPath()? I assume it's related to IsCanonicalPath(), so it would be nice to name them similarly (and group them together). Is it the case that IsCanonicalPath() always returns true for the *real_path returned by GetRealPath()?
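
For readers unfamiliar with the term, a hedged sketch of what a symlink-resolving GetCanonicalPath() could look like. It assumes POSIX realpath() and returns a bool where Impala code would return a Status; the signature is hypothetical and this is not the patch's implementation. It makes no claim about how the result relates to IsCanonicalPath().

```cpp
#include <cassert>
#include <stdlib.h>  // realpath(), free()
#include <string>

// Resolves symlinks, "." and ".." components in 'path' and stores the
// absolute result in '*canonical'. Returns false if the path cannot be
// resolved, for example because it does not exist.
inline bool GetCanonicalPath(const std::string& path, std::string* canonical) {
  assert(canonical != nullptr);
  // With a null buffer, POSIX.1-2008 realpath() allocates the result.
  char* resolved = realpath(path.c_str(), nullptr);
  if (resolved == nullptr) return false;
  *canonical = resolved;
  free(resolved);
  return true;
}
```

Resolving an already resolved path returns it unchanged, which is the property the naming question above is about.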




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 May 2018 21:07:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9986/17/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/17/be/src/exprs/timezone_db.cc@367
PS17, Line 367:   while (true) {
              :     current_bytes_read = hdfsRead(hdfs_co
> nit: this could fit in one line
Done


http://gerrit.cloudera.org:8080/#/c/9986/18/be/src/util/zip-util.h
File be/src/util/zip-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/18/be/src/util/zip-util.h@33
PS18, Line 33: 
> nit: comment formatting - should start with ///
Done


http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan
File testdata/tzdb/2017c/Africa/Abidjan:

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan@1
PS10, Line 1: ../Atlantic/St_Helena
> How does they pass the rat checks? We run those precommit. I expected to se
I've updated rat_exclude_files.txt in patch-set #11 and #13. I ran the rat checks with that and they passed.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Jun 2018 15:27:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 2:

(29 comments)

Dumping another batch of comments :)

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exec/data-source-scan-node.h
File be/src/exec/data-source-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exec/data-source-scan-node.h@100
PS1, Line 100:   Status MaterializeNextRow(RuntimeState* state, MemPool* mem_pool, Tuple* tuple);
What do you think about passing cctz::time_zone* instead of RuntimeState*?

Can you add a comment for the new parameter and its purpose?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/expr-test.cc@6398
PS1, Line 6398: const char* local_tz_name = "PST8PDT";
              :     ScopedTimeZoneOverride time_zone(local_tz_name);
              :     const cctz::time_zone* local_tz = TimezoneDatabase::FindTimezone(local_tz_name);
              :     DCHECK(local_tz != nullptr);
Have you considered moving this into a function, macro, or similar? As far as I can see, you do the same thing three times.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@a197
PS1, Line 197: 
Nice that we can get rid of this magic :)


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@73
PS1, Line 73: from_cs
Might be just me but I don't really find the names from_cs, to_cs and from_tp too self-descriptive.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@76
PS1, Line 76: auto from_tp = cctz::convert(from_cs, TimezoneDatabase::GetUtcTimezone());
            :   auto to_cs = cctz::convert(from_tp, *timezone);
I think it would be worth writing a comment explaining why the two cctz::convert() calls are needed.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@79
PS1, Line 79:   // Check if resulting timestamp is within range
In my opinion this comment doesn't add extra value as the name of the function states the same.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@113
PS1, Line 113:   context->AddWarning(ss.str().c_str());
             :     return ts_val;
             :   }
             : 
This seems to be duplicate code (see TimestampFunctions::FromUtc).


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@140
PS1, Line 140:   context->AddWarning(msg.c_str());
             :     return TimestampVal::null();
             :   }
             : 
             :   // Create 'return_date' and 'return_time' from 'to_cs'.
I think this conversion could go into a function, as it seems to duplicate the same conversion in TimestampFunctions::FromUtc.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@21
PS1, Line 21: #include <boost/unordered_map.hpp>
Shouldn't we use the one from std?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@35
PS1, Line 35: 
string& GetPath() ?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@47
PS1, Line 47:   }
FindTimezone() returns pointer while GetUtcTimezone() returns reference to a timezone. Can these be unified?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@54
PS1, Line 54:   static const cctz::time_zone UTC_TIMEZONE_;
Could you add a comment explaining what the string param is used for?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@62
PS1, Line 62:   /// location.
I see you wrote comments for these functions in the .cc file. Could you move them to the header?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@70
PS1, Line 70: db.
zone_abbrev


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@10
PS1, Line 10: //
Unrelated to your change, but shouldn't we replace this with the Apache header that other files use?
Same goes for the header.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@103
PS1, Line 103: // Returns 'true' if path 'a' starts with path 'b'. If 'relative' is not nullptr, it will
FileSystemUtil would be a better place for this, I think.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@105
PS1, Line 105: PathStartsWith
I have the feeling that this function should only decide whether path 'b' is a prefix of path 'a'. Returning the remainder after 'b' is something that I think should be out of this function's scope.
I think this mix of responsibilities is the reason the name of the function is not that self-explanatory. It could be e.g. IsPrefixPath() or similar.

What do you think?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@105
PS1, Line 105: string *relative
For me, the name of this variable doesn't indicate its purpose. Can you name it remainder or rest or something similar? Also, the function comment didn't explain to me what a path relative to 'b' means.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@108
PS1, Line 108: b.length() + 1 < a.length()
What if a == b? In theory 'a' then starts with 'b', yet this returns false.

As far as I can see, in this case 'a' should be 'b' + "/" to get true. If this is intentional, please mention it in the function comment.
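
A minimal sketch of how the two responsibilities discussed above could be separated, with the a == b case treated as a match. The names IsPrefixPath and PathRemainder and their signatures are hypothetical, and both functions assume already normalized paths; this is not the code from the patch.

```cpp
#include <cassert>
#include <string>

// Returns true if 'prefix' is a prefix of 'path' on a path-component
// boundary. Identical paths count as a match. Both arguments are assumed to
// be normalized (no "//", no trailing '/' except for the root "/").
inline bool IsPrefixPath(const std::string& path, const std::string& prefix) {
  if (prefix.empty()) return false;
  if (path.compare(0, prefix.length(), prefix) != 0) return false;
  if (path.length() == prefix.length()) return true;  // a == b
  // The character following the prefix must start a new path component.
  return prefix.back() == '/' || path[prefix.length()] == '/';
}

// Returns the part of 'path' that follows 'prefix', without a leading '/'.
// Requires IsPrefixPath(path, prefix).
inline std::string PathRemainder(const std::string& path, const std::string& prefix) {
  assert(IsPrefixPath(path, prefix));
  std::string::size_type pos = prefix.length();
  if (pos < path.length() && path[pos] == '/') ++pos;
  return path.substr(pos);
}
```

With this split, "/a/bc" is not reported as being under "/a/b", and the remainder of "/a/b/c" relative to "/a/b" is "c".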


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@118
PS1, Line 118: // with an uppercase letter.
Could you mention examples here that contain the mentioned allowed chars?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@119
PS1, Line 119: bool IsTimezoneNameSegmentValid(const string& tz_seg) {
Shouldn't this be part of TimezoneDatabase or some other timezone related helper class?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@121
PS1, Line 121:       find_if(
Tricky :)

Just for my information, have you considered using regex?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@132
PS1, Line 132: // time-zone name segments delimited by '/'.
Could you mention one input example in the comment?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@133
PS1, Line 133: bool IsTimezoneNameValid(const string& tz_name) {
Shouldn't this be part of TimezoneDatabase or some other timezone related helper class?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@136
PS1, Line 136:   while (end != string::npos) {
Wouldn't it be easier to verify this with a regex and get rid of this while loop and IsTimezoneNameSegmentValid() ?
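
A sketch of the regex-based validation suggested above, using std::regex. The exact character set is an assumption: the quoted comments only say that a name consists of segments delimited by '/' and that a segment starts with an uppercase letter, so the letters, digits, '_', '+' and '-' allowed after the first character are a guess based on names such as America/New_York and Etc/GMT+4.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if 'tz_name' is one or more '/'-delimited segments, each
// starting with an uppercase letter. The characters allowed after the first
// one are an assumption, not taken from the patch.
inline bool IsTimezoneNameValid(const std::string& tz_name) {
  static const std::regex kTzName(
      "[A-Z][A-Za-z0-9_+-]*(/[A-Z][A-Za-z0-9_+-]*)*");
  // regex_match() requires the whole string to match, so no anchors are needed.
  return std::regex_match(tz_name, kTzName);
}
```

A single pattern replaces both the loop and the per-segment helper; whether it is preferable to the hand-written version depends on how much std::regex's compile and match cost matters at this call site.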


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147
PS1, Line 147: bool
can you return int64_t* and return nullptr in case the offset is not valid?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147
PS1, Line 147: bool IsTimezoneOffsetValid(const string& tz_offset, int64_t* offset_sec) {
Shouldn't this be part of TimezoneDatabase or some other timezone related helper class?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@160
PS1, Line 160: // The implementation here was adapted from
> Maybe this could be moved to class FileSystemUtil.
Again, I feel that this function serves two purposes instead of one clear purpose: deciding whether the path is a symbolic link and getting the 'real_path'. I think these should be split into multiple functions.

What do you think?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@251
PS1, Line 251:   // mkdtemp operates in place, so we need a mutable array.
Can you move the comment to the header?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Apr 2018 07:32:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 18: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9986/18/be/src/util/zip-util.h
File be/src/util/zip-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/18/be/src/util/zip-util.h@33
PS18, Line 33:   //Extract files from a zip archive to a destination directory in local filesystem.
nit: comment formatting - should start with ///


http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan
File testdata/tzdb/2017c/Africa/Abidjan:

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan@1
PS10, Line 1: 
> These are binary data files created from text data files that are in the pu
How do they pass the rat checks? We run those precommit. I expected to see something in bin/rat_exclude_files.txt.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 18
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Jun 2018 01:44:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/abbrev.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,117 insertions(+), 1,167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/14

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 1:

(7 comments)

Spent only a limited amount of time on this review. Will continue next week.

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@7
PS1, Line 7: time-zone
Fun fact:
I was curious whether "timezone", "time zone" or "time-zone" is correct. Apparently all of them are, however I read that "time-zone" is a bit outdated.  "time zone" is widely used and "timezone" is only in the US.
Can a native speaker confirm? :)


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@1
PS1, Line 1: #include <chrono>
Missing Apache header


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@38
PS1, Line 38: UtcToUnixTime:             Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
Do you think we should force the 90col limit on the following comment as well?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@136
PS1, Line 136:     ss << to_simple_string(start);
to_simple_string() returns a string already, so there is no need to use a stringstream for this purpose.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/decimal-operators.h
File be/src/exprs/decimal-operators.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/decimal-operators.h@168
PS1, Line 168:   /// instead of truncating if 'round' is true.
Should we mention the new parameter in the comment?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h@321
PS1, Line 321: global local
I see the intent but "global local timezone" sounds strange :)


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h@321
PS1, Line 321: -
nit: typo




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Apr 2018 12:20:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 11:

(2 comments)

Thanks for adding zip support!
We should add some tests for zip_util, especially for error handling, which is an untested path at the moment if I didn't miss something. I am ok with moving this (and dealing with my other comments) to a later commit.

http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc@198
PS11, Line 198: GetNextDirectoryEntry
This is subjective, but I do not like this interface too much. I would prefer to wrap dir_stream in a class/struct, or create a function like this: static STATUS ListDirEntries(string path, vector<string>& result, int max_result_num = 0). Both could be moved to util/filesystem_util.


http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc@213
PS11, Line 213: readdir_r
There was a discussion about readdir_r() vs readdir() in https://gerrit.cloudera.org/#/c/8546/8 , and readdir() was preferred in the end.
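
A hedged sketch combining the two comments above: a ListDirEntries() helper built on readdir() rather than readdir_r(). The bool return value and the error string stand in for Impala's Status, and the max_result_num parameter from the proposed signature is left out for brevity; everything here is illustrative rather than the code that was eventually committed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <dirent.h>
#include <string>
#include <vector>

// Appends the names of all entries in directory 'path', except "." and "..",
// to '*result'. Returns false and describes the failure in '*error' if the
// directory cannot be opened or read.
inline bool ListDirEntries(const std::string& path, std::vector<std::string>* result,
    std::string* error) {
  assert(result != nullptr && error != nullptr);
  DIR* dir = opendir(path.c_str());
  if (dir == nullptr) {
    *error = "opendir(" + path + ") failed: " + std::strerror(errno);
    return false;
  }
  bool ok = true;
  while (true) {
    // readdir() returns nullptr both at end-of-directory and on error, so
    // errno has to be cleared first to tell the two cases apart.
    errno = 0;
    const dirent* entry = readdir(dir);
    if (entry == nullptr) {
      if (errno != 0) {
        *error = "readdir(" + path + ") failed: " + std::strerror(errno);
        ok = false;
      }
      break;
    }
    const std::string name = entry->d_name;
    if (name != "." && name != "..") result->push_back(name);
  }
  closedir(dir);
  return ok;
}
```

Keeping the DIR* stream inside the function means callers never hold a directory handle, which is the main difference from a GetNextDirectoryEntry()-style interface.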




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Jun 2018 18:35:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Zoltan Borok-Nagy, Csaba Ringhofer, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9986

to look at the new patch set (#6).

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/abbrev.conf
A testdata/tzdb/zoneinfo/AmerICA/ArgeNTINA/MendOZA
A testdata/tzdb/zoneinfo/AmerICA/CancUN
A testdata/tzdb/zoneinfo/UTC
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
A tests/custom_cluster/test_custom_tzdb.py
53 files changed, 2,569 insertions(+), 1,092 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 22: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 22
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Jun 2018 13:18:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.
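
A minimal sketch of the kind of check that guards against Zip Slip, assuming an entry is rejected when its name is absolute or contains a ".." component. The helper name IsSafeZipEntryPath is hypothetical and not taken from the patch; the actual ZipUtil class may validate entries differently.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not from the patch: returns true if 'entry_name' can be
// extracted under a destination directory without escaping it.
bool IsSafeZipEntryPath(const std::string& entry_name) {
  if (entry_name.empty()) return false;
  // Reject absolute paths such as "/etc/passwd".
  if (entry_name[0] == '/') return false;
  // Reject any ".." component, e.g. "../../evil" or "a/../../b".
  // This is deliberately conservative: "a/../b" is rejected as well.
  std::istringstream components(entry_name);
  std::string component;
  while (std::getline(components, component, '/')) {
    if (component == "..") return false;
  }
  return true;
}
```

Rejecting every ".." component is stricter than resolving the path and comparing it with the destination directory, but it needs no filesystem access and is easy to reason about.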

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/alias.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,089 insertions(+), 1,167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/18

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 18
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 18: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9986/17/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/17/be/src/exprs/timezone_db.cc@367
PS17, Line 367: Status TimezoneDatabase::LoadZoneAliasesFromHdfs(
              :     const string& hdfs_zone_alias_conf) {
nit: this could fit in one line




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 18
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Jun 2018 13:24:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 22: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2725/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 22
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Jun 2018 09:51:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 5:

(19 comments)

Thanks for dealing with my previous comments, Attila! I filed some more, but I'm basically fine with the changes. Feel free to start involving someone with more experience on this topic.

http://gerrit.cloudera.org:8080/#/c/9986/4/CMakeLists.txt
File CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/9986/4/CMakeLists.txt@281
PS4, Line 281: Cctz
nit: CCTZ


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@139
PS4, Line 139: new_time_zone_(time_zone), new_tz_
From reading the names of these 2 variables it's not clear what the difference is. Can you have a new_time_zone_ and a new_time_zone_name_ or such?


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@140
PS4, Line 140: /*overwrite*/
Do you need this comment?


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@153
PS4, Line 153: Timezone *
nit: Timezone* Expect..()


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@155
PS4, Line 155: new_tz_ = TimezoneDatabase::FindTimezone(new_time_zone_);
I wonder if it makes sense to do this assignment in the constructor; this function could then be changed to something like "getTimezone" that simply returns the corresponding member.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@503
PS4, Line 503: namespace
Why did you need this namespace?


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@504
PS4, Line 504: nline 
What is the plan to get rid of the "Duplicate code" TODOs in this review?


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions.cc@55
PS4, Line 55:     // This should raise some sort of error or at least return null. Hive just ignores it.
Shouldn't this be a TODO?


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions.cc@87
PS4, Line 87:     // This should raise some sort of error or at least return null. Hive just ignores it.
Same as above


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc
File be/src/exprs/timezone_db-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc@57
PS4, Line 57: TzAbbev
nit: TzAbbrev?


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc@68
PS4, Line 68:   // Abbreviations must start with an uppercase letter.
If it has to start with an uppercase letter, can we add a test for this with an input that would be valid if the first letter were uppercase?
e.g. "pST", "pst", "singapore" etc.
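
The case check asked for here could be exercised along these lines. This is only a sketch: StartsWithUppercase is a hypothetical stand-in for the rule quoted above, not the validator in timezone_db.cc, which may enforce more than the first character.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the "must start with an uppercase letter" rule.
bool StartsWithUppercase(const std::string& abbrev) {
  return !abbrev.empty() &&
      std::isupper(static_cast<unsigned char>(abbrev[0])) != 0;
}
```

Inputs such as "pST", "pst" and "singapore" should be rejected by such a check, while "PST" passes.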


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc@105
PS4, Line 105:   // Misformatted time-zone names.
Can you again play around with upper vs lower case letters here?


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h@1
PS4, Line 1: // with the License.  You may obtain a copy of the License at
Hmm, is the top of the Apache comment missing?


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h@23
PS4, Line 23: /// 'TimezoneDatabase' class contains functions to load and access the IANA time-zone
Nice description, thx :)


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h@91
PS4, Line 91: tz_seg
nit: might be just my preference but tz_segment is still short and, I think, easier to read. Up to you.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147
PS1, Line 147:     
> Returning a pointer just to be able to signal failure with nullptr seems co
What I meant is that you could get rid of the offset_sec parameter if you changed the return value to int64_t* and you would still be able to detect failure if the return value is null. Not a big deal, though. Your choice.
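
The two styles being discussed can be compared in a small sketch. Everything here is hypothetical: the table, its entries and the function names are illustrative and do not come from timezone_db.cc.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical table; the names and values are illustrative only.
const std::map<std::string, int64_t>& OffsetTable() {
  static const std::map<std::string, int64_t> table = {
      {"UTC+01:00", 3600}, {"UTC-08:00", -28800}};
  return table;
}

// Style 1: bool result plus an out-parameter.
bool FindOffsetSec(const std::string& name, int64_t* offset_sec) {
  auto it = OffsetTable().find(name);
  if (it == OffsetTable().end()) return false;
  *offset_sec = it->second;
  return true;
}

// Style 2: pointer result, nullptr on failure; no out-parameter needed.
const int64_t* FindOffsetSecOrNull(const std::string& name) {
  auto it = OffsetTable().find(name);
  return it == OffsetTable().end() ? nullptr : &it->second;
}
```

Style 2 only works when the value has storage that outlives the call, as the static table does here. A function that computes its result on the fly has nothing stable to point at, which is one reason to keep the out-parameter.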


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc@365
PS4, Line 365:   hdfsFile hdfs_file = hdfsOpenFile(
Just for my information: LoadZoneInfoFromHdfs() copies the zone info file to a local dir, whereas this function uses hdfsOpenFile to get the abbrev data. Is there a reason that they don't follow the same approach?


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc@391
PS4, Line 391:             ErrorMsg(TErrorCode::GENERAL,
nit: I'm not exactly sure about the rules, but I feel that this line could be merged with the previous one.


http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/runtime-state.cc@136
PS5, Line 136:     local_time_zone_ = &TimezoneDatabase::GetUtcTimezone();
This has already been set to GetUtcTimezone() in the constructor, right?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 May 2018 18:20:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@526
PS4, Line 526:   const string& tz_name = (start_lookup.abbr != nullptr) ? start_lookup.abbr :
             :       context->impl()->state()->local_time_zone()->name();
> What is the goal of this logic? To print timezone abbreviations instead of 
Both, I guess. I just replicated the original behavior: localtime_r() sets tzone.tm_zone to the time-zone abbreviation. Added a comment.

I don't think that the TimestampValue class would be a good place for this function. The timezone abbreviation corresponds to 'start_unix_millis', which is not a TimestampValue. I'll keep this logic here for now.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@93
PS4, Line 93: inline bool CheckIfDateOutOfRange(const cctz::civil_day& date) {
            :   static const cctz::civil_day max_date(TimestampFunctions::MAX_YEAR, 12, 31);
            :   static const cctz::civil_day min_date(TimestampFunctions::MIN_YEAR, 1, 1);
            :   return date < min_date || date > max_date;
            : }
> This could be simpler and possibly faster by expecting a cctz::civil_second
Done
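
One way to read the suggestion, as a sketch only: because the bounds in the quoted code are January 1 of the minimum year and December 31 of the maximum year, comparing the year alone gives the same answer as comparing full dates. The constants below assume the 1400..9999 range mentioned elsewhere in this file; the names are hypothetical and this is not necessarily what the next patch set does.

```cpp
#include <cstdint>

// Assumed bounds, based on the 1400..9999 range cited for boost::gregorian::date.
constexpr int64_t kMinYear = 1400;
constexpr int64_t kMaxYear = 9999;

// Equivalent to a comparison against [kMinYear-01-01, kMaxYear-12-31], because
// every valid date in a year lies between that year's Jan 1 and Dec 31.
inline bool IsYearOutOfRange(int64_t year) {
  return year < kMinYear || year > kMaxYear;
}
```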


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@128
PS4, Line 128: 
> cctz explains pretty well the handling of dst boundaries, maybe we could ad
Done


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@129
PS4, Line 129:   // In case of ambiguity invalidate TimestampValue.
             :   const cctz::time_zone::civil_lookup from_cl = local_tz->lookup(from_cs);
             :   if (UNLIKELY(from_cl.kind != cctz::time_zone::civil_lookup::UNIQUE)) {
             :     SetToInvalidDateTime();
             :   } else {
             :     cctz::time_point<cctz::sys_seconds> from_tp = from_cl.pre;
             : 
             :     // Convert 'from_tp' time_point to civil_second assuming 'UTC' time-zone.
             :     cctz::civil_second to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone());
             : 
             :     // boost::gregorian::date() throws boost::gregorian::bad_year if year is not in the
             :     // 1400..9999 range. Need to check validity before creating the date object.
             :     if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) {
> It may be possible to get TimestampValue from cctz::time_zone::civil_lookup
Done


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@146
PS4, Line 146:       time_ = boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second(),
> nit: long line
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 May 2018 15:28:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.
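
The in-memory lookup with abbreviations described above can be pictured roughly as follows. This is a sketch under stated assumptions: TinyTimezoneDb, the Timezone placeholder and the example mapping are all hypothetical, and the real TimezoneDatabase stores CCTZ time-zone objects loaded from the zoneinfo files.

```cpp
#include <string>
#include <unordered_map>

// Placeholder for a loaded time-zone (the patch uses CCTZ objects instead).
struct Timezone {
  std::string name;
};

class TinyTimezoneDb {
 public:
  void AddZone(const std::string& name) { zones_[name] = Timezone{name}; }

  // Registers 'abbrev' as another name for an already-loaded zone.
  // Returns false if 'target' is unknown.
  bool AddAbbreviation(const std::string& abbrev, const std::string& target) {
    if (zones_.find(target) == zones_.end()) return false;
    abbrevs_[abbrev] = target;
    return true;
  }

  // Returns the zone for 'name', following one abbreviation hop, or nullptr.
  const Timezone* Find(const std::string& name) const {
    auto zone = zones_.find(name);
    if (zone != zones_.end()) return &zone->second;
    auto abbrev = abbrevs_.find(name);
    if (abbrev == abbrevs_.end()) return nullptr;
    zone = zones_.find(abbrev->second);
    return zone == zones_.end() ? nullptr : &zone->second;
  }

 private:
  std::unordered_map<std::string, Timezone> zones_;
  std::unordered_map<std::string, std::string> abbrevs_;
};
```

If everything is loaded at startup and the maps are never modified afterwards, concurrent lookups need no lock, which contrasts with the global lock taken by the glibc conversion functions.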

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c/Africa/Abidjan
A testdata/tzdb/2017c/Africa/Accra
A testdata/tzdb/2017c/Africa/Addis_Ababa
A testdata/tzdb/2017c/Africa/Algiers
A testdata/tzdb/2017c/Africa/Asmara
A testdata/tzdb/2017c/Africa/Asmera
A testdata/tzdb/2017c/Africa/Bamako
A testdata/tzdb/2017c/Africa/Bangui
A testdata/tzdb/2017c/Africa/Banjul
A testdata/tzdb/2017c/Africa/Bissau
A testdata/tzdb/2017c/Africa/Blantyre
A testdata/tzdb/2017c/Africa/Brazzaville
A testdata/tzdb/2017c/Africa/Bujumbura
A testdata/tzdb/2017c/Africa/Cairo
A testdata/tzdb/2017c/Africa/Casablanca
A testdata/tzdb/2017c/Africa/Ceuta
A testdata/tzdb/2017c/Africa/Conakry
A testdata/tzdb/2017c/Africa/Dakar
A testdata/tzdb/2017c/Africa/Dar_es_Salaam
A testdata/tzdb/2017c/Africa/Djibouti
A testdata/tzdb/2017c/Africa/Douala
A testdata/tzdb/2017c/Africa/El_Aaiun
A testdata/tzdb/2017c/Africa/Freetown
A testdata/tzdb/2017c/Africa/Gaborone
A testdata/tzdb/2017c/Africa/Harare
A testdata/tzdb/2017c/Africa/Johannesburg
A testdata/tzdb/2017c/Africa/Juba
A testdata/tzdb/2017c/Africa/Kampala
A testdata/tzdb/2017c/Africa/Khartoum
A testdata/tzdb/2017c/Africa/Kigali
A testdata/tzdb/2017c/Africa/Kinshasa
A testdata/tzdb/2017c/Africa/Lagos
A testdata/tzdb/2017c/Africa/Libreville
A testdata/tzdb/2017c/Africa/Lome
A testdata/tzdb/2017c/Africa/Luanda
A testdata/tzdb/2017c/Africa/Lubumbashi
A testdata/tzdb/2017c/Africa/Lusaka
A testdata/tzdb/2017c/Africa/Malabo
A testdata/tzdb/2017c/Africa/Maputo
A testdata/tzdb/2017c/Africa/Maseru
A testdata/tzdb/2017c/Africa/Mbabane
A testdata/tzdb/2017c/Africa/Mogadishu
A testdata/tzdb/2017c/Africa/Monrovia
A testdata/tzdb/2017c/Africa/Nairobi
A testdata/tzdb/2017c/Africa/Ndjamena
A testdata/tzdb/2017c/Africa/Niamey
A testdata/tzdb/2017c/Africa/Nouakchott
A testdata/tzdb/2017c/Africa/Ouagadougou
A testdata/tzdb/2017c/Africa/Porto-Novo
A testdata/tzdb/2017c/Africa/Sao_Tome
A testdata/tzdb/2017c/Africa/Timbuktu
A testdata/tzdb/2017c/Africa/Tripoli
A testdata/tzdb/2017c/Africa/Tunis
A testdata/tzdb/2017c/Africa/Windhoek
A testdata/tzdb/2017c/America/Adak
A testdata/tzdb/2017c/America/Anchorage
A testdata/tzdb/2017c/America/Anguilla
A testdata/tzdb/2017c/America/Antigua
A testdata/tzdb/2017c/America/Araguaina
A testdata/tzdb/2017c/America/Argentina/Buenos_Aires
A testdata/tzdb/2017c/America/Argentina/Catamarca
A testdata/tzdb/2017c/America/Argentina/ComodRivadavia
A testdata/tzdb/2017c/America/Argentina/Cordoba
A testdata/tzdb/2017c/America/Argentina/Jujuy
A testdata/tzdb/2017c/America/Argentina/La_Rioja
A testdata/tzdb/2017c/America/Argentina/Mendoza
A testdata/tzdb/2017c/America/Argentina/Rio_Gallegos
A testdata/tzdb/2017c/America/Argentina/Salta
A testdata/tzdb/2017c/America/Argentina/San_Juan
A testdata/tzdb/2017c/America/Argentina/San_Luis
A testdata/tzdb/2017c/America/Argentina/Tucuman
A testdata/tzdb/2017c/America/Argentina/Ushuaia
A testdata/tzdb/2017c/America/Aruba
A testdata/tzdb/2017c/America/Asuncion
A testdata/tzdb/2017c/America/Atikokan
A testdata/tzdb/2017c/America/Atka
A testdata/tzdb/2017c/America/Bahia
A testdata/tzdb/2017c/America/Bahia_Banderas
A testdata/tzdb/2017c/America/Barbados
A testdata/tzdb/2017c/America/Belem
A testdata/tzdb/2017c/America/Belize
A testdata/tzdb/2017c/America/Blanc-Sablon
A testdata/tzdb/2017c/America/Boa_Vista
A testdata/tzdb/2017c/America/Bogota
A testdata/tzdb/2017c/America/Boise
A testdata/tzdb/2017c/America/Buenos_Aires
A testdata/tzdb/2017c/America/Cambridge_Bay
A testdata/tzdb/2017c/America/Campo_Grande
A testdata/tzdb/2017c/America/Cancun
A testdata/tzdb/2017c/America/Caracas
A testdata/tzdb/2017c/America/Catamarca
A testdata/tzdb/2017c/America/Cayenne
A testdata/tzdb/2017c/America/Cayman
A testdata/tzdb/2017c/America/Chicago
A testdata/tzdb/2017c/America/Chihuahua
A testdata/tzdb/2017c/America/Coral_Harbour
A testdata/tzdb/2017c/America/Cordoba
A testdata/tzdb/2017c/America/Costa_Rica
A testdata/tzdb/2017c/America/Creston
A testdata/tzdb/2017c/America/Cuiaba
A testdata/tzdb/2017c/America/Curacao
A testdata/tzdb/2017c/America/Danmarkshavn
A testdata/tzdb/2017c/America/Dawson
A testdata/tzdb/2017c/America/Dawson_Creek
A testdata/tzdb/2017c/America/Denver
A testdata/tzdb/2017c/America/Detroit
A testdata/tzdb/2017c/America/Dominica
A testdata/tzdb/2017c/America/Edmonton
A testdata/tzdb/2017c/America/Eirunepe
A testdata/tzdb/2017c/America/El_Salvador
A testdata/tzdb/2017c/America/Ensenada
A testdata/tzdb/2017c/America/Fort_Nelson
A testdata/tzdb/2017c/America/Fort_Wayne
A testdata/tzdb/2017c/America/Fortaleza
A testdata/tzdb/2017c/America/Glace_Bay
A testdata/tzdb/2017c/America/Godthab
A testdata/tzdb/2017c/America/Goose_Bay
A testdata/tzdb/2017c/America/Grand_Turk
A testdata/tzdb/2017c/America/Grenada
A testdata/tzdb/2017c/America/Guadeloupe
A testdata/tzdb/2017c/America/Guatemala
A testdata/tzdb/2017c/America/Guayaquil
A testdata/tzdb/2017c/America/Guyana
A testdata/tzdb/2017c/America/Halifax
A testdata/tzdb/2017c/America/Havana
A testdata/tzdb/2017c/America/Hermosillo
A testdata/tzdb/2017c/America/Indiana/Indianapolis
A testdata/tzdb/2017c/America/Indiana/Knox
A testdata/tzdb/2017c/America/Indiana/Marengo
A testdata/tzdb/2017c/America/Indiana/Petersburg
A testdata/tzdb/2017c/America/Indiana/Tell_City
A testdata/tzdb/2017c/America/Indiana/Vevay
A testdata/tzdb/2017c/America/Indiana/Vincennes
A testdata/tzdb/2017c/America/Indiana/Winamac
A testdata/tzdb/2017c/America/Indianapolis
A testdata/tzdb/2017c/America/Inuvik
A testdata/tzdb/2017c/America/Iqaluit
A testdata/tzdb/2017c/America/Jamaica
A testdata/tzdb/2017c/America/Jujuy
A testdata/tzdb/2017c/America/Juneau
A testdata/tzdb/2017c/America/Kentucky/Louisville
A testdata/tzdb/2017c/America/Kentucky/Monticello
A testdata/tzdb/2017c/America/Knox_IN
A testdata/tzdb/2017c/America/Kralendijk
A testdata/tzdb/2017c/America/La_Paz
A testdata/tzdb/2017c/America/Lima
A testdata/tzdb/2017c/America/Los_Angeles
A testdata/tzdb/2017c/America/Louisville
A testdata/tzdb/2017c/America/Lower_Princes
A testdata/tzdb/2017c/America/Maceio
A testdata/tzdb/2017c/America/Managua
A testdata/tzdb/2017c/America/Manaus
A testdata/tzdb/2017c/America/Marigot
A testdata/tzdb/2017c/America/Martinique
A testdata/tzdb/2017c/America/Matamoros
A testdata/tzdb/2017c/America/Mazatlan
A testdata/tzdb/2017c/America/Mendoza
A testdata/tzdb/2017c/America/Menominee
A testdata/tzdb/2017c/America/Merida
A testdata/tzdb/2017c/America/Metlakatla
A testdata/tzdb/2017c/America/Mexico_City
A testdata/tzdb/2017c/America/Miquelon
A testdata/tzdb/2017c/America/Moncton
A testdata/tzdb/2017c/America/Monterrey
A testdata/tzdb/2017c/America/Montevideo
A testdata/tzdb/2017c/America/Montreal
A testdata/tzdb/2017c/America/Montserrat
A testdata/tzdb/2017c/America/Nassau
A testdata/tzdb/2017c/America/New_York
A testdata/tzdb/2017c/America/Nipigon
A testdata/tzdb/2017c/America/Nome
A testdata/tzdb/2017c/America/Noronha
A testdata/tzdb/2017c/America/North_Dakota/Beulah
A testdata/tzdb/2017c/America/North_Dakota/Center
A testdata/tzdb/2017c/America/North_Dakota/New_Salem
A testdata/tzdb/2017c/America/Ojinaga
A testdata/tzdb/2017c/America/Panama
A testdata/tzdb/2017c/America/Pangnirtung
A testdata/tzdb/2017c/America/Paramaribo
A testdata/tzdb/2017c/America/Phoenix
A testdata/tzdb/2017c/America/Port-au-Prince
A testdata/tzdb/2017c/America/Port_of_Spain
A testdata/tzdb/2017c/America/Porto_Acre
A testdata/tzdb/2017c/America/Porto_Velho
A testdata/tzdb/2017c/America/Puerto_Rico
A testdata/tzdb/2017c/America/Punta_Arenas
A testdata/tzdb/2017c/America/Rainy_River
A testdata/tzdb/2017c/America/Rankin_Inlet
A testdata/tzdb/2017c/America/Recife
A testdata/tzdb/2017c/America/Regina
A testdata/tzdb/2017c/America/Resolute
A testdata/tzdb/2017c/America/Rio_Branco
A testdata/tzdb/2017c/America/Rosario
A testdata/tzdb/2017c/America/Santa_Isabel
A testdata/tzdb/2017c/America/Santarem
A testdata/tzdb/2017c/America/Santiago
A testdata/tzdb/2017c/America/Santo_Domingo
A testdata/tzdb/2017c/America/Sao_Paulo
A testdata/tzdb/2017c/America/Scoresbysund
A testdata/tzdb/2017c/America/Shiprock
A testdata/tzdb/2017c/America/Sitka
A testdata/tzdb/2017c/America/St_Barthelemy
A testdata/tzdb/2017c/America/St_Johns
A testdata/tzdb/2017c/America/St_Kitts
A testdata/tzdb/2017c/America/St_Lucia
A testdata/tzdb/2017c/America/St_Thomas
A testdata/tzdb/2017c/America/St_Vincent
A testdata/tzdb/2017c/America/Swift_Current
A testdata/tzdb/2017c/America/Tegucigalpa
A testdata/tzdb/2017c/America/Thule
A testdata/tzdb/2017c/America/Thunder_Bay
A testdata/tzdb/2017c/America/Tijuana
A testdata/tzdb/2017c/America/Toronto
A testdata/tzdb/2017c/America/Tortola
A testdata/tzdb/2017c/America/Vancouver
A testdata/tzdb/2017c/America/Virgin
A testdata/tzdb/2017c/America/Whitehorse
A testdata/tzdb/2017c/America/Winnipeg
A testdata/tzdb/2017c/America/Yakutat
A testdata/tzdb/2017c/America/Yellowknife
A testdata/tzdb/2017c/Antarctica/Casey
A testdata/tzdb/2017c/Antarctica/Davis
A testdata/tzdb/2017c/Antarctica/DumontDUrville
A testdata/tzdb/2017c/Antarctica/Macquarie
A testdata/tzdb/2017c/Antarctica/Mawson
A testdata/tzdb/2017c/Antarctica/McMurdo
A testdata/tzdb/2017c/Antarctica/Palmer
A testdata/tzdb/2017c/Antarctica/Rothera
A testdata/tzdb/2017c/Antarctica/South_Pole
A testdata/tzdb/2017c/Antarctica/Syowa
A testdata/tzdb/2017c/Antarctica/Troll
A testdata/tzdb/2017c/Antarctica/Vostok
A testdata/tzdb/2017c/Arctic/Longyearbyen
A testdata/tzdb/2017c/Asia/Aden
A testdata/tzdb/2017c/Asia/Almaty
A testdata/tzdb/2017c/Asia/Amman
A testdata/tzdb/2017c/Asia/Anadyr
A testdata/tzdb/2017c/Asia/Aqtau
A testdata/tzdb/2017c/Asia/Aqtobe
A testdata/tzdb/2017c/Asia/Ashgabat
A testdata/tzdb/2017c/Asia/Ashkhabad
A testdata/tzdb/2017c/Asia/Atyrau
A testdata/tzdb/2017c/Asia/Baghdad
A testdata/tzdb/2017c/Asia/Bahrain
A testdata/tzdb/2017c/Asia/Baku
A testdata/tzdb/2017c/Asia/Bangkok
A testdata/tzdb/2017c/Asia/Barnaul
A testdata/tzdb/2017c/Asia/Beirut
A testdata/tzdb/2017c/Asia/Bishkek
A testdata/tzdb/2017c/Asia/Brunei
A testdata/tzdb/2017c/Asia/Calcutta
A testdata/tzdb/2017c/Asia/Chita
A testdata/tzdb/2017c/Asia/Choibalsan
A testdata/tzdb/2017c/Asia/Chongqing
A testdata/tzdb/2017c/Asia/Chungking
A testdata/tzdb/2017c/Asia/Colombo
A testdata/tzdb/2017c/Asia/Dacca
A testdata/tzdb/2017c/Asia/Damascus
A testdata/tzdb/2017c/Asia/Dhaka
A testdata/tzdb/2017c/Asia/Dili
A testdata/tzdb/2017c/Asia/Dubai
A testdata/tzdb/2017c/Asia/Dushanbe
A testdata/tzdb/2017c/Asia/Famagusta
A testdata/tzdb/2017c/Asia/Gaza
A testdata/tzdb/2017c/Asia/Harbin
A testdata/tzdb/2017c/Asia/Hebron
A testdata/tzdb/2017c/Asia/Ho_Chi_Minh
A testdata/tzdb/2017c/Asia/Hong_Kong
A testdata/tzdb/2017c/Asia/Hovd
A testdata/tzdb/2017c/Asia/Irkutsk
A testdata/tzdb/2017c/Asia/Istanbul
A testdata/tzdb/2017c/Asia/Jakarta
A testdata/tzdb/2017c/Asia/Jayapura
A testdata/tzdb/2017c/Asia/Jerusalem
A testdata/tzdb/2017c/Asia/Kabul
A testdata/tzdb/2017c/Asia/Kamchatka
A testdata/tzdb/2017c/Asia/Karachi
A testdata/tzdb/2017c/Asia/Kashgar
A testdata/tzdb/2017c/Asia/Kathmandu
A testdata/tzdb/2017c/Asia/Katmandu
A testdata/tzdb/2017c/Asia/Khandyga
A testdata/tzdb/2017c/Asia/Kolkata
A testdata/tzdb/2017c/Asia/Krasnoyarsk
A testdata/tzdb/2017c/Asia/Kuala_Lumpur
A testdata/tzdb/2017c/Asia/Kuching
A testdata/tzdb/2017c/Asia/Kuwait
A testdata/tzdb/2017c/Asia/Macao
A testdata/tzdb/2017c/Asia/Macau
A testdata/tzdb/2017c/Asia/Magadan
A testdata/tzdb/2017c/Asia/Makassar
A testdata/tzdb/2017c/Asia/Manila
A testdata/tzdb/2017c/Asia/Muscat
A testdata/tzdb/2017c/Asia/Nicosia
A testdata/tzdb/2017c/Asia/Novokuznetsk
A testdata/tzdb/2017c/Asia/Novosibirsk
A testdata/tzdb/2017c/Asia/Omsk
A testdata/tzdb/2017c/Asia/Oral
A testdata/tzdb/2017c/Asia/Phnom_Penh
A testdata/tzdb/2017c/Asia/Pontianak
A testdata/tzdb/2017c/Asia/Pyongyang
A testdata/tzdb/2017c/Asia/Qatar
A testdata/tzdb/2017c/Asia/Qyzylorda
A testdata/tzdb/2017c/Asia/Rangoon
A testdata/tzdb/2017c/Asia/Riyadh
A testdata/tzdb/2017c/Asia/Saigon
A testdata/tzdb/2017c/Asia/Sakhalin
A testdata/tzdb/2017c/Asia/Samarkand
A testdata/tzdb/2017c/Asia/Seoul
A testdata/tzdb/2017c/Asia/Shanghai
A testdata/tzdb/2017c/Asia/Singapore
A testdata/tzdb/2017c/Asia/Srednekolymsk
A testdata/tzdb/2017c/Asia/Taipei
A testdata/tzdb/2017c/Asia/Tashkent
A testdata/tzdb/2017c/Asia/Tbilisi
A testdata/tzdb/2017c/Asia/Tehran
A testdata/tzdb/2017c/Asia/Tel_Aviv
A testdata/tzdb/2017c/Asia/Thimbu
A testdata/tzdb/2017c/Asia/Thimphu
A testdata/tzdb/2017c/Asia/Tokyo
A testdata/tzdb/2017c/Asia/Tomsk
A testdata/tzdb/2017c/Asia/Ujung_Pandang
A testdata/tzdb/2017c/Asia/Ulaanbaatar
A testdata/tzdb/2017c/Asia/Ulan_Bator
A testdata/tzdb/2017c/Asia/Urumqi
A testdata/tzdb/2017c/Asia/Ust-Nera
A testdata/tzdb/2017c/Asia/Vientiane
A testdata/tzdb/2017c/Asia/Vladivostok
A testdata/tzdb/2017c/Asia/Yakutsk
A testdata/tzdb/2017c/Asia/Yangon
A testdata/tzdb/2017c/Asia/Yekaterinburg
A testdata/tzdb/2017c/Asia/Yerevan
A testdata/tzdb/2017c/Atlantic/Azores
A testdata/tzdb/2017c/Atlantic/Bermuda
A testdata/tzdb/2017c/Atlantic/Canary
A testdata/tzdb/2017c/Atlantic/Cape_Verde
A testdata/tzdb/2017c/Atlantic/Faeroe
A testdata/tzdb/2017c/Atlantic/Faroe
A testdata/tzdb/2017c/Atlantic/Jan_Mayen
A testdata/tzdb/2017c/Atlantic/Madeira
A testdata/tzdb/2017c/Atlantic/Reykjavik
A testdata/tzdb/2017c/Atlantic/South_Georgia
A testdata/tzdb/2017c/Atlantic/St_Helena
A testdata/tzdb/2017c/Atlantic/Stanley
A testdata/tzdb/2017c/Australia/ACT
A testdata/tzdb/2017c/Australia/Adelaide
A testdata/tzdb/2017c/Australia/Brisbane
A testdata/tzdb/2017c/Australia/Broken_Hill
A testdata/tzdb/2017c/Australia/Canberra
A testdata/tzdb/2017c/Australia/Currie
A testdata/tzdb/2017c/Australia/Darwin
A testdata/tzdb/2017c/Australia/Eucla
A testdata/tzdb/2017c/Australia/Hobart
A testdata/tzdb/2017c/Australia/LHI
A testdata/tzdb/2017c/Australia/Lindeman
A testdata/tzdb/2017c/Australia/Lord_Howe
A testdata/tzdb/2017c/Australia/Melbourne
A testdata/tzdb/2017c/Australia/NSW
A testdata/tzdb/2017c/Australia/North
A testdata/tzdb/2017c/Australia/Perth
A testdata/tzdb/2017c/Australia/Queensland
A testdata/tzdb/2017c/Australia/South
A testdata/tzdb/2017c/Australia/Sydney
A testdata/tzdb/2017c/Australia/Tasmania
A testdata/tzdb/2017c/Australia/Victoria
A testdata/tzdb/2017c/Australia/West
A testdata/tzdb/2017c/Australia/Yancowinna
A testdata/tzdb/2017c/Brazil/Acre
A testdata/tzdb/2017c/Brazil/DeNoronha
A testdata/tzdb/2017c/Brazil/East
A testdata/tzdb/2017c/Brazil/West
A testdata/tzdb/2017c/CET
A testdata/tzdb/2017c/CST6CDT
A testdata/tzdb/2017c/Canada/Atlantic
A testdata/tzdb/2017c/Canada/Central
A testdata/tzdb/2017c/Canada/Eastern
A testdata/tzdb/2017c/Canada/Mountain
A testdata/tzdb/2017c/Canada/Newfoundland
A testdata/tzdb/2017c/Canada/Pacific
A testdata/tzdb/2017c/Canada/Saskatchewan
A testdata/tzdb/2017c/Canada/Yukon
A testdata/tzdb/2017c/Chile/Continental
A testdata/tzdb/2017c/Chile/EasterIsland
A testdata/tzdb/2017c/Cuba
A testdata/tzdb/2017c/EET
A testdata/tzdb/2017c/EST
A testdata/tzdb/2017c/EST5EDT
A testdata/tzdb/2017c/Egypt
A testdata/tzdb/2017c/Eire
A testdata/tzdb/2017c/Etc/GMT
A testdata/tzdb/2017c/Etc/GMT+0
A testdata/tzdb/2017c/Etc/GMT+1
A testdata/tzdb/2017c/Etc/GMT+10
A testdata/tzdb/2017c/Etc/GMT+11
A testdata/tzdb/2017c/Etc/GMT+12
A testdata/tzdb/2017c/Etc/GMT+2
A testdata/tzdb/2017c/Etc/GMT+3
A testdata/tzdb/2017c/Etc/GMT+4
A testdata/tzdb/2017c/Etc/GMT+5
A testdata/tzdb/2017c/Etc/GMT+6
A testdata/tzdb/2017c/Etc/GMT+7
A testdata/tzdb/2017c/Etc/GMT+8
A testdata/tzdb/2017c/Etc/GMT+9
A testdata/tzdb/2017c/Etc/GMT-0
A testdata/tzdb/2017c/Etc/GMT-1
A testdata/tzdb/2017c/Etc/GMT-10
A testdata/tzdb/2017c/Etc/GMT-11
A testdata/tzdb/2017c/Etc/GMT-12
A testdata/tzdb/2017c/Etc/GMT-13
A testdata/tzdb/2017c/Etc/GMT-14
A testdata/tzdb/2017c/Etc/GMT-2
A testdata/tzdb/2017c/Etc/GMT-3
A testdata/tzdb/2017c/Etc/GMT-4
A testdata/tzdb/2017c/Etc/GMT-5
A testdata/tzdb/2017c/Etc/GMT-6
A testdata/tzdb/2017c/Etc/GMT-7
A testdata/tzdb/2017c/Etc/GMT-8
A testdata/tzdb/2017c/Etc/GMT-9
A testdata/tzdb/2017c/Etc/GMT0
A testdata/tzdb/2017c/Etc/Greenwich
A testdata/tzdb/2017c/Etc/UCT
A testdata/tzdb/2017c/Etc/UTC
A testdata/tzdb/2017c/Etc/Universal
A testdata/tzdb/2017c/Etc/Zulu
A testdata/tzdb/2017c/Europe/Amsterdam
A testdata/tzdb/2017c/Europe/Andorra
A testdata/tzdb/2017c/Europe/Astrakhan
A testdata/tzdb/2017c/Europe/Athens
A testdata/tzdb/2017c/Europe/Belfast
A testdata/tzdb/2017c/Europe/Belgrade
A testdata/tzdb/2017c/Europe/Berlin
A testdata/tzdb/2017c/Europe/Bratislava
A testdata/tzdb/2017c/Europe/Brussels
A testdata/tzdb/2017c/Europe/Bucharest
A testdata/tzdb/2017c/Europe/Budapest
A testdata/tzdb/2017c/Europe/Busingen
A testdata/tzdb/2017c/Europe/Chisinau
A testdata/tzdb/2017c/Europe/Copenhagen
A testdata/tzdb/2017c/Europe/Dublin
A testdata/tzdb/2017c/Europe/Gibraltar
A testdata/tzdb/2017c/Europe/Guernsey
A testdata/tzdb/2017c/Europe/Helsinki
A testdata/tzdb/2017c/Europe/Isle_of_Man
A testdata/tzdb/2017c/Europe/Istanbul
A testdata/tzdb/2017c/Europe/Jersey
A testdata/tzdb/2017c/Europe/Kaliningrad
A testdata/tzdb/2017c/Europe/Kiev
A testdata/tzdb/2017c/Europe/Kirov
A testdata/tzdb/2017c/Europe/Lisbon
A testdata/tzdb/2017c/Europe/Ljubljana
A testdata/tzdb/2017c/Europe/London
A testdata/tzdb/2017c/Europe/Luxembourg
A testdata/tzdb/2017c/Europe/Madrid
A testdata/tzdb/2017c/Europe/Malta
A testdata/tzdb/2017c/Europe/Mariehamn
A testdata/tzdb/2017c/Europe/Minsk
A testdata/tzdb/2017c/Europe/Monaco
A testdata/tzdb/2017c/Europe/Moscow
A testdata/tzdb/2017c/Europe/Nicosia
A testdata/tzdb/2017c/Europe/Oslo
A testdata/tzdb/2017c/Europe/Paris
A testdata/tzdb/2017c/Europe/Podgorica
A testdata/tzdb/2017c/Europe/Prague
A testdata/tzdb/2017c/Europe/Riga
A testdata/tzdb/2017c/Europe/Rome
A testdata/tzdb/2017c/Europe/Samara
A testdata/tzdb/2017c/Europe/San_Marino
A testdata/tzdb/2017c/Europe/Sarajevo
A testdata/tzdb/2017c/Europe/Saratov
A testdata/tzdb/2017c/Europe/Simferopol
A testdata/tzdb/2017c/Europe/Skopje
A testdata/tzdb/2017c/Europe/Sofia
A testdata/tzdb/2017c/Europe/Stockholm
A testdata/tzdb/2017c/Europe/Tallinn
A testdata/tzdb/2017c/Europe/Tirane
A testdata/tzdb/2017c/Europe/Tiraspol
A testdata/tzdb/2017c/Europe/Ulyanovsk
A testdata/tzdb/2017c/Europe/Uzhgorod
A testdata/tzdb/2017c/Europe/Vaduz
A testdata/tzdb/2017c/Europe/Vatican
A testdata/tzdb/2017c/Europe/Vienna
A testdata/tzdb/2017c/Europe/Vilnius
A testdata/tzdb/2017c/Europe/Volgograd
A testdata/tzdb/2017c/Europe/Warsaw
A testdata/tzdb/2017c/Europe/Zagreb
A testdata/tzdb/2017c/Europe/Zaporozhye
A testdata/tzdb/2017c/Europe/Zurich
A testdata/tzdb/2017c/Factory
A testdata/tzdb/2017c/GB
A testdata/tzdb/2017c/GB-Eire
A testdata/tzdb/2017c/GMT
A testdata/tzdb/2017c/GMT+0
A testdata/tzdb/2017c/GMT-0
A testdata/tzdb/2017c/GMT0
A testdata/tzdb/2017c/Greenwich
A testdata/tzdb/2017c/HST
A testdata/tzdb/2017c/Hongkong
A testdata/tzdb/2017c/Iceland
A testdata/tzdb/2017c/Indian/Antananarivo
A testdata/tzdb/2017c/Indian/Chagos
A testdata/tzdb/2017c/Indian/Christmas
A testdata/tzdb/2017c/Indian/Cocos
A testdata/tzdb/2017c/Indian/Comoro
A testdata/tzdb/2017c/Indian/Kerguelen
A testdata/tzdb/2017c/Indian/Mahe
A testdata/tzdb/2017c/Indian/Maldives
A testdata/tzdb/2017c/Indian/Mauritius
A testdata/tzdb/2017c/Indian/Mayotte
A testdata/tzdb/2017c/Indian/Reunion
A testdata/tzdb/2017c/Iran
A testdata/tzdb/2017c/Israel
A testdata/tzdb/2017c/Jamaica
A testdata/tzdb/2017c/Japan
A testdata/tzdb/2017c/Kwajalein
A testdata/tzdb/2017c/Libya
A testdata/tzdb/2017c/MET
A testdata/tzdb/2017c/MST
A testdata/tzdb/2017c/MST7MDT
A testdata/tzdb/2017c/Mexico/BajaNorte
A testdata/tzdb/2017c/Mexico/BajaSur
A testdata/tzdb/2017c/Mexico/General
A testdata/tzdb/2017c/NZ
A testdata/tzdb/2017c/NZ-CHAT
A testdata/tzdb/2017c/Navajo
A testdata/tzdb/2017c/PRC
A testdata/tzdb/2017c/PST8PDT
A testdata/tzdb/2017c/Pacific/Apia
A testdata/tzdb/2017c/Pacific/Auckland
A testdata/tzdb/2017c/Pacific/Bougainville
A testdata/tzdb/2017c/Pacific/Chatham
A testdata/tzdb/2017c/Pacific/Chuuk
A testdata/tzdb/2017c/Pacific/Easter
A testdata/tzdb/2017c/Pacific/Efate
A testdata/tzdb/2017c/Pacific/Enderbury
A testdata/tzdb/2017c/Pacific/Fakaofo
A testdata/tzdb/2017c/Pacific/Fiji
A testdata/tzdb/2017c/Pacific/Funafuti
A testdata/tzdb/2017c/Pacific/Galapagos
A testdata/tzdb/2017c/Pacific/Gambier
A testdata/tzdb/2017c/Pacific/Guadalcanal
A testdata/tzdb/2017c/Pacific/Guam
A testdata/tzdb/2017c/Pacific/Honolulu
A testdata/tzdb/2017c/Pacific/Johnston
A testdata/tzdb/2017c/Pacific/Kiritimati
A testdata/tzdb/2017c/Pacific/Kosrae
A testdata/tzdb/2017c/Pacific/Kwajalein
A testdata/tzdb/2017c/Pacific/Majuro
A testdata/tzdb/2017c/Pacific/Marquesas
A testdata/tzdb/2017c/Pacific/Midway
A testdata/tzdb/2017c/Pacific/Nauru
A testdata/tzdb/2017c/Pacific/Niue
A testdata/tzdb/2017c/Pacific/Norfolk
A testdata/tzdb/2017c/Pacific/Noumea
A testdata/tzdb/2017c/Pacific/Pago_Pago
A testdata/tzdb/2017c/Pacific/Palau
A testdata/tzdb/2017c/Pacific/Pitcairn
A testdata/tzdb/2017c/Pacific/Pohnpei
A testdata/tzdb/2017c/Pacific/Ponape
A testdata/tzdb/2017c/Pacific/Port_Moresby
A testdata/tzdb/2017c/Pacific/Rarotonga
A testdata/tzdb/2017c/Pacific/Saipan
A testdata/tzdb/2017c/Pacific/Samoa
A testdata/tzdb/2017c/Pacific/Tahiti
A testdata/tzdb/2017c/Pacific/Tarawa
A testdata/tzdb/2017c/Pacific/Tongatapu
A testdata/tzdb/2017c/Pacific/Truk
A testdata/tzdb/2017c/Pacific/Wake
A testdata/tzdb/2017c/Pacific/Wallis
A testdata/tzdb/2017c/Pacific/Yap
A testdata/tzdb/2017c/Poland
A testdata/tzdb/2017c/Portugal
A testdata/tzdb/2017c/ROC
A testdata/tzdb/2017c/ROK
A testdata/tzdb/2017c/Singapore
A testdata/tzdb/2017c/SystemV/AST4
A testdata/tzdb/2017c/SystemV/AST4ADT
A testdata/tzdb/2017c/SystemV/CST6
A testdata/tzdb/2017c/SystemV/CST6CDT
A testdata/tzdb/2017c/SystemV/EST5
A testdata/tzdb/2017c/SystemV/EST5EDT
A testdata/tzdb/2017c/SystemV/HST10
A testdata/tzdb/2017c/SystemV/MST7
A testdata/tzdb/2017c/SystemV/MST7MDT
A testdata/tzdb/2017c/SystemV/PST8
A testdata/tzdb/2017c/SystemV/PST8PDT
A testdata/tzdb/2017c/SystemV/YST9
A testdata/tzdb/2017c/SystemV/YST9YDT
A testdata/tzdb/2017c/Turkey
A testdata/tzdb/2017c/UCT
A testdata/tzdb/2017c/US/Alaska
A testdata/tzdb/2017c/US/Aleutian
A testdata/tzdb/2017c/US/Arizona
A testdata/tzdb/2017c/US/Central
A testdata/tzdb/2017c/US/East-Indiana
A testdata/tzdb/2017c/US/Eastern
A testdata/tzdb/2017c/US/Hawaii
A testdata/tzdb/2017c/US/Indiana-Starke
A testdata/tzdb/2017c/US/Michigan
A testdata/tzdb/2017c/US/Mountain
A testdata/tzdb/2017c/US/Pacific
A testdata/tzdb/2017c/US/Pacific-New
A testdata/tzdb/2017c/US/Samoa
A testdata/tzdb/2017c/UTC
A testdata/tzdb/2017c/Universal
A testdata/tzdb/2017c/W-SU
A testdata/tzdb/2017c/WET
A testdata/tzdb/2017c/Zulu
A testdata/tzdb/2017c/iso3166.tab
A testdata/tzdb/2017c/leap-seconds.list
A testdata/tzdb/2017c/posix/Africa/Abidjan
A testdata/tzdb/2017c/posix/Africa/Accra
A testdata/tzdb/2017c/posix/Africa/Addis_Ababa
A testdata/tzdb/2017c/posix/Africa/Algiers
A testdata/tzdb/2017c/posix/Africa/Asmara
A testdata/tzdb/2017c/posix/Africa/Asmera
A testdata/tzdb/2017c/posix/Africa/Bamako
A testdata/tzdb/2017c/posix/Africa/Bangui
A testdata/tzdb/2017c/posix/Africa/Banjul
A testdata/tzdb/2017c/posix/Africa/Bissau
A testdata/tzdb/2017c/posix/Africa/Blantyre
A testdata/tzdb/2017c/posix/Africa/Brazzaville
A testdata/tzdb/2017c/posix/Africa/Bujumbura
A testdata/tzdb/2017c/posix/Africa/Cairo
A testdata/tzdb/2017c/posix/Africa/Casablanca
A testdata/tzdb/2017c/posix/Africa/Ceuta
A testdata/tzdb/2017c/posix/Africa/Conakry
A testdata/tzdb/2017c/posix/Africa/Dakar
A testdata/tzdb/2017c/posix/Africa/Dar_es_Salaam
A testdata/tzdb/2017c/posix/Africa/Djibouti
A testdata/tzdb/2017c/posix/Africa/Douala
A testdata/tzdb/2017c/posix/Africa/El_Aaiun
A testdata/tzdb/2017c/posix/Africa/Freetown
A testdata/tzdb/2017c/posix/Africa/Gaborone
A testdata/tzdb/2017c/posix/Africa/Harare
A testdata/tzdb/2017c/posix/Africa/Johannesburg
A testdata/tzdb/2017c/posix/Africa/Juba
A testdata/tzdb/2017c/posix/Africa/Kampala
A testdata/tzdb/2017c/posix/Africa/Khartoum
A testdata/tzdb/2017c/posix/Africa/Kigali
A testdata/tzdb/2017c/posix/Africa/Kinshasa
A testdata/tzdb/2017c/posix/Africa/Lagos
A testdata/tzdb/2017c/posix/Africa/Libreville
A testdata/tzdb/2017c/posix/Africa/Lome
A testdata/tzdb/2017c/posix/Africa/Luanda
A testdata/tzdb/2017c/posix/Africa/Lubumbashi
A testdata/tzdb/2017c/posix/Africa/Lusaka
A testdata/tzdb/2017c/posix/Africa/Malabo
A testdata/tzdb/2017c/posix/Africa/Maputo
A testdata/tzdb/2017c/posix/Africa/Maseru
A testdata/tzdb/2017c/posix/Africa/Mbabane
A testdata/tzdb/2017c/posix/Africa/Mogadishu
A testdata/tzdb/2017c/posix/Africa/Monrovia
A testdata/tzdb/2017c/posix/Africa/Nairobi
A testdata/tzdb/2017c/posix/Africa/Ndjamena
A testdata/tzdb/2017c/posix/Africa/Niamey
A testdata/tzdb/2017c/posix/Africa/Nouakchott
A testdata/tzdb/2017c/posix/Africa/Ouagadougou
A testdata/tzdb/2017c/posix/Africa/Porto-Novo
A testdata/tzdb/2017c/posix/Africa/Sao_Tome
A testdata/tzdb/2017c/posix/Africa/Timbuktu
A testdata/tzdb/2017c/posix/Africa/Tripoli
A testdata/tzdb/2017c/posix/Africa/Tunis
A testdata/tzdb/2017c/posix/Africa/Windhoek
A testdata/tzdb/2017c/posix/America/Adak
A testdata/tzdb/2017c/posix/America/Anchorage
A testdata/tzdb/2017c/posix/America/Anguilla
A testdata/tzdb/2017c/posix/America/Antigua
A testdata/tzdb/2017c/posix/America/Araguaina
A testdata/tzdb/2017c/posix/America/Argentina/Buenos_Aires
A testdata/tzdb/2017c/posix/America/Argentina/Catamarca
A testdata/tzdb/2017c/posix/America/Argentina/ComodRivadavia
A testdata/tzdb/2017c/posix/America/Argentina/Cordoba
A testdata/tzdb/2017c/posix/America/Argentina/Jujuy
A testdata/tzdb/2017c/posix/America/Argentina/La_Rioja
A testdata/tzdb/2017c/posix/America/Argentina/Mendoza
A testdata/tzdb/2017c/posix/America/Argentina/Rio_Gallegos
A testdata/tzdb/2017c/posix/America/Argentina/Salta
A testdata/tzdb/2017c/posix/America/Argentina/San_Juan
A testdata/tzdb/2017c/posix/America/Argentina/San_Luis
A testdata/tzdb/2017c/posix/America/Argentina/Tucuman
A testdata/tzdb/2017c/posix/America/Argentina/Ushuaia
A testdata/tzdb/2017c/posix/America/Aruba
A testdata/tzdb/2017c/posix/America/Asuncion
A testdata/tzdb/2017c/posix/America/Atikokan
A testdata/tzdb/2017c/posix/America/Atka
A testdata/tzdb/2017c/posix/America/Bahia
A testdata/tzdb/2017c/posix/America/Bahia_Banderas
A testdata/tzdb/2017c/posix/America/Barbados
A testdata/tzdb/2017c/posix/America/Belem
A testdata/tzdb/2017c/posix/America/Belize
A testdata/tzdb/2017c/posix/America/Blanc-Sablon
A testdata/tzdb/2017c/posix/America/Boa_Vista
A testdata/tzdb/2017c/posix/America/Bogota
A testdata/tzdb/2017c/posix/America/Boise
A testdata/tzdb/2017c/posix/America/Buenos_Aires
A testdata/tzdb/2017c/posix/America/Cambridge_Bay
A testdata/tzdb/2017c/posix/America/Campo_Grande
A testdata/tzdb/2017c/posix/America/Cancun
A testdata/tzdb/2017c/posix/America/Caracas
A testdata/tzdb/2017c/posix/America/Catamarca
A testdata/tzdb/2017c/posix/America/Cayenne
A testdata/tzdb/2017c/posix/America/Cayman
A testdata/tzdb/2017c/posix/America/Chicago
A testdata/tzdb/2017c/posix/America/Chihuahua
A testdata/tzdb/2017c/posix/America/Coral_Harbour
A testdata/tzdb/2017c/posix/America/Cordoba
A testdata/tzdb/2017c/posix/America/Costa_Rica
A testdata/tzdb/2017c/posix/America/Creston
A testdata/tzdb/2017c/posix/America/Cuiaba
A testdata/tzdb/2017c/posix/America/Curacao
A testdata/tzdb/2017c/posix/America/Danmarkshavn
A testdata/tzdb/2017c/posix/America/Dawson
A testdata/tzdb/2017c/posix/America/Dawson_Creek
A testdata/tzdb/2017c/posix/America/Denver
A testdata/tzdb/2017c/posix/America/Detroit
A testdata/tzdb/2017c/posix/America/Dominica
A testdata/tzdb/2017c/posix/America/Edmonton
A testdata/tzdb/2017c/posix/America/Eirunepe
A testdata/tzdb/2017c/posix/America/El_Salvador
A testdata/tzdb/2017c/posix/America/Ensenada
A testdata/tzdb/2017c/posix/America/Fort_Nelson
A testdata/tzdb/2017c/posix/America/Fort_Wayne
A testdata/tzdb/2017c/posix/America/Fortaleza
A testdata/tzdb/2017c/posix/America/Glace_Bay
A testdata/tzdb/2017c/posix/America/Godthab
A testdata/tzdb/2017c/posix/America/Goose_Bay
A testdata/tzdb/2017c/posix/America/Grand_Turk
A testdata/tzdb/2017c/posix/America/Grenada
A testdata/tzdb/2017c/posix/America/Guadeloupe
A testdata/tzdb/2017c/posix/America/Guatemala
A testdata/tzdb/2017c/posix/America/Guayaquil
A testdata/tzdb/2017c/posix/America/Guyana
A testdata/tzdb/2017c/posix/America/Halifax
A testdata/tzdb/2017c/posix/America/Havana
A testdata/tzdb/2017c/posix/America/Hermosillo
A testdata/tzdb/2017c/posix/America/Indiana/Indianapolis
A testdata/tzdb/2017c/posix/America/Indiana/Knox
A testdata/tzdb/2017c/posix/America/Indiana/Marengo
A testdata/tzdb/2017c/posix/America/Indiana/Petersburg
A testdata/tzdb/2017c/posix/America/Indiana/Tell_City
A testdata/tzdb/2017c/posix/America/Indiana/Vevay
A testdata/tzdb/2017c/posix/America/Indiana/Vincennes
A testdata/tzdb/2017c/posix/America/Indiana/Winamac
A testdata/tzdb/2017c/posix/America/Indianapolis
A testdata/tzdb/2017c/posix/America/Inuvik
A testdata/tzdb/2017c/posix/America/Iqaluit
A testdata/tzdb/2017c/posix/America/Jamaica
A testdata/tzdb/2017c/posix/America/Jujuy
A testdata/tzdb/2017c/posix/America/Juneau
A testdata/tzdb/2017c/posix/America/Kentucky/Louisville
A testdata/tzdb/2017c/posix/America/Kentucky/Monticello
A testdata/tzdb/2017c/posix/America/Knox_IN
A testdata/tzdb/2017c/posix/America/Kralendijk
A testdata/tzdb/2017c/posix/America/La_Paz
A testdata/tzdb/2017c/posix/America/Lima
A testdata/tzdb/2017c/posix/America/Los_Angeles
A testdata/tzdb/2017c/posix/America/Louisville
A testdata/tzdb/2017c/posix/America/Lower_Princes
A testdata/tzdb/2017c/posix/America/Maceio
A testdata/tzdb/2017c/posix/America/Managua
A testdata/tzdb/2017c/posix/America/Manaus
A testdata/tzdb/2017c/posix/America/Marigot
A testdata/tzdb/2017c/posix/America/Martinique
A testdata/tzdb/2017c/posix/America/Matamoros
A testdata/tzdb/2017c/posix/America/Mazatlan
A testdata/tzdb/2017c/posix/America/Mendoza
A testdata/tzdb/2017c/posix/America/Menominee
A testdata/tzdb/2017c/posix/America/Merida
A testdata/tzdb/2017c/posix/America/Metlakatla
A testdata/tzdb/2017c/posix/America/Mexico_City
A testdata/tzdb/2017c/posix/America/Miquelon
A testdata/tzdb/2017c/posix/America/Moncton
A testdata/tzdb/2017c/posix/America/Monterrey
A testdata/tzdb/2017c/posix/America/Montevideo
A testdata/tzdb/2017c/posix/America/Montreal
A testdata/tzdb/2017c/posix/America/Montserrat
A testdata/tzdb/2017c/posix/America/Nassau
A testdata/tzdb/2017c/posix/America/New_York
A testdata/tzdb/2017c/posix/America/Nipigon
A testdata/tzdb/2017c/posix/America/Nome
A testdata/tzdb/2017c/posix/America/Noronha
A testdata/tzdb/2017c/posix/America/North_Dakota/Beulah
A testdata/tzdb/2017c/posix/America/North_Dakota/Center
A testdata/tzdb/2017c/posix/America/North_Dakota/New_Salem
A testdata/tzdb/2017c/posix/America/Ojinaga
A testdata/tzdb/2017c/posix/America/Panama
A testdata/tzdb/2017c/posix/America/Pangnirtung
A testdata/tzdb/2017c/posix/America/Paramaribo
A testdata/tzdb/2017c/posix/America/Phoenix
A testdata/tzdb/2017c/posix/America/Port-au-Prince
A testdata/tzdb/2017c/posix/America/Port_of_Spain
A testdata/tzdb/2017c/posix/America/Porto_Acre
A testdata/tzdb/2017c/posix/America/Porto_Velho
A testdata/tzdb/2017c/posix/America/Puerto_Rico
A testdata/tzdb/2017c/posix/America/Punta_Arenas
A testdata/tzdb/2017c/posix/America/Rainy_River
A testdata/tzdb/2017c/posix/America/Rankin_Inlet
A testdata/tzdb/2017c/posix/America/Recife
A testdata/tzdb/2017c/posix/America/Regina
A testdata/tzdb/2017c/posix/America/Resolute
A testdata/tzdb/2017c/posix/America/Rio_Branco
A testdata/tzdb/2017c/posix/America/Rosario
A testdata/tzdb/2017c/posix/America/Santa_Isabel
A testdata/tzdb/2017c/posix/America/Santarem
A testdata/tzdb/2017c/posix/America/Santiago
A testdata/tzdb/2017c/posix/America/Santo_Domingo
A testdata/tzdb/2017c/posix/America/Sao_Paulo
A testdata/tzdb/2017c/posix/America/Scoresbysund
A testdata/tzdb/2017c/posix/America/Shiprock
A testdata/tzdb/2017c/posix/America/Sitka
A testdata/tzdb/2017c/posix/America/St_Barthelemy
A testdata/tzdb/2017c/posix/America/St_Johns
A testdata/tzdb/2017c/posix/America/St_Kitts
A testdata/tzdb/2017c/posix/America/St_Lucia
A testdata/tzdb/2017c/posix/America/St_Thomas
A testdata/tzdb/2017c/posix/America/St_Vincent
A testdata/tzdb/2017c/posix/America/Swift_Current
A testdata/tzdb/2017c/posix/America/Tegucigalpa
A testdata/tzdb/2017c/posix/America/Thule
A testdata/tzdb/2017c/posix/America/Thunder_Bay
A testdata/tzdb/2017c/posix/America/Tijuana
A testdata/tzdb/2017c/posix/America/Toronto
A testdata/tzdb/2017c/posix/America/Tortola
A testdata/tzdb/2017c/posix/America/Vancouver
A testdata/tzdb/2017c/posix/America/Virgin
A testdata/tzdb/2017c/posix/America/Whitehorse
A testdata/tzdb/2017c/posix/America/Winnipeg
A testdata/tzdb/2017c/posix/America/Yakutat
A testdata/tzdb/2017c/posix/America/Yellowknife
A testdata/tzdb/2017c/posix/Antarctica/Casey
A testdata/tzdb/2017c/posix/Antarctica/Davis
A testdata/tzdb/2017c/posix/Antarctica/DumontDUrville
A testdata/tzdb/2017c/posix/Antarctica/Macquarie
A testdata/tzdb/2017c/posix/Antarctica/Mawson
A testdata/tzdb/2017c/posix/Antarctica/McMurdo
A testdata/tzdb/2017c/posix/Antarctica/Palmer
A testdata/tzdb/2017c/posix/Antarctica/Rothera
A testdata/tzdb/2017c/posix/Antarctica/South_Pole
A testdata/tzdb/2017c/posix/Antarctica/Syowa
A testdata/tzdb/2017c/posix/Antarctica/Troll
A testdata/tzdb/2017c/posix/Antarctica/Vostok
A testdata/tzdb/2017c/posix/Arctic/Longyearbyen
A testdata/tzdb/2017c/posix/Asia/Aden
A testdata/tzdb/2017c/posix/Asia/Almaty
A testdata/tzdb/2017c/posix/Asia/Amman
A testdata/tzdb/2017c/posix/Asia/Anadyr
A testdata/tzdb/2017c/posix/Asia/Aqtau
A testdata/tzdb/2017c/posix/Asia/Aqtobe
A testdata/tzdb/2017c/posix/Asia/Ashgabat
A testdata/tzdb/2017c/posix/Asia/Ashkhabad
A testdata/tzdb/2017c/posix/Asia/Atyrau
A testdata/tzdb/2017c/posix/Asia/Baghdad
A testdata/tzdb/2017c/posix/Asia/Bahrain
A testdata/tzdb/2017c/posix/Asia/Baku
A testdata/tzdb/2017c/posix/Asia/Bangkok
A testdata/tzdb/2017c/posix/Asia/Barnaul
A testdata/tzdb/2017c/posix/Asia/Beirut
A testdata/tzdb/2017c/posix/Asia/Bishkek
A testdata/tzdb/2017c/posix/Asia/Brunei
A testdata/tzdb/2017c/posix/Asia/Calcutta
A testdata/tzdb/2017c/posix/Asia/Chita
A testdata/tzdb/2017c/posix/Asia/Choibalsan
A testdata/tzdb/2017c/posix/Asia/Chongqing
A testdata/tzdb/2017c/posix/Asia/Chungking
A testdata/tzdb/2017c/posix/Asia/Colombo
A testdata/tzdb/2017c/posix/Asia/Dacca
A testdata/tzdb/2017c/posix/Asia/Damascus
A testdata/tzdb/2017c/posix/Asia/Dhaka
A testdata/tzdb/2017c/posix/Asia/Dili
A testdata/tzdb/2017c/posix/Asia/Dubai
A testdata/tzdb/2017c/posix/Asia/Dushanbe
A testdata/tzdb/2017c/posix/Asia/Famagusta
A testdata/tzdb/2017c/posix/Asia/Gaza
A testdata/tzdb/2017c/posix/Asia/Harbin
A testdata/tzdb/2017c/posix/Asia/Hebron
A testdata/tzdb/2017c/posix/Asia/Ho_Chi_Minh
A testdata/tzdb/2017c/posix/Asia/Hong_Kong
A testdata/tzdb/2017c/posix/Asia/Hovd
A testdata/tzdb/2017c/posix/Asia/Irkutsk
A testdata/tzdb/2017c/posix/Asia/Istanbul
A testdata/tzdb/2017c/posix/Asia/Jakarta
A testdata/tzdb/2017c/posix/Asia/Jayapura
A testdata/tzdb/2017c/posix/Asia/Jerusalem
A testdata/tzdb/2017c/posix/Asia/Kabul
A testdata/tzdb/2017c/posix/Asia/Kamchatka
A testdata/tzdb/2017c/posix/Asia/Karachi
A testdata/tzdb/2017c/posix/Asia/Kashgar
A testdata/tzdb/2017c/posix/Asia/Kathmandu
A testdata/tzdb/2017c/posix/Asia/Katmandu
A testdata/tzdb/2017c/posix/Asia/Khandyga
A testdata/tzdb/2017c/posix/Asia/Kolkata
A testdata/tzdb/2017c/posix/Asia/Krasnoyarsk
A testdata/tzdb/2017c/posix/Asia/Kuala_Lumpur
A testdata/tzdb/2017c/posix/Asia/Kuching
A testdata/tzdb/2017c/posix/Asia/Kuwait
A testdata/tzdb/2017c/posix/Asia/Macao
A testdata/tzdb/2017c/posix/Asia/Macau
A testdata/tzdb/2017c/posix/Asia/Magadan
A testdata/tzdb/2017c/posix/Asia/Makassar
A testdata/tzdb/2017c/posix/Asia/Manila
A testdata/tzdb/2017c/posix/Asia/Muscat
A testdata/tzdb/2017c/posix/Asia/Nicosia
A testdata/tzdb/2017c/posix/Asia/Novokuznetsk
A testdata/tzdb/2017c/posix/Asia/Novosibirsk
A testdata/tzdb/2017c/posix/Asia/Omsk
A testdata/tzdb/2017c/posix/Asia/Oral
A testdata/tzdb/2017c/posix/Asia/Phnom_Penh
A testdata/tzdb/2017c/posix/Asia/Pontianak
A testdata/tzdb/2017c/posix/Asia/Pyongyang
A testdata/tzdb/2017c/posix/Asia/Qatar
A testdata/tzdb/2017c/posix/Asia/Qyzylorda
A testdata/tzdb/2017c/posix/Asia/Rangoon
A testdata/tzdb/2017c/posix/Asia/Riyadh
A testdata/tzdb/2017c/posix/Asia/Saigon
A testdata/tzdb/2017c/posix/Asia/Sakhalin
A testdata/tzdb/2017c/posix/Asia/Samarkand
A testdata/tzdb/2017c/posix/Asia/Seoul
A testdata/tzdb/2017c/posix/Asia/Shanghai
A testdata/tzdb/2017c/posix/Asia/Singapore
A testdata/tzdb/2017c/posix/Asia/Srednekolymsk
A testdata/tzdb/2017c/posix/Asia/Taipei
A testdata/tzdb/2017c/posix/Asia/Tashkent
A testdata/tzdb/2017c/posix/Asia/Tbilisi
A testdata/tzdb/2017c/posix/Asia/Tehran
A testdata/tzdb/2017c/posix/Asia/Tel_Aviv
A testdata/tzdb/2017c/posix/Asia/Thimbu
A testdata/tzdb/2017c/posix/Asia/Thimphu
A testdata/tzdb/2017c/posix/Asia/Tokyo
A testdata/tzdb/2017c/posix/Asia/Tomsk
A testdata/tzdb/2017c/posix/Asia/Ujung_Pandang
A testdata/tzdb/2017c/posix/Asia/Ulaanbaatar
A testdata/tzdb/2017c/posix/Asia/Ulan_Bator
A testdata/tzdb/2017c/posix/Asia/Urumqi
A testdata/tzdb/2017c/posix/Asia/Ust-Nera
A testdata/tzdb/2017c/posix/Asia/Vientiane
A testdata/tzdb/2017c/posix/Asia/Vladivostok
A testdata/tzdb/2017c/posix/Asia/Yakutsk
A testdata/tzdb/2017c/posix/Asia/Yangon
A testdata/tzdb/2017c/posix/Asia/Yekaterinburg
A testdata/tzdb/2017c/posix/Asia/Yerevan
A testdata/tzdb/2017c/posix/Atlantic/Azores
A testdata/tzdb/2017c/posix/Atlantic/Bermuda
A testdata/tzdb/2017c/posix/Atlantic/Canary
A testdata/tzdb/2017c/posix/Atlantic/Cape_Verde
A testdata/tzdb/2017c/posix/Atlantic/Faeroe
A testdata/tzdb/2017c/posix/Atlantic/Faroe
A testdata/tzdb/2017c/posix/Atlantic/Jan_Mayen
A testdata/tzdb/2017c/posix/Atlantic/Madeira
A testdata/tzdb/2017c/posix/Atlantic/Reykjavik
A testdata/tzdb/2017c/posix/Atlantic/South_Georgia
A testdata/tzdb/2017c/posix/Atlantic/St_Helena
A testdata/tzdb/2017c/posix/Atlantic/Stanley
A testdata/tzdb/2017c/posix/Australia/ACT
A testdata/tzdb/2017c/posix/Australia/Adelaide
A testdata/tzdb/2017c/posix/Australia/Brisbane
A testdata/tzdb/2017c/posix/Australia/Broken_Hill
A testdata/tzdb/2017c/posix/Australia/Canberra
A testdata/tzdb/2017c/posix/Australia/Currie
A testdata/tzdb/2017c/posix/Australia/Darwin
A testdata/tzdb/2017c/posix/Australia/Eucla
A testdata/tzdb/2017c/posix/Australia/Hobart
A testdata/tzdb/2017c/posix/Australia/LHI
A testdata/tzdb/2017c/posix/Australia/Lindeman
A testdata/tzdb/2017c/posix/Australia/Lord_Howe
A testdata/tzdb/2017c/posix/Australia/Melbourne
A testdata/tzdb/2017c/posix/Australia/NSW
A testdata/tzdb/2017c/posix/Australia/North
A testdata/tzdb/2017c/posix/Australia/Perth
A testdata/tzdb/2017c/posix/Australia/Queensland
A testdata/tzdb/2017c/posix/Australia/South
A testdata/tzdb/2017c/posix/Australia/Sydney
A testdata/tzdb/2017c/posix/Australia/Tasmania
A testdata/tzdb/2017c/posix/Australia/Victoria
A testdata/tzdb/2017c/posix/Australia/West
A testdata/tzdb/2017c/posix/Australia/Yancowinna
A testdata/tzdb/2017c/posix/Brazil/Acre
A testdata/tzdb/2017c/posix/Brazil/DeNoronha
A testdata/tzdb/2017c/posix/Brazil/East
A testdata/tzdb/2017c/posix/Brazil/West
A testdata/tzdb/2017c/posix/CET
A testdata/tzdb/2017c/posix/CST6CDT
A testdata/tzdb/2017c/posix/Canada/Atlantic
A testdata/tzdb/2017c/posix/Canada/Central
A testdata/tzdb/2017c/posix/Canada/Eastern
A testdata/tzdb/2017c/posix/Canada/Mountain
A testdata/tzdb/2017c/posix/Canada/Newfoundland
A testdata/tzdb/2017c/posix/Canada/Pacific
A testdata/tzdb/2017c/posix/Canada/Saskatchewan
A testdata/tzdb/2017c/posix/Canada/Yukon
A testdata/tzdb/2017c/posix/Chile/Continental
A testdata/tzdb/2017c/posix/Chile/EasterIsland
A testdata/tzdb/2017c/posix/Cuba
A testdata/tzdb/2017c/posix/EET
A testdata/tzdb/2017c/posix/EST
A testdata/tzdb/2017c/posix/EST5EDT
A testdata/tzdb/2017c/posix/Egypt
A testdata/tzdb/2017c/posix/Eire
A testdata/tzdb/2017c/posix/Etc/GMT
A testdata/tzdb/2017c/posix/Etc/GMT+0
A testdata/tzdb/2017c/posix/Etc/GMT+1
A testdata/tzdb/2017c/posix/Etc/GMT+10
A testdata/tzdb/2017c/posix/Etc/GMT+11
A testdata/tzdb/2017c/posix/Etc/GMT+12
A testdata/tzdb/2017c/posix/Etc/GMT+2
A testdata/tzdb/2017c/posix/Etc/GMT+3
A testdata/tzdb/2017c/posix/Etc/GMT+4
A testdata/tzdb/2017c/posix/Etc/GMT+5
A testdata/tzdb/2017c/posix/Etc/GMT+6
A testdata/tzdb/2017c/posix/Etc/GMT+7
A testdata/tzdb/2017c/posix/Etc/GMT+8
A testdata/tzdb/2017c/posix/Etc/GMT+9
A testdata/tzdb/2017c/posix/Etc/GMT-0
A testdata/tzdb/2017c/posix/Etc/GMT-1
A testdata/tzdb/2017c/posix/Etc/GMT-10
A testdata/tzdb/2017c/posix/Etc/GMT-11
A testdata/tzdb/2017c/posix/Etc/GMT-12
A testdata/tzdb/2017c/posix/Etc/GMT-13
A testdata/tzdb/2017c/posix/Etc/GMT-14
A testdata/tzdb/2017c/posix/Etc/GMT-2
A testdata/tzdb/2017c/posix/Etc/GMT-3
A testdata/tzdb/2017c/posix/Etc/GMT-4
A testdata/tzdb/2017c/posix/Etc/GMT-5
A testdata/tzdb/2017c/posix/Etc/GMT-6
A testdata/tzdb/2017c/posix/Etc/GMT-7
A testdata/tzdb/2017c/posix/Etc/GMT-8
A testdata/tzdb/2017c/posix/Etc/GMT-9
A testdata/tzdb/2017c/posix/Etc/GMT0
A testdata/tzdb/2017c/posix/Etc/Greenwich
A testdata/tzdb/2017c/posix/Etc/UCT
A testdata/tzdb/2017c/posix/Etc/UTC
A testdata/tzdb/2017c/posix/Etc/Universal
A testdata/tzdb/2017c/posix/Etc/Zulu
A testdata/tzdb/2017c/posix/Europe/Amsterdam
A testdata/tzdb/2017c/posix/Europe/Andorra
A testdata/tzdb/2017c/posix/Europe/Astrakhan
A testdata/tzdb/2017c/posix/Europe/Athens
A testdata/tzdb/2017c/posix/Europe/Belfast
A testdata/tzdb/2017c/posix/Europe/Belgrade
A testdata/tzdb/2017c/posix/Europe/Berlin
A testdata/tzdb/2017c/posix/Europe/Bratislava
A testdata/tzdb/2017c/posix/Europe/Brussels
A testdata/tzdb/2017c/posix/Europe/Bucharest
A testdata/tzdb/2017c/posix/Europe/Budapest
A testdata/tzdb/2017c/posix/Europe/Busingen
A testdata/tzdb/2017c/posix/Europe/Chisinau
A testdata/tzdb/2017c/posix/Europe/Copenhagen
A testdata/tzdb/2017c/posix/Europe/Dublin
A testdata/tzdb/2017c/posix/Europe/Gibraltar
A testdata/tzdb/2017c/posix/Europe/Guernsey
A testdata/tzdb/2017c/posix/Europe/Helsinki
A testdata/tzdb/2017c/posix/Europe/Isle_of_Man
A testdata/tzdb/2017c/posix/Europe/Istanbul
A testdata/tzdb/2017c/posix/Europe/Jersey
A testdata/tzdb/2017c/posix/Europe/Kaliningrad
A testdata/tzdb/2017c/posix/Europe/Kiev
A testdata/tzdb/2017c/posix/Europe/Kirov
A testdata/tzdb/2017c/posix/Europe/Lisbon
A testdata/tzdb/2017c/posix/Europe/Ljubljana
A testdata/tzdb/2017c/posix/Europe/London
A testdata/tzdb/2017c/posix/Europe/Luxembourg
A testdata/tzdb/2017c/posix/Europe/Madrid
A testdata/tzdb/2017c/posix/Europe/Malta
A testdata/tzdb/2017c/posix/Europe/Mariehamn
A testdata/tzdb/2017c/posix/Europe/Minsk
A testdata/tzdb/2017c/posix/Europe/Monaco
A testdata/tzdb/2017c/posix/Europe/Moscow
A testdata/tzdb/2017c/posix/Europe/Nicosia
A testdata/tzdb/2017c/posix/Europe/Oslo
A testdata/tzdb/2017c/posix/Europe/Paris
A testdata/tzdb/2017c/posix/Europe/Podgorica
A testdata/tzdb/2017c/posix/Europe/Prague
A testdata/tzdb/2017c/posix/Europe/Riga
A testdata/tzdb/2017c/posix/Europe/Rome
A testdata/tzdb/2017c/posix/Europe/Samara
A testdata/tzdb/2017c/posix/Europe/San_Marino
A testdata/tzdb/2017c/posix/Europe/Sarajevo
A testdata/tzdb/2017c/posix/Europe/Saratov
A testdata/tzdb/2017c/posix/Europe/Simferopol
A testdata/tzdb/2017c/posix/Europe/Skopje
A testdata/tzdb/2017c/posix/Europe/Sofia
A testdata/tzdb/2017c/posix/Europe/Stockholm
A testdata/tzdb/2017c/posix/Europe/Tallinn
A testdata/tzdb/2017c/posix/Europe/Tirane
A testdata/tzdb/2017c/posix/Europe/Tiraspol
A testdata/tzdb/2017c/posix/Europe/Ulyanovsk
A testdata/tzdb/2017c/posix/Europe/Uzhgorod
A testdata/tzdb/2017c/posix/Europe/Vaduz
A testdata/tzdb/2017c/posix/Europe/Vatican
A testdata/tzdb/2017c/posix/Europe/Vienna
A testdata/tzdb/2017c/posix/Europe/Vilnius
A testdata/tzdb/2017c/posix/Europe/Volgograd
A testdata/tzdb/2017c/posix/Europe/Warsaw
A testdata/tzdb/2017c/posix/Europe/Zagreb
A testdata/tzdb/2017c/posix/Europe/Zaporozhye
A testdata/tzdb/2017c/posix/Europe/Zurich
A testdata/tzdb/2017c/posix/Factory
A testdata/tzdb/2017c/posix/GB
A testdata/tzdb/2017c/posix/GB-Eire
A testdata/tzdb/2017c/posix/GMT
A testdata/tzdb/2017c/posix/GMT+0
A testdata/tzdb/2017c/posix/GMT-0
A testdata/tzdb/2017c/posix/GMT0
A testdata/tzdb/2017c/posix/Greenwich
A testdata/tzdb/2017c/posix/HST
A testdata/tzdb/2017c/posix/Hongkong
A testdata/tzdb/2017c/posix/Iceland
A testdata/tzdb/2017c/posix/Indian/Antananarivo
A testdata/tzdb/2017c/posix/Indian/Chagos
A testdata/tzdb/2017c/posix/Indian/Christmas
A testdata/tzdb/2017c/posix/Indian/Cocos
A testdata/tzdb/2017c/posix/Indian/Comoro
A testdata/tzdb/2017c/posix/Indian/Kerguelen
A testdata/tzdb/2017c/posix/Indian/Mahe
A testdata/tzdb/2017c/posix/Indian/Maldives
A testdata/tzdb/2017c/posix/Indian/Mauritius
A testdata/tzdb/2017c/posix/Indian/Mayotte
A testdata/tzdb/2017c/posix/Indian/Reunion
A testdata/tzdb/2017c/posix/Iran
A testdata/tzdb/2017c/posix/Israel
A testdata/tzdb/2017c/posix/Jamaica
A testdata/tzdb/2017c/posix/Japan
A testdata/tzdb/2017c/posix/Kwajalein
A testdata/tzdb/2017c/posix/Libya
A testdata/tzdb/2017c/posix/MET
A testdata/tzdb/2017c/posix/MST
A testdata/tzdb/2017c/posix/MST7MDT
A testdata/tzdb/2017c/posix/Mexico/BajaNorte
A testdata/tzdb/2017c/posix/Mexico/BajaSur
A testdata/tzdb/2017c/posix/Mexico/General
A testdata/tzdb/2017c/posix/NZ
A testdata/tzdb/2017c/posix/NZ-CHAT
A testdata/tzdb/2017c/posix/Navajo
A testdata/tzdb/2017c/posix/PRC
A testdata/tzdb/2017c/posix/PST8PDT
A testdata/tzdb/2017c/posix/Pacific/Apia
A testdata/tzdb/2017c/posix/Pacific/Auckland
A testdata/tzdb/2017c/posix/Pacific/Bougainville
A testdata/tzdb/2017c/posix/Pacific/Chatham
A testdata/tzdb/2017c/posix/Pacific/Chuuk
A testdata/tzdb/2017c/posix/Pacific/Easter
A testdata/tzdb/2017c/posix/Pacific/Efate
A testdata/tzdb/2017c/posix/Pacific/Enderbury
A testdata/tzdb/2017c/posix/Pacific/Fakaofo
A testdata/tzdb/2017c/posix/Pacific/Fiji
A testdata/tzdb/2017c/posix/Pacific/Funafuti
A testdata/tzdb/2017c/posix/Pacific/Galapagos
A testdata/tzdb/2017c/posix/Pacific/Gambier
A testdata/tzdb/2017c/posix/Pacific/Guadalcanal
A testdata/tzdb/2017c/posix/Pacific/Guam
A testdata/tzdb/2017c/posix/Pacific/Honolulu
A testdata/tzdb/2017c/posix/Pacific/Johnston
A testdata/tzdb/2017c/posix/Pacific/Kiritimati
A testdata/tzdb/2017c/posix/Pacific/Kosrae
A testdata/tzdb/2017c/posix/Pacific/Kwajalein
A testdata/tzdb/2017c/posix/Pacific/Majuro
A testdata/tzdb/2017c/posix/Pacific/Marquesas
A testdata/tzdb/2017c/posix/Pacific/Midway
A testdata/tzdb/2017c/posix/Pacific/Nauru
A testdata/tzdb/2017c/posix/Pacific/Niue
A testdata/tzdb/2017c/posix/Pacific/Norfolk
A testdata/tzdb/2017c/posix/Pacific/Noumea
A testdata/tzdb/2017c/posix/Pacific/Pago_Pago
A testdata/tzdb/2017c/posix/Pacific/Palau
A testdata/tzdb/2017c/posix/Pacific/Pitcairn
A testdata/tzdb/2017c/posix/Pacific/Pohnpei
A testdata/tzdb/2017c/posix/Pacific/Ponape
A testdata/tzdb/2017c/posix/Pacific/Port_Moresby
A testdata/tzdb/2017c/posix/Pacific/Rarotonga
A testdata/tzdb/2017c/posix/Pacific/Saipan
A testdata/tzdb/2017c/posix/Pacific/Samoa
A testdata/tzdb/2017c/posix/Pacific/Tahiti
A testdata/tzdb/2017c/posix/Pacific/Tarawa
A testdata/tzdb/2017c/posix/Pacific/Tongatapu
A testdata/tzdb/2017c/posix/Pacific/Truk
A testdata/tzdb/2017c/posix/Pacific/Wake
A testdata/tzdb/2017c/posix/Pacific/Wallis
A testdata/tzdb/2017c/posix/Pacific/Yap
A testdata/tzdb/2017c/posix/Poland
A testdata/tzdb/2017c/posix/Portugal
A testdata/tzdb/2017c/posix/ROC
A testdata/tzdb/2017c/posix/ROK
A testdata/tzdb/2017c/posix/Singapore
A testdata/tzdb/2017c/posix/SystemV/AST4
A testdata/tzdb/2017c/posix/SystemV/AST4ADT
A testdata/tzdb/2017c/posix/SystemV/CST6
A testdata/tzdb/2017c/posix/SystemV/CST6CDT
A testdata/tzdb/2017c/posix/SystemV/EST5
A testdata/tzdb/2017c/posix/SystemV/EST5EDT
A testdata/tzdb/2017c/posix/SystemV/HST10
A testdata/tzdb/2017c/posix/SystemV/MST7
A testdata/tzdb/2017c/posix/SystemV/MST7MDT
A testdata/tzdb/2017c/posix/SystemV/PST8
A testdata/tzdb/2017c/posix/SystemV/PST8PDT
A testdata/tzdb/2017c/posix/SystemV/YST9
A testdata/tzdb/2017c/posix/SystemV/YST9YDT
A testdata/tzdb/2017c/posix/Turkey
A testdata/tzdb/2017c/posix/UCT
A testdata/tzdb/2017c/posix/US/Alaska
A testdata/tzdb/2017c/posix/US/Aleutian
A testdata/tzdb/2017c/posix/US/Arizona
A testdata/tzdb/2017c/posix/US/Central
A testdata/tzdb/2017c/posix/US/East-Indiana
A testdata/tzdb/2017c/posix/US/Eastern
A testdata/tzdb/2017c/posix/US/Hawaii
A testdata/tzdb/2017c/posix/US/Indiana-Starke
A testdata/tzdb/2017c/posix/US/Michigan
A testdata/tzdb/2017c/posix/US/Mountain
A testdata/tzdb/2017c/posix/US/Pacific
A testdata/tzdb/2017c/posix/US/Pacific-New
A testdata/tzdb/2017c/posix/US/Samoa
A testdata/tzdb/2017c/posix/UTC
A testdata/tzdb/2017c/posix/Universal
A testdata/tzdb/2017c/posix/W-SU
A testdata/tzdb/2017c/posix/WET
A testdata/tzdb/2017c/posix/Zulu
A testdata/tzdb/2017c/posixrules
A testdata/tzdb/2017c/right/Africa/Abidjan
A testdata/tzdb/2017c/right/Africa/Accra
A testdata/tzdb/2017c/right/Africa/Addis_Ababa
A testdata/tzdb/2017c/right/Africa/Algiers
A testdata/tzdb/2017c/right/Africa/Asmara
A testdata/tzdb/2017c/right/Africa/Asmera
A testdata/tzdb/2017c/right/Africa/Bamako
A testdata/tzdb/2017c/right/Africa/Bangui
A testdata/tzdb/2017c/right/Africa/Banjul
A testdata/tzdb/2017c/right/Africa/Bissau
A testdata/tzdb/2017c/right/Africa/Blantyre
A testdata/tzdb/2017c/right/Africa/Brazzaville
A testdata/tzdb/2017c/right/Africa/Bujumbura
A testdata/tzdb/2017c/right/Africa/Cairo
A testdata/tzdb/2017c/right/Africa/Casablanca
A testdata/tzdb/2017c/right/Africa/Ceuta
A testdata/tzdb/2017c/right/Africa/Conakry
A testdata/tzdb/2017c/right/Africa/Dakar
A testdata/tzdb/2017c/right/Africa/Dar_es_Salaam
A testdata/tzdb/2017c/right/Africa/Djibouti
A testdata/tzdb/2017c/right/Africa/Douala
A testdata/tzdb/2017c/right/Africa/El_Aaiun
A testdata/tzdb/2017c/right/Africa/Freetown
A testdata/tzdb/2017c/right/Africa/Gaborone
A testdata/tzdb/2017c/right/Africa/Harare
A testdata/tzdb/2017c/right/Africa/Johannesburg
A testdata/tzdb/2017c/right/Africa/Juba
A testdata/tzdb/2017c/right/Africa/Kampala
A testdata/tzdb/2017c/right/Africa/Khartoum
A testdata/tzdb/2017c/right/Africa/Kigali
A testdata/tzdb/2017c/right/Africa/Kinshasa
A testdata/tzdb/2017c/right/Africa/Lagos
A testdata/tzdb/2017c/right/Africa/Libreville
A testdata/tzdb/2017c/right/Africa/Lome
A testdata/tzdb/2017c/right/Africa/Luanda
A testdata/tzdb/2017c/right/Africa/Lubumbashi
A testdata/tzdb/2017c/right/Africa/Lusaka
A testdata/tzdb/2017c/right/Africa/Malabo
A testdata/tzdb/2017c/right/Africa/Maputo
A testdata/tzdb/2017c/right/Africa/Maseru
A testdata/tzdb/2017c/right/Africa/Mbabane
A testdata/tzdb/2017c/right/Africa/Mogadishu
A testdata/tzdb/2017c/right/Africa/Monrovia
A testdata/tzdb/2017c/right/Africa/Nairobi
A testdata/tzdb/2017c/right/Africa/Ndjamena
A testdata/tzdb/2017c/right/Africa/Niamey
A testdata/tzdb/2017c/right/Africa/Nouakchott
A testdata/tzdb/2017c/right/Africa/Ouagadougou
A testdata/tzdb/2017c/right/Africa/Porto-Novo
A testdata/tzdb/2017c/right/Africa/Sao_Tome
A testdata/tzdb/2017c/right/Africa/Timbuktu
A testdata/tzdb/2017c/right/Africa/Tripoli
A testdata/tzdb/2017c/right/Africa/Tunis
A testdata/tzdb/2017c/right/Africa/Windhoek
A testdata/tzdb/2017c/right/America/Adak
A testdata/tzdb/2017c/right/America/Anchorage
A testdata/tzdb/2017c/right/America/Anguilla
A testdata/tzdb/2017c/right/America/Antigua
A testdata/tzdb/2017c/right/America/Araguaina
A testdata/tzdb/2017c/right/America/Argentina/Buenos_Aires
A testdata/tzdb/2017c/right/America/Argentina/Catamarca
A testdata/tzdb/2017c/right/America/Argentina/ComodRivadavia
A testdata/tzdb/2017c/right/America/Argentina/Cordoba
A testdata/tzdb/2017c/right/America/Argentina/Jujuy
A testdata/tzdb/2017c/right/America/Argentina/La_Rioja
A testdata/tzdb/2017c/right/America/Argentina/Mendoza
A testdata/tzdb/2017c/right/America/Argentina/Rio_Gallegos
A testdata/tzdb/2017c/right/America/Argentina/Salta
A testdata/tzdb/2017c/right/America/Argentina/San_Juan
A testdata/tzdb/2017c/right/America/Argentina/San_Luis
A testdata/tzdb/2017c/right/America/Argentina/Tucuman
A testdata/tzdb/2017c/right/America/Argentina/Ushuaia
A testdata/tzdb/2017c/right/America/Aruba
A testdata/tzdb/2017c/right/America/Asuncion
A testdata/tzdb/2017c/right/America/Atikokan
A testdata/tzdb/2017c/right/America/Atka
A testdata/tzdb/2017c/right/America/Bahia
A testdata/tzdb/2017c/right/America/Bahia_Banderas
A testdata/tzdb/2017c/right/America/Barbados
A testdata/tzdb/2017c/right/America/Belem
A testdata/tzdb/2017c/right/America/Belize
A testdata/tzdb/2017c/right/America/Blanc-Sablon
A testdata/tzdb/2017c/right/America/Boa_Vista
A testdata/tzdb/2017c/right/America/Bogota
A testdata/tzdb/2017c/right/America/Boise
A testdata/tzdb/2017c/right/America/Buenos_Aires
A testdata/tzdb/2017c/right/America/Cambridge_Bay
A testdata/tzdb/2017c/right/America/Campo_Grande
A testdata/tzdb/2017c/right/America/Cancun
A testdata/tzdb/2017c/right/America/Caracas
A testdata/tzdb/2017c/right/America/Catamarca
A testdata/tzdb/2017c/right/America/Cayenne
A testdata/tzdb/2017c/right/America/Cayman
A testdata/tzdb/2017c/right/America/Chicago
A testdata/tzdb/2017c/right/America/Chihuahua
A testdata/tzdb/2017c/right/America/Coral_Harbour
A testdata/tzdb/2017c/right/America/Cordoba
A testdata/tzdb/2017c/right/America/Costa_Rica
A testdata/tzdb/2017c/right/America/Creston
A testdata/tzdb/2017c/right/America/Cuiaba
A testdata/tzdb/2017c/right/America/Curacao
A testdata/tzdb/2017c/right/America/Danmarkshavn
A testdata/tzdb/2017c/right/America/Dawson
A testdata/tzdb/2017c/right/America/Dawson_Creek
A testdata/tzdb/2017c/right/America/Denver
A testdata/tzdb/2017c/right/America/Detroit
A testdata/tzdb/2017c/right/America/Dominica
A testdata/tzdb/2017c/right/America/Edmonton
A testdata/tzdb/2017c/right/America/Eirunepe
A testdata/tzdb/2017c/right/America/El_Salvador
A testdata/tzdb/2017c/right/America/Ensenada
A testdata/tzdb/2017c/right/America/Fort_Nelson
A testdata/tzdb/2017c/right/America/Fort_Wayne
A testdata/tzdb/2017c/right/America/Fortaleza
A testdata/tzdb/2017c/right/America/Glace_Bay
A testdata/tzdb/2017c/right/America/Godthab
A testdata/tzdb/2017c/right/America/Goose_Bay
A testdata/tzdb/2017c/right/America/Grand_Turk
A testdata/tzdb/2017c/right/America/Grenada
A testdata/tzdb/2017c/right/America/Guadeloupe
A testdata/tzdb/2017c/right/America/Guatemala
A testdata/tzdb/2017c/right/America/Guayaquil
A testdata/tzdb/2017c/right/America/Guyana
A testdata/tzdb/2017c/right/America/Halifax
A testdata/tzdb/2017c/right/America/Havana
A testdata/tzdb/2017c/right/America/Hermosillo
A testdata/tzdb/2017c/right/America/Indiana/Indianapolis
A testdata/tzdb/2017c/right/America/Indiana/Knox
A testdata/tzdb/2017c/right/America/Indiana/Marengo
A testdata/tzdb/2017c/right/America/Indiana/Petersburg
A testdata/tzdb/2017c/right/America/Indiana/Tell_City
A testdata/tzdb/2017c/right/America/Indiana/Vevay
A testdata/tzdb/2017c/right/America/Indiana/Vincennes
A testdata/tzdb/2017c/right/America/Indiana/Winamac
A testdata/tzdb/2017c/right/America/Indianapolis
A testdata/tzdb/2017c/right/America/Inuvik
A testdata/tzdb/2017c/right/America/Iqaluit
A testdata/tzdb/2017c/right/America/Jamaica
A testdata/tzdb/2017c/right/America/Jujuy
A testdata/tzdb/2017c/right/America/Juneau
A testdata/tzdb/2017c/right/America/Kentucky/Louisville
A testdata/tzdb/2017c/right/America/Kentucky/Monticello
A testdata/tzdb/2017c/right/America/Knox_IN
A testdata/tzdb/2017c/right/America/Kralendijk
A testdata/tzdb/2017c/right/America/La_Paz
A testdata/tzdb/2017c/right/America/Lima
A testdata/tzdb/2017c/right/America/Los_Angeles
A testdata/tzdb/2017c/right/America/Louisville
A testdata/tzdb/2017c/right/America/Lower_Princes
A testdata/tzdb/2017c/right/America/Maceio
A testdata/tzdb/2017c/right/America/Managua
A testdata/tzdb/2017c/right/America/Manaus
A testdata/tzdb/2017c/right/America/Marigot
A testdata/tzdb/2017c/right/America/Martinique
A testdata/tzdb/2017c/right/America/Matamoros
A testdata/tzdb/2017c/right/America/Mazatlan
A testdata/tzdb/2017c/right/America/Mendoza
A testdata/tzdb/2017c/right/America/Menominee
A testdata/tzdb/2017c/right/America/Merida
A testdata/tzdb/2017c/right/America/Metlakatla
A testdata/tzdb/2017c/right/America/Mexico_City
A testdata/tzdb/2017c/right/America/Miquelon
A testdata/tzdb/2017c/right/America/Moncton
A testdata/tzdb/2017c/right/America/Monterrey
A testdata/tzdb/2017c/right/America/Montevideo
A testdata/tzdb/2017c/right/America/Montreal
A testdata/tzdb/2017c/right/America/Montserrat
A testdata/tzdb/2017c/right/America/Nassau
A testdata/tzdb/2017c/right/America/New_York
A testdata/tzdb/2017c/right/America/Nipigon
A testdata/tzdb/2017c/right/America/Nome
A testdata/tzdb/2017c/right/America/Noronha
A testdata/tzdb/2017c/right/America/North_Dakota/Beulah
A testdata/tzdb/2017c/right/America/North_Dakota/Center
A testdata/tzdb/2017c/right/America/North_Dakota/New_Salem
A testdata/tzdb/2017c/right/America/Ojinaga
A testdata/tzdb/2017c/right/America/Panama
A testdata/tzdb/2017c/right/America/Pangnirtung
A testdata/tzdb/2017c/right/America/Paramaribo
A testdata/tzdb/2017c/right/America/Phoenix
A testdata/tzdb/2017c/right/America/Port-au-Prince
A testdata/tzdb/2017c/right/America/Port_of_Spain
A testdata/tzdb/2017c/right/America/Porto_Acre
A testdata/tzdb/2017c/right/America/Porto_Velho
A testdata/tzdb/2017c/right/America/Puerto_Rico
A testdata/tzdb/2017c/right/America/Punta_Arenas
A testdata/tzdb/2017c/right/America/Rainy_River
A testdata/tzdb/2017c/right/America/Rankin_Inlet
A testdata/tzdb/2017c/right/America/Recife
A testdata/tzdb/2017c/right/America/Regina
A testdata/tzdb/2017c/right/America/Resolute
A testdata/tzdb/2017c/right/America/Rio_Branco
A testdata/tzdb/2017c/right/America/Rosario
A testdata/tzdb/2017c/right/America/Santa_Isabel
A testdata/tzdb/2017c/right/America/Santarem
A testdata/tzdb/2017c/right/America/Santiago
A testdata/tzdb/2017c/right/America/Santo_Domingo
A testdata/tzdb/2017c/right/America/Sao_Paulo
A testdata/tzdb/2017c/right/America/Scoresbysund
A testdata/tzdb/2017c/right/America/Shiprock
A testdata/tzdb/2017c/right/America/Sitka
A testdata/tzdb/2017c/right/America/St_Barthelemy
A testdata/tzdb/2017c/right/America/St_Johns
A testdata/tzdb/2017c/right/America/St_Kitts
A testdata/tzdb/2017c/right/America/St_Lucia
A testdata/tzdb/2017c/right/America/St_Thomas
A testdata/tzdb/2017c/right/America/St_Vincent
A testdata/tzdb/2017c/right/America/Swift_Current
A testdata/tzdb/2017c/right/America/Tegucigalpa
A testdata/tzdb/2017c/right/America/Thule
A testdata/tzdb/2017c/right/America/Thunder_Bay
A testdata/tzdb/2017c/right/America/Tijuana
A testdata/tzdb/2017c/right/America/Toronto
A testdata/tzdb/2017c/right/America/Tortola
A testdata/tzdb/2017c/right/America/Vancouver
A testdata/tzdb/2017c/right/America/Virgin
A testdata/tzdb/2017c/right/America/Whitehorse
A testdata/tzdb/2017c/right/America/Winnipeg
A testdata/tzdb/2017c/right/America/Yakutat
A testdata/tzdb/2017c/right/America/Yellowknife
A testdata/tzdb/2017c/right/Antarctica/Casey
A testdata/tzdb/2017c/right/Antarctica/Davis
A testdata/tzdb/2017c/right/Antarctica/DumontDUrville
A testdata/tzdb/2017c/right/Antarctica/Macquarie
A testdata/tzdb/2017c/right/Antarctica/Mawson
A testdata/tzdb/2017c/right/Antarctica/McMurdo
A testdata/tzdb/2017c/right/Antarctica/Palmer
A testdata/tzdb/2017c/right/Antarctica/Rothera
A testdata/tzdb/2017c/right/Antarctica/South_Pole
A testdata/tzdb/2017c/right/Antarctica/Syowa
A testdata/tzdb/2017c/right/Antarctica/Troll
A testdata/tzdb/2017c/right/Antarctica/Vostok
A testdata/tzdb/2017c/right/Arctic/Longyearbyen
A testdata/tzdb/2017c/right/Asia/Aden
A testdata/tzdb/2017c/right/Asia/Almaty
A testdata/tzdb/2017c/right/Asia/Amman
A testdata/tzdb/2017c/right/Asia/Anadyr
A testdata/tzdb/2017c/right/Asia/Aqtau
A testdata/tzdb/2017c/right/Asia/Aqtobe
A testdata/tzdb/2017c/right/Asia/Ashgabat
A testdata/tzdb/2017c/right/Asia/Ashkhabad
A testdata/tzdb/2017c/right/Asia/Atyrau
A testdata/tzdb/2017c/right/Asia/Baghdad
A testdata/tzdb/2017c/right/Asia/Bahrain
A testdata/tzdb/2017c/right/Asia/Baku
A testdata/tzdb/2017c/right/Asia/Bangkok
A testdata/tzdb/2017c/right/Asia/Barnaul
A testdata/tzdb/2017c/right/Asia/Beirut
A testdata/tzdb/2017c/right/Asia/Bishkek
A testdata/tzdb/2017c/right/Asia/Brunei
A testdata/tzdb/2017c/right/Asia/Calcutta
A testdata/tzdb/2017c/right/Asia/Chita
A testdata/tzdb/2017c/right/Asia/Choibalsan
A testdata/tzdb/2017c/right/Asia/Chongqing
A testdata/tzdb/2017c/right/Asia/Chungking
A testdata/tzdb/2017c/right/Asia/Colombo
A testdata/tzdb/2017c/right/Asia/Dacca
A testdata/tzdb/2017c/right/Asia/Damascus
A testdata/tzdb/2017c/right/Asia/Dhaka
A testdata/tzdb/2017c/right/Asia/Dili
A testdata/tzdb/2017c/right/Asia/Dubai
A testdata/tzdb/2017c/right/Asia/Dushanbe
A testdata/tzdb/2017c/right/Asia/Famagusta
A testdata/tzdb/2017c/right/Asia/Gaza
A testdata/tzdb/2017c/right/Asia/Harbin
A testdata/tzdb/2017c/right/Asia/Hebron
A testdata/tzdb/2017c/right/Asia/Ho_Chi_Minh
A testdata/tzdb/2017c/right/Asia/Hong_Kong
A testdata/tzdb/2017c/right/Asia/Hovd
A testdata/tzdb/2017c/right/Asia/Irkutsk
A testdata/tzdb/2017c/right/Asia/Istanbul
A testdata/tzdb/2017c/right/Asia/Jakarta
A testdata/tzdb/2017c/right/Asia/Jayapura
A testdata/tzdb/2017c/right/Asia/Jerusalem
A testdata/tzdb/2017c/right/Asia/Kabul
A testdata/tzdb/2017c/right/Asia/Kamchatka
A testdata/tzdb/2017c/right/Asia/Karachi
A testdata/tzdb/2017c/right/Asia/Kashgar
A testdata/tzdb/2017c/right/Asia/Kathmandu
A testdata/tzdb/2017c/right/Asia/Katmandu
A testdata/tzdb/2017c/right/Asia/Khandyga
A testdata/tzdb/2017c/right/Asia/Kolkata
A testdata/tzdb/2017c/right/Asia/Krasnoyarsk
A testdata/tzdb/2017c/right/Asia/Kuala_Lumpur
A testdata/tzdb/2017c/right/Asia/Kuching
A testdata/tzdb/2017c/right/Asia/Kuwait
A testdata/tzdb/2017c/right/Asia/Macao
A testdata/tzdb/2017c/right/Asia/Macau
A testdata/tzdb/2017c/right/Asia/Magadan
A testdata/tzdb/2017c/right/Asia/Makassar
A testdata/tzdb/2017c/right/Asia/Manila
A testdata/tzdb/2017c/right/Asia/Muscat
A testdata/tzdb/2017c/right/Asia/Nicosia
A testdata/tzdb/2017c/right/Asia/Novokuznetsk
A testdata/tzdb/2017c/right/Asia/Novosibirsk
A testdata/tzdb/2017c/right/Asia/Omsk
A testdata/tzdb/2017c/right/Asia/Oral
A testdata/tzdb/2017c/right/Asia/Phnom_Penh
A testdata/tzdb/2017c/right/Asia/Pontianak
A testdata/tzdb/2017c/right/Asia/Pyongyang
A testdata/tzdb/2017c/right/Asia/Qatar
A testdata/tzdb/2017c/right/Asia/Qyzylorda
A testdata/tzdb/2017c/right/Asia/Rangoon
A testdata/tzdb/2017c/right/Asia/Riyadh
A testdata/tzdb/2017c/right/Asia/Saigon
A testdata/tzdb/2017c/right/Asia/Sakhalin
A testdata/tzdb/2017c/right/Asia/Samarkand
A testdata/tzdb/2017c/right/Asia/Seoul
A testdata/tzdb/2017c/right/Asia/Shanghai
A testdata/tzdb/2017c/right/Asia/Singapore
A testdata/tzdb/2017c/right/Asia/Srednekolymsk
A testdata/tzdb/2017c/right/Asia/Taipei
A testdata/tzdb/2017c/right/Asia/Tashkent
A testdata/tzdb/2017c/right/Asia/Tbilisi
A testdata/tzdb/2017c/right/Asia/Tehran
A testdata/tzdb/2017c/right/Asia/Tel_Aviv
A testdata/tzdb/2017c/right/Asia/Thimbu
A testdata/tzdb/2017c/right/Asia/Thimphu
A testdata/tzdb/2017c/right/Asia/Tokyo
A testdata/tzdb/2017c/right/Asia/Tomsk
A testdata/tzdb/2017c/right/Asia/Ujung_Pandang
A testdata/tzdb/2017c/right/Asia/Ulaanbaatar
A testdata/tzdb/2017c/right/Asia/Ulan_Bator
A testdata/tzdb/2017c/right/Asia/Urumqi
A testdata/tzdb/2017c/right/Asia/Ust-Nera
A testdata/tzdb/2017c/right/Asia/Vientiane
A testdata/tzdb/2017c/right/Asia/Vladivostok
A testdata/tzdb/2017c/right/Asia/Yakutsk
A testdata/tzdb/2017c/right/Asia/Yangon
A testdata/tzdb/2017c/right/Asia/Yekaterinburg
A testdata/tzdb/2017c/right/Asia/Yerevan
A testdata/tzdb/2017c/right/Atlantic/Azores
A testdata/tzdb/2017c/right/Atlantic/Bermuda
A testdata/tzdb/2017c/right/Atlantic/Canary
A testdata/tzdb/2017c/right/Atlantic/Cape_Verde
A testdata/tzdb/2017c/right/Atlantic/Faeroe
A testdata/tzdb/2017c/right/Atlantic/Faroe
A testdata/tzdb/2017c/right/Atlantic/Jan_Mayen
A testdata/tzdb/2017c/right/Atlantic/Madeira
A testdata/tzdb/2017c/right/Atlantic/Reykjavik
A testdata/tzdb/2017c/right/Atlantic/South_Georgia
A testdata/tzdb/2017c/right/Atlantic/St_Helena
A testdata/tzdb/2017c/right/Atlantic/Stanley
A testdata/tzdb/2017c/right/Australia/ACT
A testdata/tzdb/2017c/right/Australia/Adelaide
A testdata/tzdb/2017c/right/Australia/Brisbane
A testdata/tzdb/2017c/right/Australia/Broken_Hill
A testdata/tzdb/2017c/right/Australia/Canberra
A testdata/tzdb/2017c/right/Australia/Currie
A testdata/tzdb/2017c/right/Australia/Darwin
A testdata/tzdb/2017c/right/Australia/Eucla
A testdata/tzdb/2017c/right/Australia/Hobart
A testdata/tzdb/2017c/right/Australia/LHI
A testdata/tzdb/2017c/right/Australia/Lindeman
A testdata/tzdb/2017c/right/Australia/Lord_Howe
A testdata/tzdb/2017c/right/Australia/Melbourne
A testdata/tzdb/2017c/right/Australia/NSW
A testdata/tzdb/2017c/right/Australia/North
A testdata/tzdb/2017c/right/Australia/Perth
A testdata/tzdb/2017c/right/Australia/Queensland
A testdata/tzdb/2017c/right/Australia/South
A testdata/tzdb/2017c/right/Australia/Sydney
A testdata/tzdb/2017c/right/Australia/Tasmania
A testdata/tzdb/2017c/right/Australia/Victoria
A testdata/tzdb/2017c/right/Australia/West
A testdata/tzdb/2017c/right/Australia/Yancowinna
A testdata/tzdb/2017c/right/Brazil/Acre
A testdata/tzdb/2017c/right/Brazil/DeNoronha
A testdata/tzdb/2017c/right/Brazil/East
A testdata/tzdb/2017c/right/Brazil/West
A testdata/tzdb/2017c/right/CET
A testdata/tzdb/2017c/right/CST6CDT
A testdata/tzdb/2017c/right/Canada/Atlantic
A testdata/tzdb/2017c/right/Canada/Central
A testdata/tzdb/2017c/right/Canada/Eastern
A testdata/tzdb/2017c/right/Canada/Mountain
A testdata/tzdb/2017c/right/Canada/Newfoundland
A testdata/tzdb/2017c/right/Canada/Pacific
A testdata/tzdb/2017c/right/Canada/Saskatchewan
A testdata/tzdb/2017c/right/Canada/Yukon
A testdata/tzdb/2017c/right/Chile/Continental
A testdata/tzdb/2017c/right/Chile/EasterIsland
A testdata/tzdb/2017c/right/Cuba
A testdata/tzdb/2017c/right/EET
A testdata/tzdb/2017c/right/EST
A testdata/tzdb/2017c/right/EST5EDT
A testdata/tzdb/2017c/right/Egypt
A testdata/tzdb/2017c/right/Eire
A testdata/tzdb/2017c/right/Etc/GMT
A testdata/tzdb/2017c/right/Etc/GMT+0
A testdata/tzdb/2017c/right/Etc/GMT+1
A testdata/tzdb/2017c/right/Etc/GMT+10
A testdata/tzdb/2017c/right/Etc/GMT+11
A testdata/tzdb/2017c/right/Etc/GMT+12
A testdata/tzdb/2017c/right/Etc/GMT+2
A testdata/tzdb/2017c/right/Etc/GMT+3
A testdata/tzdb/2017c/right/Etc/GMT+4
A testdata/tzdb/2017c/right/Etc/GMT+5
A testdata/tzdb/2017c/right/Etc/GMT+6
A testdata/tzdb/2017c/right/Etc/GMT+7
A testdata/tzdb/2017c/right/Etc/GMT+8
A testdata/tzdb/2017c/right/Etc/GMT+9
A testdata/tzdb/2017c/right/Etc/GMT-0
A testdata/tzdb/2017c/right/Etc/GMT-1
A testdata/tzdb/2017c/right/Etc/GMT-10
A testdata/tzdb/2017c/right/Etc/GMT-11
A testdata/tzdb/2017c/right/Etc/GMT-12
A testdata/tzdb/2017c/right/Etc/GMT-13
A testdata/tzdb/2017c/right/Etc/GMT-14
A testdata/tzdb/2017c/right/Etc/GMT-2
A testdata/tzdb/2017c/right/Etc/GMT-3
A testdata/tzdb/2017c/right/Etc/GMT-4
A testdata/tzdb/2017c/right/Etc/GMT-5
A testdata/tzdb/2017c/right/Etc/GMT-6
A testdata/tzdb/2017c/right/Etc/GMT-7
A testdata/tzdb/2017c/right/Etc/GMT-8
A testdata/tzdb/2017c/right/Etc/GMT-9
A testdata/tzdb/2017c/right/Etc/GMT0
A testdata/tzdb/2017c/right/Etc/Greenwich
A testdata/tzdb/2017c/right/Etc/UCT
A testdata/tzdb/2017c/right/Etc/UTC
A testdata/tzdb/2017c/right/Etc/Universal
A testdata/tzdb/2017c/right/Etc/Zulu
A testdata/tzdb/2017c/right/Europe/Amsterdam
A testdata/tzdb/2017c/right/Europe/Andorra
A testdata/tzdb/2017c/right/Europe/Astrakhan
A testdata/tzdb/2017c/right/Europe/Athens
A testdata/tzdb/2017c/right/Europe/Belfast
A testdata/tzdb/2017c/right/Europe/Belgrade
A testdata/tzdb/2017c/right/Europe/Berlin
A testdata/tzdb/2017c/right/Europe/Bratislava
A testdata/tzdb/2017c/right/Europe/Brussels
A testdata/tzdb/2017c/right/Europe/Bucharest
A testdata/tzdb/2017c/right/Europe/Budapest
A testdata/tzdb/2017c/right/Europe/Busingen
A testdata/tzdb/2017c/right/Europe/Chisinau
A testdata/tzdb/2017c/right/Europe/Copenhagen
A testdata/tzdb/2017c/right/Europe/Dublin
A testdata/tzdb/2017c/right/Europe/Gibraltar
A testdata/tzdb/2017c/right/Europe/Guernsey
A testdata/tzdb/2017c/right/Europe/Helsinki
A testdata/tzdb/2017c/right/Europe/Isle_of_Man
A testdata/tzdb/2017c/right/Europe/Istanbul
A testdata/tzdb/2017c/right/Europe/Jersey
A testdata/tzdb/2017c/right/Europe/Kaliningrad
A testdata/tzdb/2017c/right/Europe/Kiev
A testdata/tzdb/2017c/right/Europe/Kirov
A testdata/tzdb/2017c/right/Europe/Lisbon
A testdata/tzdb/2017c/right/Europe/Ljubljana
A testdata/tzdb/2017c/right/Europe/London
A testdata/tzdb/2017c/right/Europe/Luxembourg
A testdata/tzdb/2017c/right/Europe/Madrid
A testdata/tzdb/2017c/right/Europe/Malta
A testdata/tzdb/2017c/right/Europe/Mariehamn
A testdata/tzdb/2017c/right/Europe/Minsk
A testdata/tzdb/2017c/right/Europe/Monaco
A testdata/tzdb/2017c/right/Europe/Moscow
A testdata/tzdb/2017c/right/Europe/Nicosia
A testdata/tzdb/2017c/right/Europe/Oslo
A testdata/tzdb/2017c/right/Europe/Paris
A testdata/tzdb/2017c/right/Europe/Podgorica
A testdata/tzdb/2017c/right/Europe/Prague
A testdata/tzdb/2017c/right/Europe/Riga
A testdata/tzdb/2017c/right/Europe/Rome
A testdata/tzdb/2017c/right/Europe/Samara
A testdata/tzdb/2017c/right/Europe/San_Marino
A testdata/tzdb/2017c/right/Europe/Sarajevo
A testdata/tzdb/2017c/right/Europe/Saratov
A testdata/tzdb/2017c/right/Europe/Simferopol
A testdata/tzdb/2017c/right/Europe/Skopje
A testdata/tzdb/2017c/right/Europe/Sofia
A testdata/tzdb/2017c/right/Europe/Stockholm
A testdata/tzdb/2017c/right/Europe/Tallinn
A testdata/tzdb/2017c/right/Europe/Tirane
A testdata/tzdb/2017c/right/Europe/Tiraspol
A testdata/tzdb/2017c/right/Europe/Ulyanovsk
A testdata/tzdb/2017c/right/Europe/Uzhgorod
A testdata/tzdb/2017c/right/Europe/Vaduz
A testdata/tzdb/2017c/right/Europe/Vatican
A testdata/tzdb/2017c/right/Europe/Vienna
A testdata/tzdb/2017c/right/Europe/Vilnius
A testdata/tzdb/2017c/right/Europe/Volgograd
A testdata/tzdb/2017c/right/Europe/Warsaw
A testdata/tzdb/2017c/right/Europe/Zagreb
A testdata/tzdb/2017c/right/Europe/Zaporozhye
A testdata/tzdb/2017c/right/Europe/Zurich
A testdata/tzdb/2017c/right/Factory
A testdata/tzdb/2017c/right/GB
A testdata/tzdb/2017c/right/GB-Eire
A testdata/tzdb/2017c/right/GMT
A testdata/tzdb/2017c/right/GMT+0
A testdata/tzdb/2017c/right/GMT-0
A testdata/tzdb/2017c/right/GMT0
A testdata/tzdb/2017c/right/Greenwich
A testdata/tzdb/2017c/right/HST
A testdata/tzdb/2017c/right/Hongkong
A testdata/tzdb/2017c/right/Iceland
A testdata/tzdb/2017c/right/Indian/Antananarivo
A testdata/tzdb/2017c/right/Indian/Chagos
A testdata/tzdb/2017c/right/Indian/Christmas
A testdata/tzdb/2017c/right/Indian/Cocos
A testdata/tzdb/2017c/right/Indian/Comoro
A testdata/tzdb/2017c/right/Indian/Kerguelen
A testdata/tzdb/2017c/right/Indian/Mahe
A testdata/tzdb/2017c/right/Indian/Maldives
A testdata/tzdb/2017c/right/Indian/Mauritius
A testdata/tzdb/2017c/right/Indian/Mayotte
A testdata/tzdb/2017c/right/Indian/Reunion
A testdata/tzdb/2017c/right/Iran
A testdata/tzdb/2017c/right/Israel
A testdata/tzdb/2017c/right/Jamaica
A testdata/tzdb/2017c/right/Japan
A testdata/tzdb/2017c/right/Kwajalein
A testdata/tzdb/2017c/right/Libya
A testdata/tzdb/2017c/right/MET
A testdata/tzdb/2017c/right/MST
A testdata/tzdb/2017c/right/MST7MDT
A testdata/tzdb/2017c/right/Mexico/BajaNorte
A testdata/tzdb/2017c/right/Mexico/BajaSur
A testdata/tzdb/2017c/right/Mexico/General
A testdata/tzdb/2017c/right/NZ
A testdata/tzdb/2017c/right/NZ-CHAT
A testdata/tzdb/2017c/right/Navajo
A testdata/tzdb/2017c/right/PRC
A testdata/tzdb/2017c/right/PST8PDT
A testdata/tzdb/2017c/right/Pacific/Apia
A testdata/tzdb/2017c/right/Pacific/Auckland
A testdata/tzdb/2017c/right/Pacific/Bougainville
A testdata/tzdb/2017c/right/Pacific/Chatham
A testdata/tzdb/2017c/right/Pacific/Chuuk
A testdata/tzdb/2017c/right/Pacific/Easter
A testdata/tzdb/2017c/right/Pacific/Efate
A testdata/tzdb/2017c/right/Pacific/Enderbury
A testdata/tzdb/2017c/right/Pacific/Fakaofo
A testdata/tzdb/2017c/right/Pacific/Fiji
A testdata/tzdb/2017c/right/Pacific/Funafuti
A testdata/tzdb/2017c/right/Pacific/Galapagos
A testdata/tzdb/2017c/right/Pacific/Gambier
A testdata/tzdb/2017c/right/Pacific/Guadalcanal
A testdata/tzdb/2017c/right/Pacific/Guam
A testdata/tzdb/2017c/right/Pacific/Honolulu
A testdata/tzdb/2017c/right/Pacific/Johnston
A testdata/tzdb/2017c/right/Pacific/Kiritimati
A testdata/tzdb/2017c/right/Pacific/Kosrae
A testdata/tzdb/2017c/right/Pacific/Kwajalein
A testdata/tzdb/2017c/right/Pacific/Majuro
A testdata/tzdb/2017c/right/Pacific/Marquesas
A testdata/tzdb/2017c/right/Pacific/Midway
A testdata/tzdb/2017c/right/Pacific/Nauru
A testdata/tzdb/2017c/right/Pacific/Niue
A testdata/tzdb/2017c/right/Pacific/Norfolk
A testdata/tzdb/2017c/right/Pacific/Noumea
A testdata/tzdb/2017c/right/Pacific/Pago_Pago
A testdata/tzdb/2017c/right/Pacific/Palau
A testdata/tzdb/2017c/right/Pacific/Pitcairn
A testdata/tzdb/2017c/right/Pacific/Pohnpei
A testdata/tzdb/2017c/right/Pacific/Ponape
A testdata/tzdb/2017c/right/Pacific/Port_Moresby
A testdata/tzdb/2017c/right/Pacific/Rarotonga
A testdata/tzdb/2017c/right/Pacific/Saipan
A testdata/tzdb/2017c/right/Pacific/Samoa
A testdata/tzdb/2017c/right/Pacific/Tahiti
A testdata/tzdb/2017c/right/Pacific/Tarawa
A testdata/tzdb/2017c/right/Pacific/Tongatapu
A testdata/tzdb/2017c/right/Pacific/Truk
A testdata/tzdb/2017c/right/Pacific/Wake
A testdata/tzdb/2017c/right/Pacific/Wallis
A testdata/tzdb/2017c/right/Pacific/Yap
A testdata/tzdb/2017c/right/Poland
A testdata/tzdb/2017c/right/Portugal
A testdata/tzdb/2017c/right/ROC
A testdata/tzdb/2017c/right/ROK
A testdata/tzdb/2017c/right/Singapore
A testdata/tzdb/2017c/right/SystemV/AST4
A testdata/tzdb/2017c/right/SystemV/AST4ADT
A testdata/tzdb/2017c/right/SystemV/CST6
A testdata/tzdb/2017c/right/SystemV/CST6CDT
A testdata/tzdb/2017c/right/SystemV/EST5
A testdata/tzdb/2017c/right/SystemV/EST5EDT
A testdata/tzdb/2017c/right/SystemV/HST10
A testdata/tzdb/2017c/right/SystemV/MST7
A testdata/tzdb/2017c/right/SystemV/MST7MDT
A testdata/tzdb/2017c/right/SystemV/PST8
A testdata/tzdb/2017c/right/SystemV/PST8PDT
A testdata/tzdb/2017c/right/SystemV/YST9
A testdata/tzdb/2017c/right/SystemV/YST9YDT
A testdata/tzdb/2017c/right/Turkey
A testdata/tzdb/2017c/right/UCT
A testdata/tzdb/2017c/right/US/Alaska
A testdata/tzdb/2017c/right/US/Aleutian
A testdata/tzdb/2017c/right/US/Arizona
A testdata/tzdb/2017c/right/US/Central
A testdata/tzdb/2017c/right/US/East-Indiana
A testdata/tzdb/2017c/right/US/Eastern
A testdata/tzdb/2017c/right/US/Hawaii
A testdata/tzdb/2017c/right/US/Indiana-Starke
A testdata/tzdb/2017c/right/US/Michigan
A testdata/tzdb/2017c/right/US/Mountain
A testdata/tzdb/2017c/right/US/Pacific
A testdata/tzdb/2017c/right/US/Pacific-New
A testdata/tzdb/2017c/right/US/Samoa
A testdata/tzdb/2017c/right/UTC
A testdata/tzdb/2017c/right/Universal
A testdata/tzdb/2017c/right/W-SU
A testdata/tzdb/2017c/right/WET
A testdata/tzdb/2017c/right/Zulu
A testdata/tzdb/2017c/zone.tab
A testdata/tzdb/abbrev.conf
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
1,874 files changed, 4,689 insertions(+), 1,144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/9
-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c.zip
A testdata/tzdb/abbrev.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
70 files changed, 2,994 insertions(+), 1,167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/12

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 2:

(55 comments)

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@150
PS2, Line 150: void AddTestDataDateTimes(vector<TimestampValue>& data, int n, const string& startstr) {
> Since it's in a benchmark it doesn't really matter, anyway, some nit commen
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@172
PS2, Line 172: TestData
> This looks like a fairly general class to me that could move to util/benchm
The 'measure_multithreaded_elapsed_time' function is not that general; it is used here as a quick-and-dirty way to verify that glibc calls are executed serially even in a multithreaded environment. Because of that, I would like to keep this class here.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@174
PS2, Line 174: > >
> Nit: since C++11 you don't need to put spaces between right angle brackets.
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@228
PS2, Line 228: const shared_ptr<vector<FROM> > data_
> Nit: I think using 'const vector<FROM>&' would be simpler. I don't really s
True, I rewrote this portion of the code so many times that I lost track of where I was going with it :)


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@319
PS2, Line 319:     boost_throw_if_date_out_of_range(local_time.date());
> Is this function call really needed here? Don't we trust boost that it vali
This is how the function was implemented originally. Since the point of this benchmark program is to compare the old implementation with the new one, I figured I shouldn't change the old code.

I think the call is necessary, because boost might not validate the date range until you call the gregorian::date accessors. It is confusing.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@444
PS2, Line 444:     time_t utc =
> We could replace this with boost_utc_to_unix_time. This conversion should b
This is how UtcToLocal was implemented originally. The goal of this benchmark program is to compare the original implementation with the new one (including the glue code).


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@530
PS2, Line 530: //
             : // Test UnixTimeToUtcPtime (boost is expected to be faster than CCTZ)
             : //
             : 
             : // boost
             : boost::pos
> I think that this a bit misleading, as boost_unix_time_to_utc_ptime never u
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@608
PS2, Line 608:   if (cctz_utc_to_unix_data.get_result() != glibc_utc_to_unix_data.get_result()) {
             :     cerr << "cctz/glibc utc_to_unix results do not match!" << endl;
             :     return 1;
             :   }
             :   if (boost_utc_to_unix_data.get_result() != glibc_utc_to_unix_data.get_result()) {
             :     cerr << "boost/glibc utc_to_unix results do not match!" << endl;
             :     return 1;
             :   }
> The other benchmarks don't need this validity check?
Done (although passing a vector of TestData to the helper function was not feasible, as the different TestData classes are instantiated with different converter functions and are therefore not "compatible")


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exec/data-source-scan-node.h
File be/src/exec/data-source-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exec/data-source-scan-node.h@100
PS1, Line 100:   Status MaterializeNextRow(RuntimeState* state, MemPool* mem_pool, Tuple* tuple);
> What do you think about passing cctz::time_zone* instead of RuntimeState*?
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/decimal-operators.h
File be/src/exprs/decimal-operators.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/decimal-operators.h@168
PS2, Line 168:   /// local time in 'local_tz' time-zone). Rounds instead of truncating if 'round' is true.
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/expr-test.cc@6398
PS1, Line 6398: const char* local_tz_name = "PST8PDT";
              :     ScopedTimeZoneOverride time_zone(local_tz_name);
              :     const cctz::time_zone* local_tz = TimezoneDatabase::FindTimezone(local_tz_name);
              :     DCHECK(local_tz != nullptr);
> Have you considered moving this to a function or macro or such? As I see yo
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@63
PS1, Line 63:   if (UNLIKELY(timezone == nullptr)) {
> This could be UNLIKELY.
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@73
PS1, Line 73: from_cs
> Might be just me but I don't really find the names from_cs, to_cs and from_
Simplified the logic here a bit, so now we only have 'from_tp' and 'to_cs' . I also specified the concrete types instead of just using 'auto'. Hope it makes the code easier to understand.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@76
PS1, Line 76: auto from_tp = cctz::convert(from_cs, TimezoneDatabase::GetUtcTimezone());
            :   auto to_cs = cctz::convert(from_tp, *timezone);
> I think it would worth writing a comment why the two cctz::convert() calls 
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@79
PS1, Line 79:   // Check if resulting timestamp is within range
> In my opinion this comment doesn't add extra value as the name of the funct
Removed it.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@113
PS1, Line 113:   context->AddWarning(ss.str().c_str());
             :     return ts_val;
             :   }
             : 
> this seems duplicate code (TimestampFunctions::FromUtc)
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@140
PS1, Line 140:   context->AddWarning(msg.c_str());
             :     return TimestampVal::null();
             :   }
             : 
             :   // Create 'return_date' and 'return_time' from 'to_cs'.
> I think this conversion could go to a function as it seems duplicate for me
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc@71
PS2, Line 71:   const boost::gregorian::date& d = ts_value.date();
            :   const boost::posix_time::time_duration& t = ts_value.time();
            :   const cctz::civil_second from_cs(d.year(), d.month(), d.day(), t.hours(), t.minutes(),
            :       t.seconds());
            : 
            :   auto from_tp = cctz::convert(from_cs, TimezoneDatabase::GetUtcTimezone());
> Does cctz offer a function to create timepoint from unix time_t? If yes, th
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc@122
PS2, Line 122: or
> Nit: of
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.h@21
PS2, Line 21: boost
> I think we should use the unordered_map class from the C++ STL.
Impala uses both implementations. I've switched to using std::unordered_map here.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.h@31
PS2, Line 31: class TimezoneDatabase {
> My general feeling about this class is that a bit more transparency would a
I've added more details.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.h@56
PS2, Line 56: TZ_MAP
> TimezoneMap? I don't think a typename should be all capitals.
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@21
PS1, Line 21: #include <boost/unordered_map.hpp>
> shouldn't we use the one from std?
Both implementations are used in Impala. Switched to using std::unordered_map here.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@35
PS1, Line 35: 
> string& GetPath() ?
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@47
PS1, Line 47:   }
> FindTimezone() returns pointer while GetUtcTimezone() returns reference to 
GetUtcTimezone() is guaranteed to succeed, whereas FindTimezone() is not, so it has to signal failure somehow. It could return bool instead, but I don't think that would help matters, as the time_zone out parameter would still have to be a pointer. Maybe we can name it 'FindTimezonePtr' instead?
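
A minimal sketch of the convention described in this reply: a lookup that can fail returns a pointer (nullptr on a miss), while an accessor that cannot fail returns a reference. 'Zone' is a stand-in for cctz::time_zone so the sketch compiles without CCTZ, and 'ZoneRegistry' with its method names is hypothetical; none of this is the patch's actual code.

```cpp
#include <string>
#include <unordered_map>

// Stand-in for cctz::time_zone (assumption, for illustration only).
struct Zone {
  std::string name;
};

// Hypothetical registry; not the TimezoneDatabase class from the patch.
class ZoneRegistry {
 public:
  ZoneRegistry() : utc_{"UTC"} { zones_.emplace("UTC", utc_); }

  void Add(const std::string& name) { zones_.emplace(name, Zone{name}); }

  // Cannot fail, so a reference is the natural return type.
  const Zone& GetUtc() const { return utc_; }

  // Can fail, so the result is a pointer and nullptr signals "not found".
  const Zone* Find(const std::string& name) const {
    auto it = zones_.find(name);
    return it == zones_.end() ? nullptr : &it->second;
  }

 private:
  Zone utc_;
  std::unordered_map<std::string, Zone> zones_;
};
```

Callers of Find() have to check for nullptr before dereferencing, which is the same shape as the UNLIKELY(timezone == nullptr) check quoted from timestamp-functions.cc.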


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@54
PS1, Line 54:   static const cctz::time_zone UTC_TIMEZONE_;
> Could you add a comment what the string param is used for?
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@62
PS1, Line 62:   /// location.
> As I see you wrote comments for these function in the .cc file. Could you m
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@10
PS1, Line 10: //
> Unrelated to your change but shouldn't we replace this to the Apache header
What do you mean? I checked some random files under the exprs directory and they all appear to use the same header.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@103
PS1, Line 103: // Returns 'true' if path 'a' starts with path 'b'. If 'relative' is not nullptr, it will
> FileSystemUtil would be a better place for this, I think.
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@105
PS1, Line 105: PathStartsWith
> I have the feeling that this function should only decide if path 'b' is the
Implemented FileSystemUtil::IsPrefixPath() and FileSystemUtil::GetRelativePath() functions instead to make the functionality and usage straightforward.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@105
PS1, Line 105: string *relative
> For me the name of this variable doesn't indicate it's purpose. Can you nam
Moved the functionality to FileSystemUtil::GetRelativePath() and fixed the comment.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@108
PS1, Line 108: b.length() + 1 < a.length()
> what if a==b. In theory then a starts with b, still this returns false.
Done
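
A minimal sketch of a prefix check that treats a == b as a match and only accepts whole path components. The function names echo the IsPrefixPath() and GetRelativePath() helpers mentioned in this thread, but the signatures, the 'Sketch' suffix, and the handling of an empty prefix are assumptions, not the code in FileSystemUtil.

```cpp
#include <string>

// Returns true if 'prefix' is a whole-component prefix of 'path'.
// Equal paths count as a match. An empty prefix is rejected (assumption).
bool IsPrefixPathSketch(const std::string& prefix, const std::string& path) {
  if (prefix.empty()) return false;
  if (path.compare(0, prefix.size(), prefix) != 0) return false;
  if (path.size() == prefix.size()) return true;
  // The match must end at a component boundary.
  return prefix.back() == '/' || path[prefix.size()] == '/';
}

// On success, stores the part of 'path' that follows 'prefix' (without a
// leading '/') in 'relative' and returns true. Returns false otherwise.
bool GetRelativePathSketch(const std::string& prefix, const std::string& path,
    std::string* relative) {
  if (!IsPrefixPathSketch(prefix, path)) return false;
  std::string::size_type pos = prefix.size();
  if (pos < path.size() && path[pos] == '/') ++pos;
  *relative = path.substr(pos);
  return true;
}
```

With this shape, "/usr/share/zone" is not reported as a prefix of "/usr/share/zoneinfo", because the match does not end at a '/' boundary.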


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@118
PS1, Line 118: // with an uppercase letter.
> Could you mention examples here that contain the mentioned allowed chars?
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@119
PS1, Line 119: bool IsTimezoneNameSegmentValid(const string& tz_seg) {
> Shouldn't this be part of TimezoneDatabase or some other timezone related h
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@121
PS1, Line 121:       find_if(
> Tricky :)
Changed the function to use regex.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@132
PS1, Line 132: // time-zone name segments delimited by '/'.
> Could you mention one input example in the comment?
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@133
PS1, Line 133: bool IsTimezoneNameValid(const string& tz_name) {
> Shouldn't this be part of TimezoneDatabase or some other timezone related h
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@136
PS1, Line 136:   while (end != string::npos) {
> Wouldn't it be easier to verify this with a regex and get rid of this while
Changed the function to use regex. We still need IsTimezoneNameSegmentValid() though.
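
A minimal sketch of regex-based validation for names made of '/'-delimited segments that each start with an uppercase letter, as the quoted comments describe. The set of characters allowed after the first letter (letters, digits, '_', '+', '-') is an assumption chosen to accept names such as America/New_York and Etc/GMT+4. The single-pattern approach and the function name are illustrative and do not reproduce how the patch splits the work between IsTimezoneNameValid() and IsTimezoneNameSegmentValid().

```cpp
#include <regex>
#include <string>

// Hypothetical helper: accepts one or more segments separated by '/'.
// Each segment starts with an uppercase letter; the remaining characters
// are letters, digits, '_', '+' or '-' (assumed character set).
bool IsTimezoneNameValidSketch(const std::string& tz_name) {
  static const std::regex kNamePattern(
      "[A-Z][A-Za-z0-9_+-]*(/[A-Z][A-Za-z0-9_+-]*)*");
  return std::regex_match(tz_name, kNamePattern);
}
```

Because std::regex_match has to match the whole input, empty segments such as "Europe//Vienna" or a trailing '/' are rejected without an explicit loop.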


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147
PS1, Line 147: bool IsTimezoneOffsetValid(const string& tz_offset, int64_t* offset_sec) {
> Shouldn't this be part of TimezoneDatabase or some other timezone related h
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147
PS1, Line 147: bool
> can you return int64_t* and return nullptr in case the offset is not valid?
Returning a pointer just to be able to signal failure with nullptr seems contrived. I think this interface is cleaner.
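
A minimal sketch of the bool-plus-out-parameter interface defended in this reply. The accepted input format is an assumption: an optionally signed decimal number of seconds, strictly within 24 hours of UTC. The function name is hypothetical and the rules in the patch's IsTimezoneOffsetValid() may differ.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true and stores the parsed value in
// 'offset_sec' if 'text' is a valid offset; returns false and leaves
// 'offset_sec' untouched otherwise.
bool ParseOffsetSecondsSketch(const std::string& text, std::int64_t* offset_sec) {
  // strtoll would silently skip leading whitespace, so reject it up front.
  if (text.empty() || std::isspace(static_cast<unsigned char>(text[0]))) {
    return false;
  }
  errno = 0;
  char* end = nullptr;
  const long long value = std::strtoll(text.c_str(), &end, 10);
  // Fail on overflow or if anything other than the number was present.
  if (errno != 0 || end != text.c_str() + text.size()) return false;
  constexpr long long kDaySeconds = 24 * 60 * 60;  // assumed limit
  if (value <= -kDaySeconds || value >= kDaySeconds) return false;
  *offset_sec = value;
  return true;
}
```

The caller owns the storage for the result, so no allocation or sentinel pointer is needed to report failure.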


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@160
PS1, Line 160: // The implementation here was adapted from
> I again feel here that this function serves 2 purposes instead of a clear o
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@251
PS1, Line 251:   // mkdtemp operates in place, so we need a mutable array.
> Can you move the comment to the header?
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@43
PS2, Line 43: DEFINE_string(hdfs_zoneinfo_dir, "",
            :     "HDFS/S3A/ADLS path to load IANA time-zone database from.");
            : DEFINE_string(hdfs_zoneabbrev_config, "",
            :     "HDFS/S3A/ADLS path to config file defining non-standard time-zone abbreviations.");
> It will be a bit tricky, but these should be tested somehow. Playing with t
Added an e2e test: tests/custom_cluster/custom_tzdb.py


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@419
PS2, Line 419: Status TimezoneDatabase::LoadZoneAbbreviations(istream &is,
> At least basic testing should be added to check that fix and non-fix abbrev
Done (see timezone_db-test.cc)


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@420
PS2, Line 420: /* = nullptr */
> drop this comment
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@495
PS2, Line 495: ZONEINFO_DIR
> nit: ZONE_INFO_DIR?
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.h@101
PS2, Line 101:   static TimestampValue FromUnixTime(time_t unix_time, const cctz::time_zone* local_tz) {
> Do you think that mentioning the new param for these functions would add ex
I've fixed the comments here and below.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.cc@135
PS2, Line 135:   auto from_tp = FromUnixSeconds(unix_time);
             :   auto to_cs = cctz::convert(from_tp, *local_tz
> This would be a big change, but I would think about "auto" types - do they 
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.inline.h@98
PS2, Line 98:   if (UNLIKELY(!HasDateAndTime())) return false;
            :   cctz::civil_second cs(date_.year(), date_.month(), date_.day(), time_.hours(),
            :       time_.minutes(), time_.seconds());
            :   auto tp = cctz::convert(cs,
            :       FLAGS_use_local_tz_for_unix_timestamp_conversions ? *local_tz :
            :       TimezoneDatabase::GetUtcTimezone());
> If FLAGS_use_local_tz_for_unix_timestamp_conversions is false, then we coul
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/filesystem-util.h@57
PS2, Line 57:   static Status GetRealPath(
> Shouldn't you mentioned that this should be called on sym links? (or do I m
No, it's not just for symlinks. Fixed the comment about 'real_path'.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/filesystem-util.h@60
PS2, Line 60: Is it is
> nit: if it is
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/hdfs-util.h
File be/src/util/hdfs-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/hdfs-util.h@59
PS2, Line 59: /// Returns basename of 'path'.
> Could you add an example to the comment?
Done


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/time.h
File be/src/util/time.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/time.h@123
PS2, Line 123: /// Converts input microseconds-since-epoch to date-time string in 'tz' time zone.
> Could you mention 'p' as well?
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time.h
File be/src/util/time.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time.h@30
PS1, Line 30: 
> What do you think add about adding a typedef for cctz::timezone? I think th
Done (see be/src/common/global-types.h)


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/time.cc
File be/src/util/time.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/util/time.cc@167
PS2, Line 167:   const char* fmt = (p == TimePrecision::Millisecond) ? fmt_millisec :
> For me this 3 layers of embedded ternary operators isn't that readable. Wha
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 May 2018 17:57:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@172
PS2, Line 172: TestData
This looks like a fairly general class to me that could move to util/benchmark.h or a similar file. We can create a follow-up ticket if you don't want to deal with this now.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@444
PS2, Line 444:     time_t utc =
We could replace this with boost_utc_to_unix_time. This conversion should be fast, as we want to measure the speed of localtime_r, not the "glue" code.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@530
PS2, Line 530: //
             : // Test UnixTimeToUtcPtime (boost is expected to be faster than CCTZ)
             : //
             : 
             : // boost
             : boost::pos
I think that this is a bit misleading, as boost_unix_time_to_utc_ptime never uses its slow branch with this test data. We could measure gmtime_r separately, and add some comments that tell which functions were used in Impala before/after the change.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/decimal-operators.h
File be/src/exprs/decimal-operators.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/decimal-operators.h@168
PS2, Line 168:   /// local time in 'local_tz' time-zone). Rounds instead of truncating if 'round' is true.
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc@71
PS2, Line 71:   const boost::gregorian::date& d = ts_value.date();
            :   const boost::posix_time::time_duration& t = ts_value.time();
            :   const cctz::civil_second from_cs(d.year(), d.month(), d.day(), t.hours(), t.minutes(),
            :       t.seconds());
            : 
            :   auto from_tp = cctz::convert(from_cs, TimezoneDatabase::GetUtcTimezone());
Does cctz offer a function to create timepoint from unix time_t? If yes, then it should be faster to convert ts_val to a time_t, and convert that to from_tp.
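
A minimal sketch of the conversion asked about here, using only the standard library. CCTZ builds its time points on std::chrono::system_clock, so a second-resolution time point of this kind should be usable where the code above constructs 'from_tp'; that compatibility, the alias, and the helper name are assumptions, and this is not the FromUnixSeconds() helper from the patch. The sketch also assumes that system_clock's epoch is the Unix epoch, which C++20 guarantees and mainstream implementations already provide.

```cpp
#include <chrono>
#include <ctime>

// Second-resolution time point on the system clock (hypothetical alias).
using SecondsTimePoint =
    std::chrono::time_point<std::chrono::system_clock, std::chrono::seconds>;

// Hypothetical helper: converts seconds since the Unix epoch to a time point
// without going through a civil-time (year, month, day, ...) representation.
inline SecondsTimePoint UnixSecondsToTimePoint(std::time_t unix_time) {
  return std::chrono::time_point_cast<std::chrono::seconds>(
      std::chrono::system_clock::from_time_t(unix_time));
}
```

This skips the civil_second construction and the UTC lookup, which is why it is expected to be cheaper than converting through calendar fields.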


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@43
PS2, Line 43: DEFINE_string(hdfs_zoneinfo_dir, "",
            :     "HDFS/S3A/ADLS path to load IANA time-zone database from.");
            : DEFINE_string(hdfs_zoneabbrev_config, "",
            :     "HDFS/S3A/ADLS path to config file defining non-standard time-zone abbreviations.");
It will be a bit tricky, but these should be tested somehow. Playing with these configurations could go to a custom cluster test. These tests can probably be created at a later stage of the review, after the architecture has been accepted.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@419
PS2, Line 419: Status TimezoneDatabase::LoadZoneAbbreviations(istream &is,
At least basic testing should be added to check that fix and non-fix abbreviations are parsed correctly, but it would be best to also create tests for error messages.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.cc@135
PS2, Line 135:   auto from_tp = FromUnixSeconds(unix_time);
             :   auto to_cs = cctz::convert(from_tp, *local_tz
This would be a big change, but I would think about "auto" types - do they make the code more readable, or not? I think that if the type name is long and the reader more-or-less knows what kind of type to expect, then "auto" is very nice. But if the reader does not know the context/library well, then "auto" makes it harder to look up the classes. If you want to keep them, then I would prefer longer variable names, like spelling out civil_seconds.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.inline.h@98
PS2, Line 98:   if (UNLIKELY(!HasDateAndTime())) return false;
            :   cctz::civil_second cs(date_.year(), date_.month(), date_.day(), time_.hours(),
            :       time_.minutes(), time_.seconds());
            :   auto tp = cctz::convert(cs,
            :       FLAGS_use_local_tz_for_unix_timestamp_conversions ? *local_tz :
            :       TimezoneDatabase::GetUtcTimezone());
If FLAGS_use_local_tz_for_unix_timestamp_conversions is false, then we could call UtcToUnixTime.
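
To show why a UTC-only path can be cheaper: converting a UTC civil time to Unix seconds is pure arithmetic and needs no time-zone lookup. The sketch below uses the well-known days-from-civil algorithm; the function names and signatures are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date
// (the widely used "days from civil" algorithm).
inline std::int64_t DaysFromCivil(std::int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;  // treat January and February as part of the previous year
  const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
  return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Hypothetical UTC fast path: no time-zone database is consulted.
inline std::int64_t UtcToUnixSeconds(std::int64_t year, unsigned month,
    unsigned day, int hour, int minute, int second) {
  return DaysFromCivil(year, month, day) * 86400 + hour * 3600 + minute * 60 +
      second;
}
```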


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc
File be/src/util/time-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc@60
PS1, Line 60:   EXPECT_EQ("1677-09-21 00:12:43.146",
            :       ToUtcStringFromUnixMillis(INT64_MIN / NANOS_PER_MICRO / MICROS_PER_MILLI));
            :   EXPECT_EQ("1677-09-21 00:12:43.145225",
            :       ToUtcStringFromUnixMicros(INT64_MIN / NANOS_PER_MICRO));
> Actually, the old expected values were incorrect due to a bug in time.cc. E
Thanks for the explanation!


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time.h
File be/src/util/time.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time.h@30
PS1, Line 30: 
What do you think about adding a typedef for cctz::time_zone? I think that functions that just pass the cctz::time_zone* through do not have to know that we are using cctz. This would make it easier if we would like to wrap or replace it in the future.
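
A minimal sketch of the suggested alias follows. The cctz_stub namespace is a stand-in so that the example compiles without the library; in real code the alias would presumably refer to cctz::time_zone, and the pass-through function name is hypothetical.

```cpp
// Stand-in for the real library type (assumption for this sketch only).
namespace cctz_stub {
class time_zone {};
}  // namespace cctz_stub

// Project-level alias: code that only passes the time zone through refers to
// Timezone and does not need to know which library implements it.
using Timezone = cctz_stub::time_zone;

// Hypothetical pass-through function that mentions only the alias.
inline const Timezone* ForwardTimezone(const Timezone* local_tz) {
  return local_tz;
}
```

If the underlying library were ever wrapped or replaced, only the alias line would need to change.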


http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@267
PS1, Line 267:     queryCtx.setLocal_time_zone("PST8PDT");
> This commit adds 'local_time_zone' field to the query context (ImpalaIntern
Thanks for the explanation!



-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Apr 2018 22:08:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 20: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2721/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 20
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jun 2018 16:05:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/alias.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,089 insertions(+), 1,167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/16

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 16
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@150
PS2, Line 150: void AddTestDataDateTimes(vector<TimestampValue>& data, int n, const string& startstr) {
Since it's in a benchmark it doesn't really matter, but here are some nit comments anyway:
- output parameters should be listed last and they should be pointers
- in this particular case I think it would be better to return vector<TimestampValue>
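
The two options in this comment could look roughly like the sketch below. FakeTimestamp is a hypothetical stand-in for TimestampValue, and the function names and test-data generation are invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for TimestampValue so the sketch compiles on its own.
struct FakeTimestamp {
  long seconds;
};

// Option 1: the output parameter comes last and is a pointer, which makes the
// mutation visible at the call site (AddTestData(n, start, &data)).
inline void AddTestData(int n, long start, std::vector<FakeTimestamp>* data) {
  for (int i = 0; i < n; ++i) data->push_back(FakeTimestamp{start + i});
}

// Option 2: return the vector by value; copy elision and move semantics keep
// this cheap.
inline std::vector<FakeTimestamp> MakeTestData(int n, long start) {
  std::vector<FakeTimestamp> data;
  if (n > 0) data.reserve(static_cast<std::size_t>(n));
  AddTestData(n, start, &data);
  return data;
}
```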


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@174
PS2, Line 174: > >
Nit: since C++11 you don't need to put spaces between right angle brackets.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@228
PS2, Line 228: const shared_ptr<vector<FROM> > data_
Nit: I think using 'const vector<FROM>&' would be simpler. I don't really see why we need shared_ptr here.

Also, in 'const shared_ptr<T>' only the pointer is const, the pointed object is not.
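
A small sketch of the const-ness point, with hypothetical function names: a const shared_ptr still allows the pointed-to vector to be modified, whereas a const reference makes the data itself read-only. It also shows that '>>' needs no space since C++11.

```cpp
#include <memory>
#include <vector>

// 'const shared_ptr<T>' fixes the pointer, not the pointee: the vector can
// still be modified through it.
inline int MutateThroughConstPointer() {
  const std::shared_ptr<std::vector<int>> data =
      std::make_shared<std::vector<int>>();
  data->push_back(42);  // compiles: the pointee is not const
  return data->front();
}

// A const reference makes the data itself read-only and involves no
// reference counting.
inline int SumReadOnly(const std::vector<int>& data) {
  int sum = 0;
  for (int v : data) sum += v;
  return sum;
}
```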


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@319
PS2, Line 319:     boost_throw_if_date_out_of_range(local_time.date());
Is this function call really needed here? Don't we trust boost that it validates the date correctly?


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@608
PS2, Line 608:   if (cctz_utc_to_unix_data.get_result() != glibc_utc_to_unix_data.get_result()) {
             :     cerr << "cctz/glibc utc_to_unix results do not match!" << endl;
             :     return 1;
             :   }
             :   if (boost_utc_to_unix_data.get_result() != glibc_utc_to_unix_data.get_result()) {
             :     cerr << "boost/glibc utc_to_unix results do not match!" << endl;
             :     return 1;
             :   }
The other benchmarks don't need this validity check?

It could be implemented in a helper function that takes a vector of TestData.
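
One possible shape for such a helper is sketched below. FakeTestData is a simplified, hypothetical stand-in for the benchmark's TestData class; the real class and its get_result() type may differ.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the benchmark's TestData class: only
// a name and a comparable result.
struct FakeTestData {
  std::string name;
  long result;
};

// Returns true if every entry produced the same result as the first one;
// otherwise reports the first mismatching pair on stderr.
inline bool ResultsMatch(const std::vector<FakeTestData>& runs) {
  for (std::size_t i = 1; i < runs.size(); ++i) {
    if (runs[i].result != runs[0].result) {
      std::cerr << runs[i].name << "/" << runs[0].name
                << " results do not match!" << std::endl;
      return false;
    }
  }
  return true;
}
```

Each benchmark group could then call the helper once with all of its variants, for example the cctz, glibc and boost runs of the same conversion.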


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc@122
PS2, Line 122: or
Nit: of


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.h@21
PS2, Line 21: boost
I think we should use the unordered_map class from the C++ STL.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.h@56
PS2, Line 56: TZ_MAP
TimezoneMap? I don't think a typename should be all capitals.
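
Taken together, the two comments on timezone_db.h might lead to something like the sketch below: a CamelCase alias built on std::unordered_map. FakeTimezone and FindTimezone are hypothetical names used only to keep the example self-contained.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the time-zone type stored in the map.
struct FakeTimezone {
  int utc_offset_seconds;
};

// CamelCase alias for the map type, based on the standard library container
// instead of the boost one.
using TimezoneMap = std::unordered_map<std::string, FakeTimezone>;

// Returns a pointer to the entry for 'name', or nullptr if it is unknown.
inline const FakeTimezone* FindTimezone(const TimezoneMap& tz_map,
    const std::string& name) {
  auto it = tz_map.find(name);
  return it == tz_map.end() ? nullptr : &it->second;
}
```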




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Apr 2018 14:20:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 7: Code-Review+1

(4 comments)

Thanks Attila for addressing my comments! I'm fine with the change.

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timezone_db-test.cc
File be/src/exprs/timezone_db-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timezone_db-test.cc@119
PS7, Line 119:   TestInvalidTimezoneAbbrevName("pST");
Four of these are already tested by TimezoneDbNamesTest. No need to test them here as well, I think.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h@1
PS4, Line 1: // Licensed to the Apache Software Foundation (ASF) under one
> Not sure what happened there. I probably inadvertently executed some myster
:D


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc@365
PS4, Line 365:   RETURN_IF_ERROR(
> LoadZoneInfoFromHdfs() uses cctz::load_time_zone() to load time-zone files 
Thx!


http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/runtime-state.cc@136
PS5, Line 136:     local_time_zone_ = &TimezoneDatabase::GetUtcTimezone();
> True, but I wanted to be more explicit here.
thanks for the explanation.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 May 2018 14:43:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 10:

> > > > > Uploaded patch set 9.
 > > > >
 > > > > Patch set 9 contains the following changes:
 > > > > - Added a full timezone db to testdata/tzdb.
 > > > > - End-to-end tests and BE-tests were changed to use this
 > > timezone
 > > > > db. This was necessary because some timezone-tests were
 > failing
 > > > on
 > > > > older jenkins workers that had an older tzdata package
 > > installed.
 > > >
 > > > It might be a good idea to store the timezone-db files in one
 > > .tar
 > > > file and extract them before running the tests. What do you
 > > think?
 > >
 > > I agree, .taring or compressing the tz db would be much better,
 > if
 > > it does not make the code too complicated. Having fewer files would
 > > make the review more readable, and would also make the tz db
 > > consume much less space on hdfs, as the many small files will be
 > > rounded up to hdfs block size.
 > 
 > Extracting files from a .tar file can be tricky. Probably we would
 > have to add libtar library to the native-toolchain to handle .tar
 > files.
 > 
 > Alternatively we can store timezone files in a JAR archive instead.
 > The BE can call into the java FE to extract files from it.

Tim, Dan, what do you think?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 30 May 2018 15:34:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/alias.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,087 insertions(+), 1,167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/20

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 20
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 7:

(2 comments)

I focused mostly on the non-timestamp/timezone/time and test/infra parts. It looks fine to me. Would be good to get Gabor to finish his review and +1, and Tim can do the final +2.

Just a heads up regarding exceptions: in the past we've had a lot of issues with timestamp boost routines throwing exceptions for out of range values. You should make sure you exercise any path that can do that with tests. We generally either have to reason about why the boost function can't throw an exception (maybe we check the range beforehand) or we wrap the boost call with try/catch so we don't expose the exception. Also IIRC, something to keep in mind is that codegen code can't properly handle try/catch, so in cases where we needed to use that, we factored the try/catch code into native code and called out to it from the IR. Again, just a heads up, not sure if your change introduced any problem in this regard or not.

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timestamp-functions-ir.cc@502
PS7, Line 502: 
             : namespace {
             : inline cctz::time_point<cctz::sys_seconds> UnixTimeToTimePoint(time_t t) {
             :   return std::chrono::time_point_cast<cctz::sys_seconds>(
             :       std::chrono::system_clock::from_time_t(0)) + cctz::sys_seconds(t);
             : }
             : 
             : }
             : 
             : StringVal TimestampFunctions::TimeOfDay(FunctionContext* context) {
             :   const TimestampVal curr = Now(context);
             :   if (curr.is_null) return StringVal::null();
             :   const string& day = ShortDayName(context, curr);
             :   const string& month = ShortMonthName(context, curr);
             :   IntVal dayofmonth = DayOfMonth(context, curr);
             :   IntVal hour = Hour(context, curr);
             :   IntVal min = Minute(context, curr);
             :   IntVal sec = Second(context, curr);
             :   IntVal year = Year(context, curr);
             : 
             :   // Calculate 'start' time point at which query execution started.
             :   cctz::time_point<cctz::sys_seconds> start = UnixTimeToTimePoint(
             :       context->impl()->state()->query_ctx().start_unix_millis / MILLIS_PER_SEC);
             :   // Find 'tz_name' time-zone abbreviation that corresponds to 'local_time_zone' at
             :   // 'start' time point.
             :   cctz::time_zone::absolute_lookup start_lookup =
             :       context->impl()->state()->local_time_zone()->lookup(start);
             :   const string& tz_name = (start_lookup.abbr != nullptr) ? start_lookup.abbr :
             :       context->impl()->state()->local_time_zone()->name();
Any chance that this can throw an exception? I believe our IR code can't properly handle try/catch, so if this can indeed throw an exception and needs to be wrapped in try/catch, it may need to be refactored so that this code lives in the native code and we call out to it from the IR. (Just a heads up, this may not be a problem here).
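
The refactoring described here could be sketched as follows: the call that may throw is wrapped in an out-of-line function that reports failure through its return value, so the calling code never sees an exception. Both function names are hypothetical, LookupAbbreviation only stands in for a library call, and the noinline attribute assumes GCC or Clang.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical function that may throw, standing in for a library call whose
// exception behaviour is uncertain.
inline std::string LookupAbbreviation(int offset_hours) {
  if (offset_hours < -12 || offset_hours > 14) {
    throw std::out_of_range("offset out of range");
  }
  if (offset_hours == 0) return "UTC";
  return (offset_hours > 0 ? "UTC+" : "UTC") + std::to_string(offset_hours);
}

// Wrapper intended to be compiled as native code and kept out of line, so the
// caller only sees a plain function call and a bool result.
__attribute__((noinline)) bool TryLookupAbbreviation(
    int offset_hours, std::string* abbr) noexcept {
  try {
    *abbr = LookupAbbreviation(offset_hours);
    return true;
  } catch (...) {
    return false;
  }
}
```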


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.inline.h@59
PS7, Line 59: .days()
In the past we've had issues where the boost date library can throw exceptions. I don't remember the details offhand and it may be that you are okay here given you've already checked HasDateAndTime() and if we ensure date_ is within range, but just wanted to mention it.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 May 2018 17:34:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 7:

(15 comments)

This is going to be a big improvement. Did a pass, mainly had comments about clarifying internal interfaces.

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/data-source-scan-node.h
File be/src/exec/data-source-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/data-source-scan-node.h@101
PS7, Line 101:   /// local time-zone for materializing 'TYPE_TIMESTAMP' slots.
Can local_tz be NULL? Maybe make it const& if not.


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/parquet-column-readers.cc@603
PS7, Line 603:   if (dst_ts->HasDateAndTime()) dst_ts->UtcToLocal(parent_->state_->local_time_zone());
Would it make sense to cache the timezone locally in the ScalarColumnReader? That would save at least 2 pointer indirections per value, which could be meaningful in this part of the code.
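
A rough sketch of the caching idea, using hypothetical stand-ins for RuntimeState, the scanner and the column reader: the time-zone pointer is resolved once in the constructor, so the per-value path no longer walks parent_->state_->local_time_zone().

```cpp
// Hypothetical stand-ins; the real Impala classes are far richer.
struct FakeTimezone {
  int utc_offset_seconds;
};

struct FakeRuntimeState {
  const FakeTimezone* local_time_zone() const { return tz; }
  const FakeTimezone* tz;
};

struct FakeScanner {
  FakeRuntimeState* state_;
};

class FakeColumnReader {
 public:
  // Resolves the time zone once, when the reader is created.
  explicit FakeColumnReader(const FakeScanner* parent)
    : local_tz_(parent->state_->local_time_zone()) {}

  // Per-value path: uses the cached pointer only.
  long ConvertUtcToLocal(long utc_seconds) const {
    return utc_seconds + local_tz_->utc_offset_seconds;
  }

 private:
  const FakeTimezone* local_tz_;
};
```

This assumes the query's local time zone does not change during the lifetime of the reader.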


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/decimal-operators.h
File be/src/exprs/decimal-operators.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/decimal-operators.h@172
PS7, Line 172:       const T& decimal_value, int scale, bool round, const Timezone* local_tz);
Is it nullable?


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/expr-test.cc@161
PS7, Line 161:   const Timezone *new_tz_;
nit: Timezone* new_tz_


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timezone_db.cc@48
PS7, Line 48:     "HDFS/S3A/ADLS path to load IANA time-zone database from.");
nn]


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.h@317
PS7, Line 317:   /// Query-global timezone used as local timezone when executing the query.
Can this be NULL? Would be good to document. We should maybe return a const& above if it can't be NULL so that it's self-documenting.


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.cc@131
PS7, Line 131: LIKELY
LIKELY won't make a measurable difference outside of perf-critical code, and I find it adds noise in cases like this. The codebase isn't very consistent about it but I'm trying to stop the pattern from spreading :)


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.cc@134
PS7, Line 134:     LOG(ERROR) << "Failed to find local timezone " << query_ctx().local_time_zone
I think this should be a WARNING-level log. We should reserve ERROR for really severe errors, whereas this might flood logs.

I think we should also add a warning to the query warnings so that it's surfaced to the user, not just the admin.


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.h@98
PS7, Line 98:   static TimestampValue FromUnixTime(time_t unix_time, const Timezone* local_tz) {
Here and below it isn't clear if local_tz is allowed to be NULL. If it can be NULL, can we extend comments to explain what happens in that case? If it can't, we could make it self-documenting by making it a const& instead of a const*.
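
To illustrate the difference between the two signatures, here is a small sketch. Timezone is a hypothetical stand-in type, and both function names are invented for the example:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in type; not Impala's actual Timezone.
struct Timezone { std::string name; };

// Pointer version: the signature alone does not say whether nullptr is legal,
// so the contract has to be spelled out in a comment and handled in the body.
inline std::string DescribeByPointer(const Timezone* local_tz) {
  if (local_tz == nullptr) return "<no timezone>";
  return local_tz->name;
}

// Reference version: a well-formed caller cannot pass "no timezone",
// so neither a null check nor an explanatory comment is needed.
inline std::string DescribeByReference(const Timezone& local_tz) {
  return local_tz.name;
}
```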


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.cc@100
PS7, Line 100: CheckIfDateOutOfRange
Maybe IsDateOutOfRange(). With "Check" it isn't clear whether returning true means that it's out of range or if it means that the timestamp passed the check.
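
A sketch of how the suggested name reads at a call site. The signature is simplified to take a plain year (an assumption for this example; the function under review takes a cctz civil day), and the 1400..9999 bounds are the boost::gregorian::date limits mentioned elsewhere in this review:

```cpp
#include <cassert>

// Returns true if 'year' cannot be represented by boost::gregorian::date,
// which only accepts years in the 1400..9999 range.
// Simplified, hypothetical signature for illustration.
inline bool IsDateOutOfRange(int year) {
  return year < 1400 || year > 9999;
}
```

With the "Is" prefix, `if (IsDateOutOfRange(y)) return not_a_date_time;` reads unambiguously: true means the date is out of range.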


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/util/time.h
File be/src/util/time.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/util/time.h@122
PS7, Line 122: Timezone
Maybe const& if it's not allowed to be NULL.


http://gerrit.cloudera.org:8080/#/c/9986/7/cmake_modules/FindCctz.cmake
File cmake_modules/FindCctz.cmake:

http://gerrit.cloudera.org:8080/#/c/9986/7/cmake_modules/FindCctz.cmake@27
PS7, Line 27:   $ENV{IMPALA_HOME}/thirdparty/cctz-$ENV{IMPALA_CCTZ_VERSION}/src)
We can get rid of the thirdparty/ stuff. That's just left over from when Impala stored vendored versions of these dependencies in thirdparty/.


http://gerrit.cloudera.org:8080/#/c/9986/7/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/9986/7/common/thrift/ImpalaInternalService.thrift@400
PS7, Line 400:   // String containing name of the local time zone.
Maybe mention whether it's been validated or not. E.g. can it be an arbitrary string, or is it guaranteed to be a valid timezone on the coordinator (but not necessarily on the executor, since in theory the executor could have a different timezone db).


http://gerrit.cloudera.org:8080/#/c/9986/7/testdata/tzdb/abbrev.conf
File testdata/tzdb/abbrev.conf:

PS7: 
For all new files, you either need to add an Apache license header (preferable) or add it to bin/rat_exclude_files.txt (if the file format doesn't allow comments).

The precommit tests will check this, or you can run bin/check-rat-report.py (see the comment in that file).


http://gerrit.cloudera.org:8080/#/c/9986/7/tests/custom_cluster/test_custom_tzdb.py
File tests/custom_cluster/test_custom_tzdb.py:

http://gerrit.cloudera.org:8080/#/c/9986/7/tests/custom_cluster/test_custom_tzdb.py@25
PS7, Line 25: class TestCustomTimzoneDatabase(CustomClusterTestSuite):
nit:Timezone



-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 May 2018 16:20:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 17:

> Uploaded patch set 17: Commit message was updated.

Added support for configurable timezone aliases (instead of just abbreviations).


-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 17
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Jun 2018 17:37:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 1:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@34
PS1, Line 34: statup
typo: startup


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@134
PS1, Line 134: val
I don't know what RAND_MAX is here, but I think that it can be 32K, which would mean that the time part would move very slowly from the starting time. Is this by design?

In order to have "really random" time, I would use the C++11 random classes, or put together the time from several rand() calls, each with a modulo smaller than 32k.
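
A minimal sketch of the C++11 approach, assuming the benchmark only needs uniformly distributed Unix timestamps. RandomUnixTime is a hypothetical helper name, not something from the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns a uniformly distributed value in [min_sec, max_sec].
// std::mt19937_64 yields 64-bit outputs, so the result is not limited by
// RAND_MAX the way a single rand() call is.
inline int64_t RandomUnixTime(
    std::mt19937_64& gen, int64_t min_sec, int64_t max_sec) {
  std::uniform_int_distribution<int64_t> dist(min_sec, max_sec);
  return dist(gen);
}
```

Seeding the generator with a fixed value keeps benchmark inputs reproducible between runs.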


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@166
PS1, Line 166:  d
I am a bit concerned about writing to the same buffer from every thread - maybe it does not hurt performance, but it is not how these functions are "normally" used.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@709
PS1, Line 709:     m1 = measure_multithreaded_elapsed_time(glibc_test_utc_to_unix, num_of_threads,BATCH_SIZE,
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@711
PS1, Line 711:     m2 = measure_multithreaded_elapsed_time(cctz_test_utc_to_unix, num_of_threads, BATCH_SIZE,
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@725
PS1, Line 725:     m1 = measure_multithreaded_elapsed_time(boost_test_from_utc, num_of_threads, BATCH_SIZE,
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@727
PS1, Line 727:     m2 = measure_multithreaded_elapsed_time(cctz_test_from_utc, num_of_threads, BATCH_SIZE,
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@743
PS1, Line 743:     m2 = measure_multithreaded_elapsed_time(cctz_test_utc_to_local, num_of_threads, BATCH_SIZE,
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions-ir.cc@523
PS1, Line 523: /
nit: missing spaces


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@63
PS1, Line 63:   if (timezone == nullptr) {
This could be UNLIKELY.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@90
PS1, Line 90: t.fractional_seconds()
You have explained in person that 't' is used instead of 'to_cs' for sub-seconds, because cctz's time type does not support nanoseconds, and no timezone rule affects sub-seconds. Could you add a comment about this?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@105
PS1, Line 105:   if (timezone == nullptr) {
This could be UNLIKELY.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@142
PS1, Line 142: t.fractional_seconds()
Same as in line 90.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@28
PS1, Line 28: /// Functions to load and access the time-zone database.
Please add some comments about thread-safety (e.g. "Initialize() should be called at startup to load every timezone rule into memory. After it has returned without error, other functions can be safely called from multiple threads.").


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@160
PS1, Line 160: bool IsSymbolicLink(const string& path, string* real_path) {
Maybe this could be moved to class FileSystemUtil.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@412
PS1, Line 412:   char buffer[64*1024];
I am a bit concerned about this - is it ok to keep buffers of this size on stack in Impala?
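
If the stack usage turns out to be a concern, one option is to keep the same 64 KiB scratch buffer on the heap. The sketch below is an illustration only: ReadAll is a hypothetical helper and it reads from a std::istream rather than from HDFS:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads all of 'is' through a 64 KiB scratch buffer that lives on the heap.
inline std::string ReadAll(std::istream& is) {
  constexpr std::size_t kBufferSize = 64 * 1024;
  std::vector<char> buffer(kBufferSize);  // heap allocation, freed on return
  std::string out;
  while (is) {
    is.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    // gcount() is the number of bytes the last read() actually delivered.
    out.append(buffer.data(), static_cast<std::size_t>(is.gcount()));
  }
  return out;
}
```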


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@438
PS1, Line 438: // Load custom time-zone abbreviations from 'is' and add them to 'tz_name_map_'.
In most of Impala, the comments for private functions are in the .h file and start with "///". Can you move them to the header for the sake of consistency?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@439
PS1, Line 439: void TimezoneDatabase::LoadZoneAbbreviations(istream &is,
Are you sure that a corrupt abbreviation file is not an error? Maybe duplicates can be tolerated, but other issues are errors in my opinion.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@440
PS1, Line 440: const char *path /* = nullptr */
I did not find any caller that fills this argument. Please check if it can be removed.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@460
PS1, Line 460: Skippng
typo: Skippng is used consistently instead of Skipping - is this intentional, or typo + copy paste?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@484
PS1, Line 484:       if (tz_name_map_.find(abbrev) != tz_name_map_.end()) {
This could be checked before processing value and merged with the abbreviation duplicate checking in the non fixed offset branch.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.cc@172
PS1, Line 172:   auto from_tp = FromUnixSeconds(unix_time);
             :   auto to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone());
             :   // boost::gregorian::date() throws boost::gregorian::bad_year if year is not in the
             :   // 1400..9999 range. Need to check validity before creating the date object.
             :   if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) {
             :     return ptime(not_a_date_time);
             :   } else {
             :     return ptime(
             :         boost::gregorian::date(to_cs.year(), to_cs.month(), to_cs.day()),
             :         boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second()));
This could be replaced by calling TimestampValue::UnixTimeToLocalPtime(unix_time, &TimezoneDatabase::GetUtcTimezone()).


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.inline.h@55
PS1, Line 55: inline bool TimestampValue::UtcToUnixTime(time_t* unix_time) const {
Two opposing ideas:
a: This and ToUnixTime could call a new function that would contain the common cctz parts.
b: I think that this could be done much faster by calculating the days since 1970 (date_ minus a constant), multiplying it by 24*60*60 and adding time_/10^9. Something similar happens in the fast path of TimestampValue::UnixTimeToUtcPtime().

boost::gregorian::date stores the days since a specific date, which is practical mainly because it can be converted to seconds without dealing with "calendar stuff" like leap years, while date_.year()/month()/day() has to do some more complex calculations.
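
Option b can be sketched as follows. This assumes the date is available as a Julian day number (where 1970-01-01 is day 2440588) and the time of day as nanoseconds since midnight; UtcToUnixSeconds and the constants are hypothetical names for this example:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: dates are Julian day numbers, so 1970-01-01 is day 2440588.
constexpr int64_t kEpochDayNumber = 2440588;
constexpr int64_t kSecondsPerDay = 24 * 60 * 60;
constexpr int64_t kNanosPerSecond = 1000000000;

// Converts a UTC (day number, nanoseconds since midnight) pair to Unix
// seconds with plain integer arithmetic: no year/month/day decomposition
// and no timezone lookup.
inline int64_t UtcToUnixSeconds(int64_t day_number, int64_t nanos_of_day) {
  return (day_number - kEpochDayNumber) * kSecondsPerDay
      + nanos_of_day / kNanosPerSecond;
}
```

Because nanos_of_day is never negative, the division only drops the sub-second part and does not affect dates before 1970.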


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc
File be/src/util/time-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc@60
PS1, Line 60:   EXPECT_EQ("1677-09-21 00:12:43.146",
            :       ToUtcStringFromUnixMillis(INT64_MIN / NANOS_PER_MICRO / MICROS_PER_MILLI));
            :   EXPECT_EQ("1677-09-21 00:12:43.145225",
            :       ToUtcStringFromUnixMicros(INT64_MIN / NANOS_PER_MICRO));
Why have these times changed? Are these results "more correct" than the old ones?


http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@267
PS1, Line 267:     queryCtx.setLocal_time_zone("PST8PDT");
Can you explain what was changed here? Did some tests run differently depending on the local time zone?



-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Apr 2018 16:47:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 4:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/9986/4/CMakeLists.txt
File CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/9986/4/CMakeLists.txt@281
PS4, Line 281: Cctz
> nit: CCTZ
Done


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@139
PS4, Line 139: new_time_zone_(time_zone), new_tz_
> From reading the names of these 2 variables it's not clear what de differen
Done


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@140
PS4, Line 140: /*overwrite*/
> Do you need this comment?
Removed it


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@153
PS4, Line 153: Timezone *
> nit: Timezone* Expect..()
Done


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@155
PS4, Line 155: new_tz_ = TimezoneDatabase::FindTimezone(new_time_zone_);
> I wonder if it makes sense to do this assignment in the constructor and the
The main reason I implemented the class this way was that some tests that use 'ScopedTimeZoneOverride' don't need the actual Timezone pointer.

On the other hand, always checking that the timezone name is valid doesn't hurt anyone. Done.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@503
PS4, Line 503: namespace
> Why did you need this namespace?
Putting stuff into an unnamed namespace is a common C++ idiom: it says that everything in the unnamed namespace is "local" to this translation unit. Those names are not visible from the outside, and they won't clash with names in other translation units.

Impala uses unnamed namespaces elsewhere too.
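
For readers unfamiliar with the idiom, a minimal sketch (both function names are invented for the example):

```cpp
#include <cassert>

namespace {

// Internal linkage: this helper is visible only inside this translation
// unit, so another .cc file may define its own ClampToByte() without a
// link-time clash.
int ClampToByte(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

}  // namespace

// External linkage: callable from other translation units.
int ScaleAndClamp(int v, int factor) { return ClampToByte(v * factor); }
```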


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@504
PS4, Line 504: / TODO
> What is the plan to get rid of the "Duplicate code" TODOs in this review?
I removed the TODO comment.

I cannot think of a better place for this function. We can create a shared class for this function only but that would be awkward.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions.cc@55
PS4, Line 55:     // This should raise some sort of error or at least return null. Hive just ignores it.
> Shouldn't this be a TODO?
Fixed the comment.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions.cc@87
PS4, Line 87:     // This should raise some sort of error or at least return null. Hive just ignores it.
> Same as above
Fixed the comment.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc
File be/src/exprs/timezone_db-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc@57
PS4, Line 57: TzAbbev
> nit: TzAbbrev?
Done


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc@68
PS4, Line 68:   // Abbreviations must start with an uppercase letter.
> If it has to start with an uppercase letter, can we add a test this with an
Done


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc@105
PS4, Line 105:   // Misformatted time-zone names.
> Can you again play around with upper vs lower case letters here?
Done


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h@1
PS4, Line 1: // with the License.  You may obtain a copy of the License at
> Hmm, is the top of the Apache comment missing?
Not sure what happened there. I probably inadvertently executed some mysterious vim command :)


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h@91
PS4, Line 91: tz_seg
> nit: might be just my preference but tz_segment is still short and I think 
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147
PS1, Line 147:     
> What I meant is that you could get rid of the offset_sec paramater if you c
I understand.

The problem is that then we would have to allocate an int64_t in the function and return a pointer to it wrapped in unique_ptr<int64_t> to avoid memory leaks. Seems more pain than gain.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc@365
PS4, Line 365:   hdfsFile hdfs_file = hdfsOpenFile(
> Just for my information: LoadZoneInfoFromHdfs() copies the zone info file t
LoadZoneInfoFromHdfs() uses cctz::load_time_zone() to load time-zone files into memory. cctz::load_time_zone() works only with local filesystem paths, it cannot grab files from HDFS directly.

LoadZoneAbbreviationsfromHdfs() is a lot simpler. It can read the config file directly from HDFS, creating a temp file on the local filesystem is not necessary.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc@391
PS4, Line 391:             ErrorMsg(TErrorCode::GENERAL,
> nit: I'm not exactly sure about the rules, but I feel that this line could 
Done


http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/runtime-state.cc@136
PS5, Line 136:     local_time_zone_ = &TimezoneDatabase::GetUtcTimezone();
> This has already been set to GetUtcTimezone() in the constructor, right?
True, but I wanted to be more explicit here.

I was also thinking about setting 'local_time_zone_' to 'nullptr' in the constructor and then setting it properly in Init(), but I wanted to make clear to the reader that 'local_time_zone_' is always set to a valid Timezone address.

TimezoneDatabase::GetUtcTimezone() call is inexpensive, it just returns a static const reference.
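
The "always a valid Timezone address" invariant can be sketched like this. FakeTimezoneDb, its two-entry table, and ResolveLocalTimezone are hypothetical stand-ins for illustration, not the TimezoneDatabase from the patch:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins; not Impala's actual classes.
struct Timezone { std::string name; };

class FakeTimezoneDb {
 public:
  // Cheap: returns a reference to a function-local static.
  static const Timezone& GetUtcTimezone() {
    static const Timezone utc{"UTC"};
    return utc;
  }

  // Returns nullptr if 'name' is not a known timezone.
  static const Timezone* FindTimezone(const std::string& name) {
    static const std::map<std::string, Timezone> zones = {
        {"UTC", Timezone{"UTC"}},
        {"Europe/Budapest", Timezone{"Europe/Budapest"}}};
    auto it = zones.find(name);
    return it == zones.end() ? nullptr : &it->second;
  }
};

// Never returns nullptr: unknown names fall back to UTC.
inline const Timezone* ResolveLocalTimezone(const std::string& name) {
  const Timezone* tz = FakeTimezoneDb::FindTimezone(name);
  return tz != nullptr ? tz : &FakeTimezoneDb::GetUtcTimezone();
}
```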



-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 May 2018 12:40:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 10:

(5 comments)

> > > > > > Uploaded patch set 9.
> > > > >
> > > > > Patch set 9 contains the following changes:
> > > > > - Added a full timezone db to testdata/tzdb.
> > > > > - End-to-end tests and BE-tests were changed to use this timezone
> > > > >   db. This was necessary because some timezone-tests were failing
> > > > >   on older jenkins workers that had an older tzdata package
> > > > >   installed.
> > > >
> > > > It might be a good idea to store the timezone-db files in one .tar
> > > > file and extract them before running the tests. What do you think?
> > >
> > > I agree, .taring or compressing the tz db would be much better, if
> > > it does not make the code too complicated. Having less file would
> > > make the review more readable,

From a reviewability perspective, there's absolutely no need for this commit to have 221 timezone files. You can test it with ~3, or do a separate commit. I.e., it's neither here nor there.

> > Alternatively we can store timezone files in a JAR archive instead.
> > The BE can call into the java FE to extract files from it.
>
> Tim, Dan, what do you think?

The Yarn equivalent here has this notion of a "distributed cache" which is to say it stores the files locally and re-uses them across jobs. I can't tell if we should be worried that all impalads, at boot time, will slam HDFS with reading the timezonedb. I think reading ~200 files per impalad times 200 impalad daemons may be a lot of HDFS metadata load, but maybe it's comparable to what we do for queries anyway. I certainly think that tar or jar is a better way to go. Since this is happening at boot, we can probably still fork to tar, which we can assume is available, or use Java, which has tar libraries and native support for zip.

http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG@35
PS10, Line 35:   specify an HDFS/S3/ADLS location that contains the shared compiled
In what format? Does it add to the host one or override it?

My /usr/share/zoneinfo is full of symlinks which HDFS doesn't support in some configurations (and S3 certainly doesn't).


http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG@41
PS10, Line 41: - The name of the coordinator node’s local time-zone is saved to the
Is it easy to tell what the local time zone is of an impalad node? (E.g., do we log it?)


http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG@45
PS10, Line 45: - Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
What's the distinction between this and --hdfs_zone_info_dir? Do we need both?


http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan
File testdata/tzdb/2017c/Africa/Abidjan:

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan@1
PS10, Line 1: ../Atlantic/St_Helena
We're adding a ton of files. Do we need such a big database for our testing purposes?


http://gerrit.cloudera.org:8080/#/c/9986/10/tests/custom_cluster/test_shared_tzdb.py
File tests/custom_cluster/test_shared_tzdb.py:

http://gerrit.cloudera.org:8080/#/c/9986/10/tests/custom_cluster/test_shared_tzdb.py@38
PS10, Line 38:     cls.ImpalaTestMatrix.add_constraint(lambda v:
Add a comment about what this is trying to do?



-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 30 May 2018 15:59:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Zoltan Borok-Nagy, Csaba Ringhofer, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9986

to look at the new patch set (#4).

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/abbrev.conf
A testdata/tzdb/zoneinfo/AmerICA/ArgeNTINA/MendOZA
A testdata/tzdb/zoneinfo/AmerICA/CancUN
A testdata/tzdb/zoneinfo/UTC
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
A tests/custom_cluster/custom_tzdb.py
53 files changed, 2,531 insertions(+), 1,096 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/4
-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard-coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.
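
The second glibc issue can be illustrated with a short sketch (Python,
standard library only; this is not code from the patch). The names
render_local and render_with_query_zone, the query_ctx dictionary and its
"local_time_zone" key are hypothetical, and fixed UTC offsets stand in for
real IANA zones. It shows one instant rendering differently on two nodes,
and rendering identically once every node uses a zone carried with the
query:

```python
from datetime import datetime, timedelta, timezone


def render_local(utc_instant, node_zone):
    """Format an aware UTC instant as wall-clock time in node_zone."""
    return utc_instant.astimezone(node_zone).strftime("%Y-%m-%d %H:%M:%S")


def render_with_query_zone(utc_instant, query_ctx):
    """Ignore the node's own zone; use the one shipped with the query.

    query_ctx is a hypothetical stand-in for a per-query context object.
    """
    return render_local(utc_instant, query_ctx["local_time_zone"])


# Two nodes whose host systems are configured with different local zones.
node_a_zone = timezone(timedelta(hours=2))
node_b_zone = timezone(timedelta(hours=-7))
instant = datetime(2018, 4, 11, 12, 0, 0, tzinfo=timezone.utc)

# Each node using its own zone: the results disagree.
per_node = {render_local(instant, z) for z in (node_a_zone, node_b_zone)}

# Every node using the zone recorded by the coordinator: one result.
ctx = {"local_time_zone": node_a_zone}
shared = {render_with_query_zone(instant, ctx) for _ in range(2)}
```

Here per_node holds two different strings while shared holds exactly one,
which is the consistency property the cluster needs.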

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards on an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.
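
For readers unfamiliar with the term: Zip Slip is an archive entry whose
name (for example "../evil" or an absolute path) resolves outside the
extraction directory. A minimal sketch of the usual guard follows, written
in Python with the standard zipfile module rather than taken from the
ZipUtil class in this patch; the function name safe_extract is
hypothetical.

```python
import io
import os
import zipfile


def safe_extract(archive, dest_dir):
    """Extract all entries of an open ZipFile under dest_dir.

    Raises ValueError if any entry would land outside dest_dir.
    Returns the list of extracted file paths.
    """
    dest_root = os.path.realpath(dest_dir)
    extracted = []
    for info in archive.infolist():
        # Resolve the final location, collapsing ".." and symlinks.
        target = os.path.realpath(os.path.join(dest_root, info.filename))
        # Zip Slip check: the resolved target must stay inside dest_root.
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError("entry escapes destination: %r" % info.filename)
        if info.is_dir():
            os.makedirs(target, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with archive.open(info) as src, open(target, "wb") as dst:
            dst.write(src.read())
        extracted.append(target)
    return extracted
```

The check runs on the resolved path rather than on the raw entry name, so
it also covers absolute names and names that only escape after
normalization.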

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Reviewed-on: http://gerrit.cloudera.org:8080/9986
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Attila Jeges <at...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/common/init.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/CMakeLists.txt
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
A be/src/util/zip-util-test.cc
A be/src/util/zip-util.cc
A be/src/util/zip-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCctz.cmake
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
A common/thrift/Zip.thrift
M common/thrift/metrics.json
A fe/src/main/java/org/apache/impala/util/ZipUtil.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/2017c-corrupt.zip
A testdata/tzdb/2017c.zip
A testdata/tzdb/alias.conf
A testdata/tzdb_tiny/America/New_York
A testdata/tzdb_tiny/Etc/GMT+4
A testdata/tzdb_tiny/US/Eastern
A testdata/tzdb_tiny/UTC
A testdata/tzdb_tiny/Zulu
A testdata/tzdb_tiny/posix/UTC
A testdata/tzdb_tiny/posixrules
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
A tests/custom_cluster/test_shared_tzdb.py
D tests/query_test/test_timezones.py
72 files changed, 3,086 insertions(+), 1,176 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified
  Attila Jeges: Looks good to me, approved


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 23
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Zoltan Borok-Nagy, Csaba Ringhofer, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9986

to look at the new patch set (#2).

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard-coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zoneinfo_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards on an
  execution node.
- Introduces a new startup flag (--hdfs_zoneabbrev_config) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.
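
The in-memory map and the abbreviation config can be pictured with the
following sketch (Python, standard library only; not code from this patch).
The file format used here, "NAME = TARGET" lines with "#" comments, is an
assumption made for illustration and is not taken from the flag's
documentation. The function names parse_abbrev_lines and build_lookup are
hypothetical, and the zone values are opaque placeholders for loaded
time-zone rules.

```python
def parse_abbrev_lines(text):
    """Parse 'NAME = TARGET' lines; '#' starts a comment (assumed format)."""
    abbrevs = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        name, sep, target = line.partition("=")
        name, target = name.strip(), target.strip()
        if not sep or not name or not target:
            raise ValueError("line %d: expected 'NAME = TARGET'" % lineno)
        abbrevs[name] = target
    return abbrevs


def build_lookup(zones, abbrevs):
    """Merge zones and abbreviations into one flat dict built at startup.

    zones maps canonical names to already-loaded rule objects, so a
    lookup at query time is a single dictionary access with no file I/O.
    """
    table = dict(zones)
    for name, target in abbrevs.items():
        if target not in zones:
            raise KeyError("%s maps to unknown zone %s" % (name, target))
        # A name from the database wins over a conflicting abbreviation.
        table.setdefault(name, zones[target])
    return table
```

For example, with zones holding "UTC" and "America/New_York", a config line
"Zulu = UTC" makes table["Zulu"] return the same rules object as
table["UTC"].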

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/data/timezoneverification.csv
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
43 files changed, 1,972 insertions(+), 1,051 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>