You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2019/08/01 14:01:53 UTC

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#11).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Contrary to how these
functions work the FORMAT clause must specify a string literal and
cannot be used with any other kind of a string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,434 insertions(+), 861 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/11
-- 
To view, visit http://gerrit.cloudera.org:8080/13722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>