Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2019/06/25 14:43:40 UTC

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13722 )


Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces a FORMAT clause for the CAST() operator
that applies to casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns, the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note that the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm matches these tokens in a case-insensitive
  manner.
- The separators are interchangeable with each other. For example, a
  '-' separator in the format will match a '.' character in the
  input.
- The length of the separator sequences is handled flexibly, meaning
  that a single separator character in the format, for instance, would
  match a multi-separator sequence in the input.
- In a string type to timestamp conversion, the timezone offset tokens
  are parsed and must match the input, but they don't adjust
  the result because the input is already expected to be in UTC.
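
The separator rules above can be sketched roughly as follows. This is only an
illustration under stated assumptions: IsSeparatorChar() and ConsumeSeparators()
are hypothetical names, not functions from this change, and the sketch ignores
the special handling of a trailing '-' in front of a timezone hour.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not the patch's code: true if 'c' is one of the separator
// characters listed in the commit message.
inline bool IsSeparatorChar(char c) {
  return c != '\0' && std::strchr("-./,';: ", c) != nullptr;
}

// Consumes a run of one or more separator characters starting at *pos and stopping
// before 'end'. Any separator matches any other, and the run in the input may be
// longer than the separator sequence in the format.
// Returns false if no separator was found at *pos.
inline bool ConsumeSeparators(const char** pos, const char* end) {
  assert(pos != nullptr && *pos != nullptr && *pos <= end);
  const char* start = *pos;
  while (*pos < end && IsSeparatorChar(**pos)) ++(*pos);
  return *pos != start;
}
```

With this shape, a single separator token in the format is satisfied by calling
ConsumeSeparators() once on the input, regardless of which separator characters
the input uses or how many of them appear in a row.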

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
53 files changed, 3,097 insertions(+), 697 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4520/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 21
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Sep 2019 12:33:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@78
PS3, Line 78:         if (!ParseAndValidate(current_pos, group_len, 0, 9999, &result->year)) return false;
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@95
PS3, Line 95:         if (!ParseAndValidate(current_pos, group_len, 1, 12, &result->month)) return false;
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@124
PS3, Line 124:         if (!ParseAndValidate(current_pos, group_len, 0, 59, &result->minute)) return false;
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@128
PS3, Line 128:         if (!ParseAndValidate(current_pos, group_len, 0, 59, &result->second)) return false;
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@133
PS3, Line 133:         if (!ParseAndValidate(current_pos, group_len, 0, 86399, &second_in_day)) return false;
line too long (94 > 90)
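
One way to resolve the line-length warnings above is to break after the
condition. The sketch below is an assumption-laden illustration: the
ParseAndValidate() shown here is a hypothetical stand-in that only mirrors the
argument order of the quoted calls, and ParsedDateTime is an invented struct;
the real signature and types in the patch may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ParseAndValidate() call quoted above; the real
// signature in the patch is not shown in this thread and may differ.
// Parses 'len' decimal digits at 'str' and stores the value in *result if it lies
// within [min_val, max_val]. Returns false on a non-digit or a range error.
inline bool ParseAndValidate(const char* str, int len, int min_val, int max_val,
    int* result) {
  assert(str != nullptr && result != nullptr);
  if (len <= 0 || len > 9) return false;  // Up to 9 digits always fit in int64_t.
  int64_t value = 0;
  for (int i = 0; i < len; ++i) {
    if (str[i] < '0' || str[i] > '9') return false;
    value = value * 10 + (str[i] - '0');
  }
  if (value < min_val || value > max_val) return false;
  *result = static_cast<int>(value);
  return true;
}

// Hypothetical result struct holding only the fields used below.
struct ParsedDateTime {
  int year = 0;
  int month = 0;
};

// Breaking after the condition keeps each line within the 90 character limit.
inline bool ParseYearAndMonth(const char* year_str, const char* month_str,
    ParsedDateTime* result) {
  if (!ParseAndValidate(year_str, 4, 0, 9999, &result->year)) {
    return false;
  }
  if (!ParseAndValidate(month_str, 2, 1, 12, &result->month)) {
    return false;
  }
  return true;
}
```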


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h@98
PS3, Line 98:   /// accept_time_toks_only -- if true, time tokens w/o date tokens are accepted. Otherwise,
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h@104
PS3, Line 104:   /// Parse date/time string to find the corresponding default date/time format context. The
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h@160
PS3, Line 160:   /// Does only a basic validation on the parsed date/time values. The caller is responsible
line too long (92 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Jul 2019 09:55:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#8).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,395 insertions(+), 858 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 23:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4596/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 23
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Sep 2019 09:33:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#23).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces a FORMAT clause for the CAST() operator
that applies to casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns, the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note that the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Unlike these
functions, the FORMAT clause must specify a string literal and
cannot be used with any other kind of string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99 to +99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)
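
As an illustration of the SSSSS range listed above (0-86399 is every second of
a 24-hour day), a second-of-day value can be split into hour, minute and
second as sketched below. TimeOfDay and SplitSecondOfDay() are hypothetical
names used only for this example, not part of the change.

```cpp
#include <cassert>

// Hypothetical struct and function names, for illustration only.
struct TimeOfDay {
  int hour;    // 0-23
  int minute;  // 0-59
  int second;  // 0-59
};

// Splits a second-of-day value (the SSSSS token, 0-86399) into its components.
// Returns false if the value is outside the valid range.
inline bool SplitSecondOfDay(int second_of_day, TimeOfDay* out) {
  assert(out != nullptr);
  if (second_of_day < 0 || second_of_day > 86399) return false;
  out->hour = second_of_day / 3600;
  out->minute = (second_of_day % 3600) / 60;
  out->second = second_of_day % 60;
  return true;
}
```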

Some notes about the matching algorithm:
- The parsing algorithm matches these tokens in a case-insensitive
  manner.
- The separators are interchangeable with each other. For example, a
  '-' separator in the format will match a '.' character in the
  input.
- The length of the separator sequences is handled flexibly, meaning
  that a single separator character in the format, for instance, would
  match a multi-separator sequence in the input.
- In a string type to timestamp conversion, the timezone offset tokens
  are parsed and must match the input, but they don't adjust
  the result because the input is already expected to be in UTC.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,521 insertions(+), 972 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/23

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 23
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 2:

(49 comments)

I've reviewed the first portion of the change, will continue looking at it on Monday.

http://gerrit.cloudera.org:8080/#/c/13722/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13722/2//COMMIT_MSG@49
PS2, Line 49: In a string type to timestamp conversion the timezone offset tokens
            :   are parsed, expected to match with the input but they don't adjust
            :   the result as the input is already expected to be in UTC format.
Is this behavior consistent with how other SQL systems work?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/benchmarks/parse-timestamp-benchmark.cc
File be/src/benchmarks/parse-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/benchmarks/parse-timestamp-benchmark.cc@47
PS2, Line 47: // Benchmark for parsing timestamps.
            : // Machine Info: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
            : // ParseDate:            Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //                BoostStringDate               1.277                  1X
            : //                      BoostDate               1.229              0.962X
            : //                         Impala               16.83              13.17X
            : //
            : // ParseTimestamp:       Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //                      BoostTime              0.9074                  1X
            : //                         Impala               15.01              16.54X
            : //
            : // ParseTimestampWithFormat:Function  Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //                  BoostDateTime              0.4488                  1X
            : //                ImpalaTimeStamp               37.41              83.35X
            : //              ImpalaTZTimeStamp               37.39               83.3X
Maybe it would make sense to add the new parsing functions to these benchmarks. What do you think?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/common/init.cc
File be/src/common/init.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/common/init.cc@312
PS2, Line 312: InitParseCtx
I think this should be renamed to InitSimpleDateParseCtx() to make it clear that this is for the simple date format parsing.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-expr.h
File be/src/exprs/cast-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-expr.h@29
PS2, Line 29: CastExpr
If I understand this correctly, this class is used only for the new cast operator with format. Maybe it should be called CastFormatExpr or something similar.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@172
PS2, Line 172: DateTimeFormatContext*
const DateTimeFormatContext*


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@180
PS2, Line 180: tv.Format(*format_ctx, buf_len, buf);
Check the return value, like you do in L199-200.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@191
PS2, Line 191: DateTimeFormatContext
This should be const too.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@308
PS2, Line 308: dt_ctx
Rename to 'format_ctx' for consistency.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@308
PS2, Line 308: DateTimeFormatContext
This should be const too.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@312
PS2, Line 312: char*
const char*


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@315
PS2, Line 315: char*
const char*


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@342
PS2, Line 342: dt_ctx
Rename to 'format_ctx'


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@342
PS2, Line 342: DateTimeFormatContext*
Should be const.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@346
PS2, Line 346: char*
const char*


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@348
PS2, Line 348: char*
const char*


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/expr-test.cc@76
PS2, Line 76: 
Why the extra new line?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/expr-test.cc@3214
PS2, Line 3214: 2002
Any reason this was changed to 2002?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/expr-test.cc@3322
PS2, Line 3322: 10
Any reason for these timestamp changes in L3322-3337?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h
File be/src/runtime/date-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h@36
PS2, Line 36: Parse
Probably this should be renamed to ParseSimpleDateFormat(), right?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h@47
PS2, Line 47: Parse
Same here.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h@84
PS2, Line 84: output parameter
"set output parameter to an invalid DateValue"


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h@86
PS2, Line 86: static bool IndicateDateParseFailure(DateValue* date);
I think this would be better placed in the .cc file only. 

Maybe using a macro instead would be more straightforward?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.cc@40
PS2, Line 40: Parse
rename to ParseSimpleDateFormat()?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.cc@90
PS2, Line 90: Parse
Rename to ParseSimpleDateFormat()?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.cc@190
PS2, Line 190: bool DateParser::IndicateDateParseFailure(DateValue* date) {
             :   *date = DateValue();
             :   return false;
             : }
A macro would be easier to use.

Something like:
#define DATE_PARSE_FAILURE(date) do {\
  *(date) = DateValue();\
  return false;\
} while (false)

And then, you can do:
if (failure) DATE_PARSE_FAILURE(date);
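
A self-contained sketch of the macro approach, written with the common
do { ... } while (false) idiom so that the macro expands to a single statement.
The DateValue struct and the SetDays() caller below are simplified,
hypothetical stand-ins for illustration; they are not Impala's actual classes.

```cpp
#include <cassert>

// Minimal stand-in for Impala's DateValue, for illustration only: a
// default-constructed value is treated as invalid.
struct DateValue {
  int days = -1;
  bool IsValid() const { return days >= 0; }
};

// Wrapping the body in do { ... } while (false) makes the macro expand to a single
// statement, so it composes safely with if/else and a trailing semicolon.
#define DATE_PARSE_FAILURE(date) \
  do {                           \
    *(date) = DateValue();       \
    return false;                \
  } while (false)

// Hypothetical caller: accepts only non-negative day counts.
inline bool SetDays(int days, DateValue* date) {
  assert(date != nullptr);
  if (days < 0)
    DATE_PARSE_FAILURE(date);
  else
    date->days = days;
  return true;
}
```

Without the leading "do", the expansion inside an if/else like the one in
SetDays() would not compile, because the "else" would no longer directly follow
the statement belonging to the "if".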


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-value.h
File be/src/runtime/date-value.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-value.h@126
PS2, Line 126: Parse
Rename this and Parse() functions above to ParseSimpleDateFormat()


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.h@24
PS2, Line 24: #include <string>
Probably not necessary.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@54
PS2, Line 54: DateTimeIsoSqlFormatTokenizer::IsSeparator(current_pos) &&
            :           current_pos - input_str < input_len
current_pos should be validated first before calling IsSeparator() on it.

Also, it would simplify things if we introduced a helper variable at the beginning of the function:
const char* end_pos = input_str + input_len;

We can check (current_pos < end_pos) everywhere instead of (current_pos - input_str < input_len).
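
A rough sketch of the idea, assuming a hypothetical IsSeparator() stand-in that
dereferences its argument like the quoted call does, and a hypothetical
SkipSeparators() helper. The point is only the ordering: the bounds check
against end_pos is evaluated before the pointer is dereferenced.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for DateTimeIsoSqlFormatTokenizer::IsSeparator(); like the
// call quoted above, it dereferences its argument, so the caller must bounds-check.
inline bool IsSeparator(const char* c) {
  return *c != '\0' && std::strchr("-./,';: ", *c) != nullptr;
}

// Returns the first position at or after 'current_pos' that is not a separator.
// The bounds check comes first, so IsSeparator() never reads past the input.
inline const char* SkipSeparators(const char* current_pos, const char* input_str,
    int input_len) {
  const char* end_pos = input_str + input_len;
  assert(current_pos >= input_str && current_pos <= end_pos);
  while (current_pos < end_pos && IsSeparator(current_pos)) ++current_pos;
  return current_pos;
}
```

This matters for inputs that are not NUL-terminated: with the check reversed,
the last loop iteration would read one byte past the end of the buffer.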


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@44
PS2, Line 44:     if (tok->type == SEPARATOR) {
            :       if (!DateTimeIsoSqlFormatTokenizer::IsSeparator(current_pos)) return false;
            :       // Advance to the end of the separator sequence in the expected tokens list.
            :       while (tok->type == SEPARATOR && i < dt_ctx.toks.size()) {
            :         ++i;
            :         if (i < dt_ctx.toks.size()) tok = &dt_ctx.toks[i];
            :       }
            :       bool separator_found = false;
            :       // Find a separator token in the input and advance to the end of the separator
            :       // sequence.
            :       while (DateTimeIsoSqlFormatTokenizer::IsSeparator(current_pos) &&
            :           current_pos - input_str < input_len) {
            :         // The last '-' of a separator sequence might be taken as a sign for timezone
            :         // hour.
            :         if (separator_found && tok->type == TIMEZONE_HOUR && *current_pos == '-' &&
            :             current_pos + 1 - input_str < input_len &&
            :             !DateTimeIsoSqlFormatTokenizer::IsSeparator(current_pos + 1)) {
            :           break;
            :         }
            :         separator_found = true;
            :         ++current_pos;
            :       }
            :       if (!separator_found) return false;
            :     }
            :     // If either the input or the format got to its end.
            :     if (i == dt_ctx.toks.size() || current_pos - input_str == input_len) {
            :       if (i == dt_ctx.toks.size() && current_pos - input_str == input_len) return true;
            :       // If either the input or the format has unprocessed tokens then indicate failure.
            :       return false;
            :     }
I think this could be refactored a bit:

if (tok->type == SEPARATOR) {
  if (!DateTimeIsoSqlFormatTokenizer::IsSeparator(current_pos)) return false;

  // Advance to the end of the separator sequence.
  ++current_pos;
  while (current_pos - input_str < input_len
      && DateTimeIsoSqlFormatTokenizer::IsSeparator(current_pos)) {
    ++current_pos;
  }

  // Advance to the end of the separator sequence in the expected tokens list.
  ++i;
  while (i < dt_ctx.toks.size() && dt_ctx.toks[i].type == SEPARATOR) ++i;

  // If we reached the end of input or the end of token sequence, we can return.
  if (current_pos - input_str >= input_len || i >= dt_ctx.toks.size()) {
    return (current_pos - input_str >= input_len && i >= dt_ctx.toks.size());
  }

  // Next token, following the separator sequence.
  tok = &dt_ctx.toks[i];

  // The last '-' of a separator sequence might be taken as a sign for timezone hour.
  if (*(current_pos - 1) == '-' && tok->type == TIMEZONE_HOUR) {
    --current_pos;
  }
}


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@76
PS2, Line 76: GetNextTokenGroupFromInput(current_pos, (int)(input_len - (current_pos - input_str)),
            :         *tok, &group_end_pos);
            :     if (current_pos == group_end_pos) return false;
I think it would be better if GetNextTokenGroupFromInput() returned a bool that signals success/failure.

Alternatively it could return 'group_end_pos' and then we wouldn't need the last out parameter.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@87
PS2, Line 87: if (UNLIKELY(result->year < 1400)) return false;
I'm not sure if this is correct.

This function is used for parsing dates too, not just for timestamps.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@96
PS2, Line 96: if (UNLIKELY(result->year < 1400)) return false;
Same as L87


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@111
PS2, Line 111: day_count < 0
day_count < 1 ?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@141
PS2, Line 141:         StringParser::ParseResult status;
             :         int second_in_day =
             :             StringParser::StringToInt<int>(current_pos, group_len, &status);
             :         if (UNLIKELY(StringParser::PARSE_SUCCESS != status)) return false;
             :         if (UNLIKELY(second_in_day < 0 || second_in_day > 86399)) return false;
             :         result->second = second_in_day % 60;
             :         int minutes_in_day = second_in_day / 60;
             :         result->minute = minutes_in_day % 60;
             :         result->hour = minutes_in_day / 60;
Block can be moved to a separate function, here and elsewhere in the switch statement.
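
As an illustration, the arithmetic part of the quoted block could live in a helper along these lines. This is only a sketch: the TimeOfDay struct and the SecondOfDayToTime() name are hypothetical, and the StringParser call from the original is assumed to have already produced 'second_in_day'.

```cpp
#include <cassert>

// Hypothetical result type; the patch writes into its own result struct.
struct TimeOfDay {
  int hour;
  int minute;
  int second;
};

// Splits a second-of-day value (SSSSS token) into hour, minute and second.
// Returns false if the value is outside [0, 86399], mirroring the quoted check.
bool SecondOfDayToTime(int second_in_day, TimeOfDay* result) {
  assert(result != nullptr);
  if (second_in_day < 0 || second_in_day > 86399) return false;
  result->second = second_in_day % 60;
  int minutes_in_day = second_in_day / 60;
  result->minute = minutes_in_day % 60;
  result->hour = minutes_in_day / 60;
  return true;
}
```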


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@184
PS2, Line 184: result->year >= 1400
(result->year >= 0) ?
This function is used for parsing dates too, not just timestamps.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@185
PS2, Line 185: month
Introducing a new variable 'month' is not really necessary.
(i + 1) can be used in the loop instead of 'month'.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@184
PS2, Line 184:     DCHECK(result->year >= 1400 && result->year <= 9999);
             :     int month = 1;
             :     for (int i = 0; i < 12; ++i) {
             :       int month_length = MONTH_LENGTHS[i];
             :       if (i == 1 && boost::gregorian::gregorian_calendar::is_leap_year(result->year)) {
             :         month_length = 29;
             :       }
             :       if (month_length >= day_in_year) {
             :         result->month = month;
             :         result->day = day_in_year;
             :         break;
             :       }
             :       ++month;
             :       day_in_year -= month_length;
             :     }
             :     DCHECK(result->month <= 12);
             :     DCHECK(result->day <= 31);
Move this to a separate function.
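
A possible shape for such a function, as a sketch only. DayOfYearToMonthDay() and IsLeapYear() are hypothetical names; the local leap-year helper replaces the boost::gregorian call purely to keep the example self-contained. It also uses (i + 1) instead of a separate 'month' variable.

```cpp
#include <cassert>

// Local Gregorian leap-year rule, used here instead of
// boost::gregorian::gregorian_calendar::is_leap_year() (an assumption made
// only so the sketch compiles without Boost).
bool IsLeapYear(int year) {
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Converts a 1-based day of year (DDD token) into a month and a day of month.
// Returns false if 'day_in_year' does not exist in the given year.
bool DayOfYearToMonthDay(int year, int day_in_year, int* month, int* day) {
  static const int MONTH_LENGTHS[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  assert(month != nullptr && day != nullptr);
  if (day_in_year < 1) return false;
  for (int i = 0; i < 12; ++i) {
    int month_length = MONTH_LENGTHS[i];
    if (i == 1 && IsLeapYear(year)) month_length = 29;
    if (day_in_year <= month_length) {
      *month = i + 1;
      *day = day_in_year;
      return true;
    }
    day_in_year -= month_length;
  }
  // More days were requested than the year has.
  return false;
}
```

Returning a bool also covers the out-of-range case explicitly instead of relying on the trailing DCHECKs.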


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@206
PS2, Line 206: void DateTimeIsoSqlFormatParser::GetNextTokenGroupFromInput(const char* input_str,
             :     int input_len, const DateTimeFormatToken& tok, const char** end_pos)
Return bool to signal success/failure. Or return end_pos instead of passing it back to the caller as an out parameter.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@208
PS2, Line 208: DCHECK(*end_pos == input_str);
We could just set *end_pos in the GetNextTokenGroupFromInput() function, instead of expecting the caller to set it.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@219
PS2, Line 219: if (tok.type == TIMEZONE_HOUR && *input_str != '-' && *input_str != '+') input_len = 2;
I think, this should be:

if (input_len > 2 && ..) input_len = 2;


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@225
PS2, Line 225: if (DateTimeIsoSqlFormatTokenizer::IsSeparator(*end_pos)) return;
Isn't it a parsing error if (len < tok.len) but *end_pos is a separator?

It would mean that the length of the token in the pattern is longer than the length of the token in the input.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@229
PS2, Line 229: void DateTimeIsoSqlFormatParser::ParseMeridiemIndicatorFromInput(const char* input_str,
             :     int input_len, const char** end_pos)
Same as L206. Consider returning a bool or end_pos.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@231
PS2, Line 231: DCHECK(*end_pos == input_str);
ParseMeridiemIndicatorFromInput() function should set end_pos instead of expecting the caller to set it.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@241
PS2, Line 241:   if (input_len < 4) return;
             :   token_str = string(input_str, 4);
             :   boost::to_upper(token_str);
             :   if (DateTimeIsoSqlFormatTokenizer::GetTokenType(token_str, &token_type) &&
             :       token_type == MERIDIEM_INDICATOR) {
             :     *end_pos += 4;
             :     return;
             :   }
Duplicate code.

You could put this into a loop instead:
for (int expected_tok_len: {2, 4}) {
  ..
}
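
Spelled out, the loop could look roughly like this. It is a sketch under assumptions: IsMeridiemIndicator() is a hypothetical replacement for the GetTokenType() lookup, std::toupper stands in for boost::to_upper, and the function returns the matched length instead of advancing '*end_pos'.

```cpp
#include <cassert>
#include <cctype>
#include <initializer_list>
#include <string>

// Hypothetical replacement for the tokenizer lookup: true if the upper-cased
// string is one of the meridiem indicators listed in the commit message.
bool IsMeridiemIndicator(const std::string& s) {
  return s == "AM" || s == "PM" || s == "A.M." || s == "P.M.";
}

// Returns the length of the meridiem indicator at the start of the input,
// or 0 if there is none. Tries the 2-character form first, then the
// 4-character form, instead of duplicating the code for each length.
int MatchMeridiemIndicator(const char* input_str, int input_len) {
  assert(input_str != nullptr);
  for (int expected_tok_len : {2, 4}) {
    if (input_len < expected_tok_len) break;
    std::string token_str(input_str, expected_tok_len);
    for (char& c : token_str) {
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    if (IsMeridiemIndicator(token_str)) return expected_tok_len;
  }
  return 0;
}
```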


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@253
PS2, Line 253: actual_token_len < 4
actual_token_len > 0 && actual_token_len < 4


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@261
PS2, Line 261: GetRoundYear
Isn't this called realigned year?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@118
PS2, Line 118: Construct
Maybe "Parse" instead of "Construct" would be better in function names here and below.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.h@90
PS2, Line 90: InitParseCtx
Rename to InitSimpleDateParseCtx().


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.h@93
PS2, Line 93: IsParseCtxInitialized
Rename to IsSimpleDateParseCtxInitialized()



-- 
To view, visit http://gerrit.cloudera.org:8080/13722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jun 2019 15:59:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#3).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
53 files changed, 3,306 insertions(+), 817 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc@181
PS17, Line 181:     string buf;
> Is there a reasonable compile time upper limit to buf_len? If yes, please a
Something doesn't add up for me here. Let's say I give a format that contains a single "FF9". Then 'buf_len' is going to be 4, same as the size of 'buf'. The printout result, however, contains 8 digits. I think we also write past the end of the buffer in tv.Format() here, and the only reason this works is that we put a string terminator char at the end.

Update: changed Format() to accept a string param instead of a char array and length.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 20
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Sep 2019 14:48:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 5:

Did some benchmarking on the IsoSql parsing and there was one thing that caused a noticeable performance drop compared to the SimpleDateFormat version:
- For SimpleDateFormat, everything that is not a digit is a separator, so the end of a pattern section is found by calling isdigit(). My implementation, on the other hand, kept the separator characters in an unordered_set and did a lookup in this set for each character of the input. Apparently isdigit() outperforms the set lookup, so I reworked the IsSeparator() function.
- The improvement was to get rid of the unordered_set for separators and simply compare against hard-coded characters within IsSeparator(). This gave a significant performance improvement. (It still doesn't reach the efficiency of isdigit().)

The SimpleDateFormat implementation still has some performance advantage over the IsoSql implementation, but this is because the latter offers more flexibility:
 - The length of the separator sequences is flexible (matching is not strict char by char).
 - There is a defined set of characters that can serve as a separator (not taking everything non-digit as separator).
I feel that, taking this extra functionality into account, the performance difference is reasonable and acceptable.
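
For illustration, the two IsSeparator() variants described above might look like the sketch below. Both function names are hypothetical and the separator set is taken from the commit message. No timings are implied here; the sketch only shows that the two forms accept the same characters.

```cpp
#include <cassert>
#include <unordered_set>

// Earlier approach (as described above): one hash-set lookup per input character.
bool IsSeparatorWithSet(char c) {
  static const std::unordered_set<char> separators = {
      '-', '.', '/', ',', '\'', ';', ':', ' '};
  return separators.count(c) > 0;
}

// Reworked approach: compare against hard-coded characters.
bool IsSeparatorHardCoded(char c) {
  switch (c) {
    case '-': case '.': case '/': case ',':
    case '\'': case ';': case ':': case ' ':
      return true;
    default:
      return false;
  }
}
```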



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Jul 2019 13:56:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4460/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Sep 2019 09:43:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#7).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,394 insertions(+), 858 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4468/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 18
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Sep 2019 16:17:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 10:

(23 comments)

Some more nitpicky comments, I'll continue tomorrow.

http://gerrit.cloudera.org:8080/#/c/13722/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13722/10//COMMIT_MSG@17
PS10, Line 17: is a string literal provided by the
             : user and its value can't come from a column.
nit: must specify a string literal and cannot be used with any other kind of string expression.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.h
File be/src/exprs/cast-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.h@26
PS10, Line 26: //
nit: /// here and below.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.cc
File be/src/exprs/cast-expr.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.cc@30
PS10, Line 30:     RuntimeState* state, ScalarExprEvaluator* eval) const {
nit: DCHECK(eval != nullptr);

here and in CloseEvaluator();


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-functions-ir.cc@182
PS10, Line 182:    char buf[buf_len];
              :     int ret_val = tv.Format(*format_ctx, buf_len, buf);
Maybe instead of allocating 'buf' on the stack, we should allocate it on the heap (unless it is guaranteed that 'buf_len' is a fairly small number).

vector<char> buf(buf_len);
int ret_val = tv.Format(*format_ctx, buf.size(), buf.data());


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-functions-ir.cc@204
PS10, Line 204:     char buf[buf_len];
              :     int ret_val = dv.Format(*format_ctx, buf_len, buf);
Same as L182 above.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/date-parse-util.cc@127
PS10, Line 127: dt_ctx.has_date_toks
In DateParser::ParseSimpleDateFormat() dt_ctx.has_date_toks is DCHECKed at the beginning of the function.

Any reason we don't make that assumption here?


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/date-parse-util.cc@152
PS10, Line 152: !=
<


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.h@71
PS10, Line 71: '*tok'
**tok ?


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@172
PS10, Line 172: dt_ctx_it
Maybe 'current_tok_ind' ?


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@195
PS10, Line 195:   // Next token, following the separator sequence.
              :   *tok = &dt_ctx.toks[*dt_ctx_it];
Thanks for refactoring the algorithm and moving all the separator skipping to a separate function.

Maybe L195-196 could be moved to after L49 and then 'tok' wouldn't have to be passed to the function. It feels redundant to pass both 'tok' and 'dt_ctx_it' to ProcessSeparators.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91
PS10, Line 91: we 
nit: we have


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS10, Line 110: unsigned
long?

I could be wrong but I think "unsigned" is an alias for "unsigned int", so there is a "long->unsigned int" implicit cast happening behind the assignment. We should avoid implicit casts and define 'curr_token_size' as long.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS10, Line 110: long)MAX_TOKEN_SIZE
Either use static_cast<long> or define MAX_TOKEN_SIZE as a long.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@148
PS10, Line 148: token group
token


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@150
PS10, Line 150: token groups
token


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@209
PS10, Line 209: token_group
token


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@212
PS10, Line 212: token_group
token


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.cc@123
PS10, Line 123: token_group
token


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.cc@135
PS10, Line 135: token_group
token


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.h@126
PS10, Line 126: token
              :   /// groups.
tokens?


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.cc@193
PS10, Line 193: token group 
token


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.cc@207
PS10, Line 207: token groups
tokens


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc@263
PS9, Line 263: break;
> I made some const string in the parser common and use them here. What do yo
Looks good to me.



-- 
To view, visit http://gerrit.cloudera.org:8080/13722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Jul 2019 18:37:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4469/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 19
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Sep 2019 18:12:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3866/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jul 2019 14:53:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3889/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 08:30:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 10:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-expr.h
File be/src/exprs/cast-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-expr.h@28
PS9, Line 28: class CastFormatExpr : public ScalarFnCall {
> nit: Move this comment in front of L31.
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@171
PS9, Line 171:   DCHECK(ctx != nullptr);
> Add DCHECK(ctx != nullptr);
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@178
PS9, Line 178: 
> It would be better to use ToString() directly rather than involving the ind
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@191
PS9, Line 191: StringVal CastFunctions::CastToStringVal(FunctionContext* ctx, const DateVal& val) {
> Add DCHECK(ctx != nullptr);
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@347
PS9, Line 347: 
> Add DCHECK(ctx != nullptr);
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@28
PS9, Line 28: //
> nit: in most other header files comments are prefixed with ///
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@46
PS9, Line 46:  posi
> position
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@47
PS9, Line 47: ext token t
> It hasn't been properly defined what "token group" means. What's the differ
Nothing, I use them as synonyms :)


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@48
PS9, Line 48: GetNextTokenFromInput(cons
> Naming is a bit confusing as it is used to find the end of the current toke
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@116
PS3, Line 116: break;
> Maybe add some DCHECKs at the beginning of the function then?
Good idea. Added a DCHECK for result->hour.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.cc@47
PS9, Line 47:    if (tok->type == SEPARATOR) {
            :       bool res = ProcessSeparators(&current_pos, end_pos, dt_ctx, &i, &tok);
            :       if (!res || current_pos == end_pos) return res;
            :     }
            : 
            :     const char* token_end_pos = GetNextTokenFromInput(current_pos,
            :         end_pos - current_pos, *tok);
            :     if (token_end_pos == nullptr) return false;
            :     int token_len = token_end_pos - current_pos;
            : 
            :     switch(tok->type) {
            :       case YEAR: {
            :         if (!ParseAndValidate(current_pos, token_len, 0, 9999, &result->year)) {
            :           return false;
            :         }
            :         if (token_len < 4) {
            :             PrefixYearFromCurrentYear(token_len, dt_ctx.current_time, result);
            :         }
            :         break;
            :       }
            :       case ROUND_YEAR: {
            :        
> Please consider moving this block to a separate function.
Done
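
For readers without the patch at hand, the short-year handling in the quoted block can be sketched as below. This assumes PrefixYearFromCurrentYear() fills in the missing leading digits from the current year, which is inferred from its name and arguments; the helper PrefixYear and its signature are hypothetical.

```cpp
// Completes a year parsed from fewer than four digits with the leading digits
// of the current year, e.g. '19' parsed in 2019 becomes 2019.
// Assumed behavior, not copied from the patch.
inline int PrefixYear(int parsed_year, int token_len, int current_year) {
  int scale = 1;
  for (int i = 0; i < token_len; ++i) scale *= 10;
  // Drop the last 'token_len' digits of the current year, then add the parsed ones.
  return (current_year / scale) * scale + parsed_year;
}
```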


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@73
PS9, Line 73:  deci
> nit: decide
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@75
PS9, Line 75:  token matc
> Again, the wording here and below is a bit confusing. Shouldn't it just be 
I'll use the naming 'token' instead.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91
PS9, Line 91:  found i
> nit: found
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS9, Line 110: unsigned
> I think it would be safer to just use 'int' here.
See below.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS9, Line 110: (long)MAX_TOKEN_SIZE
> Is explicit cast to long necessary here?
Without the cast I get a compilation error.

If I convert it to int then I get this:
error: no matching function for call to ‘min(int, long int)’
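
The error arises because std::min takes both arguments as the same template type T, so deduction fails when one is an int and the other a long. Besides casting one operand, the template argument can be named explicitly. A sketch with hypothetical names and values:

```cpp
#include <algorithm>

// Hypothetical stand-in for the constant used by the tokenizer.
constexpr int MAX_TOKEN_SIZE = 5;

// 'remaining' is a long (for example a pointer difference); MAX_TOKEN_SIZE is an
// int. std::min(remaining, MAX_TOKEN_SIZE) would not compile because T cannot be
// deduced as both long and int. Naming T explicitly converts the int to long.
inline long ClampTokenSize(long remaining) {
  return std::min<long>(remaining, MAX_TOKEN_SIZE);
}
```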


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.h@204
PS9, Line 204: /// and pushes it to 'context'. Depending on 'is_error' the message can be an error or
             : /// warning.
             : void ReportBadFormat(FunctionContext* context, FormatTokenizationResult error_type,
             :     const StringVal& format, bool is_error);
             : 
> Add WARN_UNUSED_RESULT to these declarations.
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.cc@64
PS9, Line 64: if (format.is_null || format.len == 0) {
> nit: move this after L112.
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@48
PS9, Line 48: static bool ParseSimpleDateFormat(const char* str, int len, boost::gregorian::date* d,
            :       boost::posix_time::time_duration* t) 
> Add WARN_UNUSED_RESULT here and L60, L69.
Didn't see the point of adding that here and using discard_result() at the call sites, as is done with the date versions of these functions. Done.
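
The pattern under discussion looks roughly like this. The macro and discard_result() definitions below are stand-ins written for GCC/Clang and are assumptions, not copies of Impala's own; ParseYear is a hypothetical function used only to show the call sites.

```cpp
#include <string>

// Stand-in definitions (assumed, GCC/Clang only).
#define WARN_UNUSED_RESULT __attribute__((warn_unused_result))

template <typename T>
inline void discard_result(const T&) {}

// Hypothetical parser: returns false unless 's' is exactly four decimal digits.
// On failure '*year' is left untouched.
WARN_UNUSED_RESULT inline bool ParseYear(const std::string& s, int* year) {
  if (s.size() != 4) return false;
  int value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
    value = value * 10 + (c - '0');
  }
  *year = value;
  return true;
}
```

A caller either tests the result, as in `if (!ParseYear(s, &y)) return false;`, or states that it is ignored on purpose with `discard_result(ParseYear(s, &y));`.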


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@65
PS9, Line 65: used when a user specifi
> nit: is used when a user specifies a datetime format string
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@106
PS9, Line 106: /// with 'tz_offset' returns -1 if 't' is negative, +1 if 't' is greater than or equals
             :   /// to 24 or zero otherwise.
> Please add a comment explaining the return value.
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc@174
PS9, Line 174: return false;
> Any reason the call to IndicateTimestampParseFailure() was removed?
The caller of this function will call it instead. This one populates the results and indicates failure and then the caller can decide what to do with it.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc@263
PS9, Line 263: break;
> This is potentially dangerous: 'indicator' object is destructed after leavi
I made some const strings in the parser common code and use them here. What do you think?
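
The hazard and the fix can be sketched as follows: a pointer taken from a string that lives in a local scope dangles once the scope ends, while a pointer into a constant with static storage duration stays valid. The names AM_TEXT, PM_TEXT and MeridiemText are hypothetical, not the ones used in the patch.

```cpp
#include <string>

// Hypothetical constants with static storage duration; pointers into them stay
// valid for the lifetime of the program.
static const std::string AM_TEXT = "AM";
static const std::string PM_TEXT = "PM";

// Returns a pointer to the meridiem text for 'hour' (0-23). Building a local
// std::string here and returning its c_str() would dangle instead.
inline const char* MeridiemText(int hour) {
  return (hour >= 12 ? PM_TEXT : AM_TEXT).c_str();
}
```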


http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/cup/sql-parser.cup@2938
PS9, Line 2938: STRING_LITERA
> My understanding is that to_timestamp() and from_timestamp() accept format 
It is intentional. Extended the commit msg.


http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@227
PS9, Line 227: his is a cast between a
             :     // datetime and a string.
> this is a 'datetime to string' or 'string to datetime' cast.
Rephrased a bit but done.


http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@229
PS9, Line 229: 
             :         (type_.isDateOrTimeType() && getChild(0).getType().isStringType() ||
> Consider rewriting this condition to test directly if we have datetime->str
Done


http://gerrit.cloudera.org:8080/#/c/13722/9/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
File testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test:

http://gerrit.cloudera.org:8080/#/c/13722/9/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test@109
PS9, Line 109: ====
> Would it be possible to specify the format string as a string/char/varchar 
As mentioned in a separate comment, the FORMAT clause can only be a string literal, not a column.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Jul 2019 08:20:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3848/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Jul 2019 10:35:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#14).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Contrary to how these
functions work the FORMAT clause must specify a string literal and
cannot be used with any other kind of a string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;
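
The separator rules in the notes above (any separator matches any other, and a run of any length is accepted) can be sketched as below. This is an illustration only: IsSeparator and SkipSeparators are hypothetical helpers, not functions from this change.

```cpp
#include <cstddef>
#include <string>

// Separator characters listed in the commit message: - . / , ' ; : space
inline bool IsSeparator(char c) {
  return std::string("-./,';: ").find(c) != std::string::npos;
}

// Consumes a run of separators in 'input' starting at 'pos' and returns how many
// characters were skipped. The concrete separator characters are not compared
// with the format, so '-' in the format can match '.' or '..' in the input.
inline std::size_t SkipSeparators(const std::string& input, std::size_t pos) {
  const std::size_t start = pos;
  while (pos < input.size() && IsSeparator(input[pos])) ++pos;
  return pos - start;
}
```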

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,425 insertions(+), 865 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/14

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 20:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/exprs/timestamp-functions-ir.cc@73
PS20, Line 73: buf
Maybe this should be called 'buff' to match the formal parameter name in the Format() function.


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc@135
PS20, Line 135: string& buff
'buff' should either be a string* param or the return value of Format() to emphasize that it is an out param/value.
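
The two alternatives could look like this. FormatYear is a hypothetical, much reduced stand-in for Format(), used only to show the signatures:

```cpp
#include <string>

// Option 1: pointer out-parameter. The '&' at the call site makes it visible that
// the argument is modified. Appends, so it also works on a non-empty string.
inline void FormatYear(int year, std::string* buff) {
  buff->append(std::to_string(year));
}

// Option 2: return the formatted text by value.
inline std::string FormatYear(int year) {
  return std::to_string(year);
}
```

With option 1 the call reads `FormatYear(2019, &out);`, with option 2 it reads `out = FormatYear(2019);`.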


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc@138
PS20, Line 138: DCHECK(buff.empty());
This is probably not necessary. Format should work even if the string is not empty on call.


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc@180
PS20, Line 180: buff.append(string(str_val, str_val_len));
No need to create a temp string, you can append char* directly:
buff.append(str_val, str_val_len)


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc@180
PS20, Line 180: buff.append(string(str_val, str_val_len));
I'm not sure what happens if str_val == nullptr and str_val_len == 0.

Maybe we should check for that before calling append() just to be safe?
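
One way to address both comments on line 180 is a small guarded helper. AppendChars is a hypothetical name; the guard is there only to sidestep the question of whether append() accepts a null pointer with a zero length.

```cpp
#include <cstddef>
#include <string>

// Appends 'len' characters starting at 'str' without building a temporary
// std::string. Returns early for a null or empty source, so append() never
// receives a null pointer.
inline void AppendChars(std::string* buff, const char* str, int len) {
  if (str == nullptr || len <= 0) return;
  buff->append(str, static_cast<std::size_t>(len));
}
```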


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-value.cc
File be/src/runtime/date-value.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-value.cc@86
PS20, Line 86: string& buff
Again, 'buff' should be a string* param or the return value of Format().


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@113
PS20, Line 113: DCHECK(curr_token_size > 0);
nit: Probably not necessary, L110 and L111 imply that this is true.


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/timestamp-parse-util.cc@218
PS20, Line 218: void TimestampParser::Format(const DateTimeFormatContext& dt_ctx,
              :     const date& d, const time_duration& t, string& buff) {
              :   DCHECK(dt_ctx.toks.size() > 0);
              :   DCHECK(buff.length() == 0);
              :   if (dt_ctx.has_date_toks && d.is_special()) return;
              :   if (dt_ctx.has_time_toks && t.is_special()) return;
              :   for (const DateTimeFormatToken& tok: dt_ctx.toks) {
              :     int32_t num_val = -1;
              :     const char* str_val = NULL;
              :     int str_val_len = 0;
              :     switch (tok.type) {
              :       case YEAR:
              :       case ROUND_YEAR: {
              :         num_val = d.year();
              :         if (tok.len < 4) {
              :           int adjust_factor = std::pow(10, tok.len);
              :           num_val %= adjust_factor;
              :         }
              :         break;
              :       }
              :       case MONTH_IN_YEAR: num_val = d.month().as_number(); break;
              :       case MONTH_IN_YEAR_SLT: {
              :         str_val = d.month().as_short_string();
              :         str_val_len = 3;
              :         break;
              :       }
              :       case DAY_IN_MONTH: num_val = d.day(); break;
              :       case DAY_IN_YEAR: {
              :         num_val = GetDayInYear(d.year(), d.month(), d.day());
              :         break;
              :       }
              :       case HOUR_IN_DAY: num_val = t.hours(); break;
              :       case HOUR_IN_HALF_DAY: {
              :         num_val = t.hours();
              :         if (num_val == 0) num_val = 12;
              :         if (num_val > 12) num_val -= 12;
              :         break;
              :       }
              :       case MERIDIEM_INDICATOR: {
              :         const MERIDIEM_INDICATOR_TEXT* indicator_txt = (tok.len == 2) ? &AM : &AM_LONG;
              :         if (t.hours() >= 12) {
              :           indicator_txt = (tok.len == 2) ? &PM : &PM_LONG;
              :         }
              :         str_val_len = tok.len;
              :         str_val = (isupper(*tok.val)) ? indicator_txt->first : indicator_txt->second;
              :         break;
              :       }
              :       case MINUTE_IN_HOUR: num_val = t.minutes(); break;
              :       case SECOND_IN_MINUTE: num_val = t.seconds(); break;
              :       case SECOND_IN_DAY: {
              :           num_val = t.hours() * 3600 + t.minutes() * 60 + t.seconds();
              :           break;
              :       }
              :       case FRACTION: {
              :         num_val = t.fractional_seconds();
              :         if (num_val > 0) for (int j = tok.len; j < 9; ++j) num_val /= 10;
              :         break;
              :       }
              :       case SEPARATOR:
              :       case ISO8601_TIME_INDICATOR:
              :       case ISO8601_ZULU_INDICATOR: {
              :         str_val = tok.val;
              :         str_val_len = tok.len;
              :         break;
              :       }
              :       case TZ_OFFSET: {
              :         break;
              :       }
              :       default: DCHECK(false) << "Unknown date/time format token";
              :     }
              :     if (num_val > -1) {
              :       string tmp_str = std::to_string(num_val);
              :       if (tmp_str.length() < tok.len) tmp_str.insert(0, tok.len - tmp_str.length(), '0');
              :       buff.append(tmp_str);
              :     } else {
              :       buff.append(string(str_val, str_val_len));
              :     }
              :   }
Same as comments on DateParser::Format().
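
For reference, the numeric steps in the quoted function can be isolated as below. This restates the quoted HOUR_IN_HALF_DAY, FRACTION and zero-padding logic as standalone helpers with hypothetical names, and it assumes the fractional value is given in nanoseconds (nine digits), as the loop bound in the quoted code suggests.

```cpp
#include <cstddef>
#include <string>

// 24-hour to 12-hour conversion: 0 -> 12, 13..23 -> 1..11, 1..12 unchanged.
inline int ToHour12(int hour24) {
  if (hour24 == 0) return 12;
  if (hour24 > 12) return hour24 - 12;
  return hour24;
}

// Keeps the leading 'digits' digits of a nine-digit nanosecond value.
inline int TruncateFraction(int nanos, int digits) {
  for (int i = digits; i < 9; ++i) nanos /= 10;
  return nanos;
}

// Left-pads 'value' with '0' up to 'width' characters; longer values are kept.
inline std::string ZeroPad(int value, std::size_t width) {
  std::string s = std::to_string(value);
  if (s.length() < width) s = std::string(width - s.length(), '0') + s;
  return s;
}
```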


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/timestamp-value.cc@82
PS20, Line 82: string& buff
Again, 'buff' should be a string* param or the return value of Format() to emphasize that it is an out param/value.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 20
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Sep 2019 13:53:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#6).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly, meaning
  that, for instance, a single separator character in the format
  matches a multi-separator sequence in the input.
- In a string to timestamp conversion the timezone offset tokens are
  parsed and must match the input, but they don't adjust the result
  because the input is already expected to be in UTC.
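
The separator rules above can be illustrated with a minimal sketch. This
is not the code in the patch: the helper names are hypothetical, and the
sketch assumes that a separator run in the format requires at least one
separator character in the input. It also ignores the case where a '-'
in front of a timezone hour is a sign rather than a separator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true if 'c' is one of the separators listed in the
// commit message: - . / , ' ; : and space.
bool IsSeparatorChar(char c) {
  // strchr() would also match the terminating '\0', so exclude it explicitly.
  return c != '\0' && std::strchr("-./,';: ", c) != nullptr;
}

// Hypothetical helper: advances '*pos' past a run of separators in 's' and
// returns how many characters were skipped.
std::size_t SkipSeparators(const std::string& s, std::size_t* pos) {
  assert(pos != nullptr);
  std::size_t start = *pos;
  while (*pos < s.size() && IsSeparatorChar(s[*pos])) ++(*pos);
  return *pos - start;
}

// Returns true if the separator run at '*fmt_pos' in the format matches the
// separator run at '*in_pos' in the input. Neither the concrete separator
// characters nor the run lengths have to agree; both runs only have to be
// non-empty (an assumption of this sketch).
bool MatchSeparatorRun(const std::string& format, std::size_t* fmt_pos,
    const std::string& input, std::size_t* in_pos) {
  std::size_t fmt_len = SkipSeparators(format, fmt_pos);
  std::size_t in_len = SkipSeparators(input, in_pos);
  return fmt_len > 0 && in_len > 0;
}
```

With the format 'YYYY-MM' and the input '2019. /10', starting both
positions at offset 4, the single '-' in the format matches the three
separator characters in the input.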

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column AS STRING
    FORMAT "YYYY MM HH12 YY") FROM some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,363 insertions(+), 856 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/6
-- 
To view, visit http://gerrit.cloudera.org:8080/13722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 13:

(39 comments)

http://gerrit.cloudera.org:8080/#/c/13722/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13722/10//COMMIT_MSG@17
PS10, Line 17: must specify a string literal and
             : cannot be used with any other kind of a stri
> nit: must specify a string literal and cannot be used with any other kind o
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.h
File be/src/exprs/cast-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.h@1
PS10, Line 1: 
> Source file should be renamed to cast-format-expr.h
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.h@26
PS10, Line 26: 
> nit: /// here and below.
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.h@44
PS10, Line 44: 
> Does 'dt_ctx_' have to be a pointer?
It has to live on the heap because the way to pass any input to functions such as CastFunctions::CastToTimestampVal() is to store a pointer to the data using fn_ctx->SetFunctionState().


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.cc
File be/src/exprs/cast-expr.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.cc@30
PS10, Line 30: 
> nit: DCHECK(eval != nullptr);
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-functions-ir.cc@182
PS10, Line 182:    char buf[buf_len];
              :     int ret_val = tv.Format(*format_ctx, buf_len, buf);
> Maybe instead of allocating 'buf' on the stack, we should allocate it on th
The format length is capped in the tokenizer:
const int IsoSqlFormatTokenizer::MAX_FORMAT_LENGTH = 100;
I'd keep this as it is.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-functions-ir.cc@204
PS10, Line 204:     char buf[buf_len];
              :     int ret_val = dv.Format(*format_ctx, buf_len, buf);
> Same as L182 above.
Same as above :)


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/date-parse-util.cc@127
PS10, Line 127: oSqlFormatParser::Pa
> In DateParser::ParseSimpleDateFormat() dt_ctx.has_date_toks is DCHECKed at 
Initially I set the has_date_toks flag in IsoSqlFormatParser::ParseDateTime(), so I couldn't DCHECK it at the beginning of the function. I moved setting that flag to the tokenizer, so it's safe to do the DCHECK here. Thanks for spotting!


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/date-parse-util.cc@152
PS10, Line 152: ar
> <
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.h@71
PS10, Line 71: '**tok
> **tok ?
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@30
PS10, Line 30: 
> In other .cc files, we just include common/names.h which pulls in all the u
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@134
PS10, Line 134:        // Input has already been validated in ParseMeridiemIndicatorFromInput().
              :         string indicator(current_pos, token_len);
              :         boost::to_upper(indicator);
              :         if (in
> Add a comment that the token has already been validated in GetNextTokenFrom
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@145
PS10, Line 145:       case ISO8601_TIME_INDICATOR:
> Add DCHECK(token_len == 1)
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@146
PS10, Line 146: SO860
> std qualifier is not necessary, here and elsewhere in the .cc file.
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@172
PS10, Line 172: 
> Maybe 'current_tok_ind' ?
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@195
PS10, Line 195: 
              :   // The last '-' of a separator s
> Thanks for refactoring the algorithm and moving all the separator skipping 
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@205
PS10, Line 205: 
> This should be called 'FindEndOfToken' or something similar.
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@239
PS10, Line 239:        strncasecmp(input_str, PM_LONG.first, 4) == 0 )) {
              :     return input_str + 4;
              :   }
              :   if (input_len >= 2 &&
              :       (strncasecmp(input_str, AM.first, 2) == 0 ||
              :        strncasecmp(input_str, PM.first, 2) == 0 )) {
              :     return input_str + 2;
              :   }
              :   return nullptr;
              : }
> This looks a bit complicated for what the function does. 
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@53
PS10, Line 53:   /// Returns true if 'c' is a valid separator
             :   static bool IsSeparator(char c);
             : private:
             :   /// Stores metadata about a specific token type.
             :   struct TokenItem {
> This function is only called in IsoSqlFormatParser::ParseMeridiemIndicatorF
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91
PS10, Line 91: 
> nit: we have
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS10, Line 110: DCHECK(*
> long?
Subtracting two pointers results in a long. What if I explicitly convert that to an unsigned? Let's have a DCHECK above that str_end >= *current_pos. This way curr_token_size can remain an unsigned.
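
A minimal sketch of the check-then-cast idea described above. The helper
name is hypothetical and assert() stands in for DCHECK; this is not the
code in the patch. Strictly speaking, pointer subtraction yields
std::ptrdiff_t, which is a long on typical 64-bit Linux builds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of characters between 'current_pos' and
// 'str_end', returned as an unsigned value.
unsigned RemainingLength(const char* current_pos, const char* str_end) {
  // Stand-in for DCHECK(str_end >= current_pos).
  assert(current_pos != nullptr && str_end >= current_pos);
  // Pointer subtraction yields the signed type std::ptrdiff_t.
  std::ptrdiff_t diff = str_end - current_pos;
  // The assertion above guarantees diff >= 0, so the explicit conversion
  // cannot turn a negative distance into a huge unsigned value.
  return static_cast<unsigned>(diff);
}
```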


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS10, Line 110: 
> Either use static_cast<long> or define MAX_TOKEN_SIZE as a long.
see below


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@118
PS10, Line 118: auto token = VALID_TOKENS.find(token_to_probe);
> nit: Reverse the order of && operands to prevent looking up the token if to
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124
PS10, Line 124: ks |= token->s
> Maybe it would be simpler to keep track of the DateTimeFormatTokenType of t
See below.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124
PS10, Line 124: dt_ctx_->has_date_toks |= token->sec
> This probably not necessary if tokenizer is in FORMAT mode.
In FORMAT mode I still have to check that TZH and TZM are not provided. I could do it somewhere other than CheckIncompatibilities(), but doing it there is the cleanest way. So in FORMAT mode I still have to keep track of the used tokens. However, if I inserted the TokenType instead of the string representation, then in FORMAT mode the user wouldn't be able to provide both "YYYY" and "Y", as both of them are of type YEAR.
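
The difference between the two bookkeeping options can be sketched as
follows. All names here are hypothetical, and the sketch assumes that
repeated tokens are detected through a set of already-seen entries; the
actual bookkeeping in the tokenizer may differ.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, reduced token model for illustration only.
enum TokenType { YEAR, MONTH, TZ_HOUR };

struct Token {
  std::string text;
  TokenType type;
};

// Tracks used tokens by their string representation. Returns false if a
// token with the same text was already recorded.
bool RecordByText(std::set<std::string>* used, const Token& tok) {
  assert(used != nullptr);
  return used->insert(tok.text).second;
}

// Tracks used tokens by their type. Returns false if any token of the same
// type was already recorded, even if its text differs.
bool RecordByType(std::set<TokenType>* used, const Token& tok) {
  assert(used != nullptr);
  return used->insert(tok.type).second;
}
```

Keyed by text, "YYYY" followed by "Y" is accepted because the two
strings differ. Keyed by type, the second token is reported as a repeat
because both map to YEAR.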


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@148
PS10, Line 148: token lengt
> token
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@150
PS10, Line 150: tokens found
> token
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@209
PS10, Line 209: token, int 
> token
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@212
PS10, Line 212: token, int 
> token
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@223
PS10, Line 223: sinc
> nit: since
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@224
PS10, Line 224: if any of the in parameters are
> "if any of the in parameters are"
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@225
PS10, Line 225: 'days_sinc
> "days_since_jan1 set to 365" ?
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.cc@123
PS10, Line 123: token, int 
> token
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.cc@135
PS10, Line 135: token, int 
> token
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.h@126
PS10, Line 126: tokens.
              :   static Date
> tokens?
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.cc@193
PS10, Line 193: token length
> token
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-simple-date-format-parser.cc@207
PS10, Line 207: tokens e.g. 
> tokens
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/timestamp-parse-util.cc@45
PS10, Line 45: static bool IndicateTimestampParseFailure(date* d, time_duration* t) {
> DCHECK(d != nullptr);
Done


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/timestamp-parse-util.cc@188
PS10, Line 188: ;
> nullptr
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Aug 2019 11:04:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 4:

(12 comments)

Attila, with patch set 4 I have covered all your findings. The perf degradation is still an issue, though.

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@184
PS2, Line 184:         &result->day)) {
             :       return false;
             :     }
             :   }
             : 
             :   return true;
             : }
             : 
             : const char* DateTimeIsoSqlFormatParser::GetNextTokenGroupFromInput(const char* input_str,
             :     int input_len, const DateTimeFormatToken& tok) {
             :   DCHECK(input_str != nullptr);
             : 
             :   // Handle separately the meridiem indicators for two reasons.
             :   // 1: They might contain '.' that is not meant to be a separator character.
             :   // 2: The length of the token in the pattern might differ from the length of the token
             :   // in the input. E.g. "AM" should match with "P.M.".
             :   if (tok.type == MERIDIEM_IND
> This can be done more efficiently if you use hard-coded month ranges, like 
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@184
PS2, Line 184:         &result->day)) {
             :       return false;
             :     }
             :   }
             : 
             :   return true;
             : }
             : 
             : const char* DateTimeIsoSqlFormatParser::GetNextTokenGroupFromInput(const char* input_str,
             :     int input_len, const DateTimeFormatToken& tok) {
             :   DCHECK(input_str != nullptr);
             : 
             :   // Handle separately the meridiem indicators for two reasons.
             :   // 1: They might contain '.' that is not meant to be a separator character.
             :   // 2: The length of the token in the pattern might differ from the length of the token
             :   // in the input. E.g. "AM" should match with "P.M.".
             :   if (tok.type == MERIDIEM_IND
> Move this to a separate function.
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@78
PS3, Line 78:         if (!ParseAndValidate(current_pos, group_len, 0, 9999, &result->year)) {
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@95
PS3, Line 95:       }
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@124
PS3, Line 124:         if (!ParseAndValidate(current_pos, group_len, 0, 23, &result->hour)) return false;
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@128
PS3, Line 128:         if (!ParseAndValidate(current_pos, group_len, 0, 59, &result->minute)) {
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@133
PS3, Line 133:       case SECOND_IN_MINUTE: {
> line too long (94 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@185
PS2, Line 185: 
> This function can be simplified if you use hard-coded month ranges. (See MO
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h@98
PS3, Line 98:   /// accept_time_toks_only -- if true, time tokens w/o date tokens are accepted.
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h@104
PS3, Line 104:   /// Parse date/time string to find the corresponding default date/time format context.
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h@160
PS3, Line 160:   /// Does only a basic validation on the parsed date/time values. The caller is
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.h@68
PS2, Line 68: ///   Long literal months e.g. MMMM
> Maybe the stuff in this file could be moved to separate classes, like Simpl
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jul 2019 09:37:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3888/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 08:26:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 22:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/exprs/timestamp-functions-ir.cc@73
PS20, Line 73: for
> Maybe this should be called 'buff' to match the formal parameter name in th
Done


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc@135
PS20, Line 135: HECK(dt_ctx.
> 'buff' should either be a string* param or the return value of Format() to 
Done


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc@138
PS20, Line 138: int year, month, day;
> This probably not necessary. Format should work even if string is not empty
Done


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc@180
PS20, Line 180: result.append(str_val, str_val_len);
> I'm not sure what happens if str_val == nullptr and str_val_len == 0.
It can't be nullptr, but I added a DCHECK just in case.


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-parse-util.cc@180
PS20, Line 180: result.append(str_val, str_val_len);
> No need to create a temp string, you can append char* directly:
Done


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-value.cc
File be/src/runtime/date-value.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/date-value.cc@86
PS20, Line 86: ) const {
> Again, 'buff' should be a string* param or the return value of Format().
Done


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@113
PS20, Line 113: string token_to_probe(*curre
> nit: Probably not necessary, L110 and L111 imply that this is true.
Done


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/timestamp-parse-util.cc@218
PS20, Line 218: string TimestampParser::Format(const DateTimeFormatContext& dt_ctx,
              :     const date& d, const time_duration& t) {
              :   DCHECK(dt_ctx.toks.size() > 0);
              :   if (dt_ctx.has_date_toks && d.is_special()) return "";
              :   if (dt_ctx.has_time_toks && t.is_special()) return "";
              :   string result;
              :   for (const DateTimeFormatToken& tok: dt_ctx.toks) {
              :     int32_t num_val = -1;
              :     const char* str_val = NULL;
              :     int str_val_len = 0;
              :     switch (tok.type) {
              :       case YEAR:
              :       case ROUND_YEAR: {
              :         num_val = d.year();
              :         if (tok.len < 4) {
              :           int adjust_factor = std::pow(10, tok.len);
              :           num_val %= adjust_factor;
              :         }
              :         break;
              :       }
              :       case MONTH_IN_YEAR: num_val = d.month().as_number(); break;
              :       case MONTH_IN_YEAR_SLT: {
              :         str_val = d.month().as_short_string();
              :         str_val_len = 3;
              :         break;
              :       }
              :       case DAY_IN_MONTH: num_val = d.day(); break;
              :       case DAY_IN_YEAR: {
              :         num_val = GetDayInYear(d.year(), d.month(), d.day());
              :         break;
              :       }
              :       case HOUR_IN_DAY: num_val = t.hours(); break;
              :       case HOUR_IN_HALF_DAY: {
              :         num_val = t.hours();
              :         if (num_val == 0) num_val = 12;
              :         if (num_val > 12) num_val -= 12;
              :         break;
              :       }
              :       case MERIDIEM_INDICATOR: {
              :         const MERIDIEM_INDICATOR_TEXT* indicator_txt = (tok.len == 2) ? &AM : &AM_LONG;
              :         if (t.hours() >= 12) {
              :           indicator_txt = (tok.len == 2) ? &PM : &PM_LONG;
              :         }
              :         str_val_len = tok.len;
              :         str_val = (isupper(*tok.val)) ? indicator_txt->first : indicator_txt->second;
              :         break;
              :       }
              :       case MINUTE_IN_HOUR: num_val = t.minutes(); break;
              :       case SECOND_IN_MINUTE: num_val = t.seconds(); break;
              :       case SECOND_IN_DAY: {
              :           num_val = t.hours() * 3600 + t.minutes() * 60 + t.seconds();
              :           break;
              :       }
              :       case FRACTION: {
              :         num_val = t.fractional_seconds();
              :         if (num_val > 0) for (int j = tok.len; j < 9; ++j) num_val /= 10;
              :         break;
              :       }
              :       case SEPARATOR:
              :       case ISO8601_TIME_INDICATOR:
              :       case ISO8601_ZULU_INDICATOR: {
              :         str_val = tok.val;
              :         str_val_len = tok.len;
              :         break;
              :       }
              :       case TZ_OFFSET: {
              :         break;
              :       }
              :       default: DCHECK(false) << "Unknown date/time format token";
              :     }
              :     if (num_val > -1) {
              :       string tmp_str = std::to_string(num_val);
              :       if (tmp_str.length() < tok.len) tmp_str.insert(0, tok.len - tmp_str.length(), '0');
              :       result.append(tmp_str);
              :     } else {
              :       DCHECK(str_val != nullptr && str_val_len > 0);
              :       result.append(str_val, str_val_len);
              :    
> Same as comments on DateParser::Format().
Done
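
Two of the numeric conversions in the quoted Format() code, the
HOUR_IN_HALF_DAY mapping and the truncation of the year for Y, YY and
YYY, can be isolated in a minimal sketch. The function names are
hypothetical, assert() stands in for DCHECK, and the sketch swaps the
std::pow() call for an integer loop so that no floating-point value is
involved.

```cpp
#include <cassert>

// Maps an hour in the 0-23 range to the 1-12 range used by HH and HH12.
int ToHourInHalfDay(int hour24) {
  assert(hour24 >= 0 && hour24 <= 23);
  if (hour24 == 0) return 12;            // midnight is 12 AM
  if (hour24 > 12) return hour24 - 12;   // 13..23 map to 1..11
  return hour24;                         // 1..12 stay unchanged
}

// Keeps only the last 'tok_len' digits of 'year' for year tokens shorter
// than four characters.
int TruncateYear(int year, int tok_len) {
  assert(year >= 0 && tok_len >= 1 && tok_len <= 4);
  if (tok_len == 4) return year;
  int modulus = 1;
  for (int i = 0; i < tok_len; ++i) modulus *= 10;
  return year % modulus;
}
```

The caller is still responsible for zero padding, so the year 2019 with
a YYY token yields the number 19, which is then written as '019'.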


http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/13722/20/be/src/runtime/timestamp-value.cc@82
PS20, Line 82:  const {
> Again, buff should be a string* param or the return value of Format() to em
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 22
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Sep 2019 08:47:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13722/8/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/13722/8/tests/query_test/test_cast_with_format.py@120
PS8, Line 120: d
flake8: E303 too many blank lines (2)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 07:47:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc@144
PS15, Line 144: 0, 99
> The sign before TIMEZONE_HOUR is processed in FindEndOfToken(). See L226.
Ok, I checked out your change and ran some tests. This works:
> select cast('2018-12-31 08:00 1' as timestamp FORMAT 'YYYY-MM-DD HH24:MI TZH');
2018-12-31 08:00:00

But this doesn't:
> select cast('2018-12-31 08:00 -1' as timestamp FORMAT 'YYYY-MM-DD HH24:MI TZH');
NULL


Am I missing something?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Sep 2019 13:28:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13722/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13722/15//COMMIT_MSG@25
PS15, Line 25: - YYYY, YYY, YY, Y: Year tokens
             : - RRRR, RR: Round year tokens
             : - MM: Month (1-12)
             : - DD: Day (1-31)
             : - DDD: Day of year (1-366)
             : - HH, HH12: Hour of day (1-12)
             : - HH24: Hour of day (0-23)
             : - MI: Minute (0-59)
             : - SS: Second (0-59)
             : - SSSSS: Second of day (0-86399)
             : - FF, FF1, ..., FF9: Fractional second
             : - AM, PM, A.M., P.M.: Meridiem indicators
             : - TZH: Timezone hour (-99-+99)
             : - TZM: Timezone minute (0-99)
> Thanks for adding these. Are there any tests  for these lower/upper limits?
I haven't created a separate test for the boundary checks, but most of these boundaries are exercised by the existing tests.
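
A dedicated boundary test could be table-driven, along the lines of the sketch below. This is only an illustration in Python: the ranges are copied from the token list in the commit message, while TOKEN_RANGES, parse_and_validate() and boundary_cases() are hypothetical names introduced here and are not the C++ ParseAndValidate() from the patch.

```python
# Ranges copied from the token list in the commit message.
TOKEN_RANGES = {
    "MM": (1, 12),
    "DD": (1, 31),
    "DDD": (1, 366),
    "HH12": (1, 12),
    "HH24": (0, 23),
    "MI": (0, 59),
    "SS": (0, 59),
    "SSSSS": (0, 86399),
    "TZH": (-99, 99),
    "TZM": (0, 99),
}


def parse_and_validate(token_text, min_value, max_value):
    """Hypothetical stand-in: parse an integer and range-check it.

    Returns the parsed value, or None if the text is not an integer
    or falls outside [min_value, max_value].
    """
    try:
        value = int(token_text)
    except ValueError:
        return None
    if value < min_value or value > max_value:
        return None
    return value


def boundary_cases(token):
    """Yield (input_text, expected_ok) pairs around both limits."""
    low, high = TOKEN_RANGES[token]
    return [
        (str(low), True),
        (str(high), True),
        (str(low - 1), False),
        (str(high + 1), False),
    ]
```

Each token then contributes four cases: the two limits, which must be accepted, and the values just outside them, which must be rejected.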


http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc@144
PS15, Line 144: 0, 99
> For TIMEZONE_MIN the valid range is 0...99, but for TIMEZONE_HOUR it should
The sign before TIMEZONE_HOUR is processed in FindEndOfToken(). See L226.
Some of the timezone tests cover negative TZH cases in test_timezone_offsets().


http://gerrit.cloudera.org:8080/#/c/13722/15/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/13722/15/tests/query_test/test_cast_with_format.py@502
PS15, Line 502: 01:-59
> Is this is parsed because ":-" characters in the input string are matched t
Yeah, this is a bit misleading because the minus sign before TZM is parsed as a separator together with the preceding colon. I think this can stay because the timezone offset values are ignored anyway, so it doesn't matter whether this TZM is taken as +59 or -59.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Aug 2019 15:56:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces a FORMAT clause for the CAST() operator
that applies to casts between string types and timestamp types.
Instead of accepting SimpleDateFormat patterns, the FORMAT clause
supports datetime patterns following the ISO:SQL:2016 standard.
Note that the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp() and
from_timestamp() remain unchanged and use SimpleDateFormat. Unlike
these functions, the FORMAT clause must specify a string literal and
cannot be used with any other kind of string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99-+99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm treats these tokens case-insensitively.
- The separators are interchangeable. For example, a '-' separator
  in the format matches a '.' character in the input.
- Separator sequences are matched flexibly by length: for instance,
  a single separator character in the format matches a
  multi-separator sequence in the input.
- In a string-to-timestamp conversion the timezone offset tokens are
  parsed and must match the input, but they don't adjust the result
  because the input is already expected to be in UTC.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Reviewed-on: http://gerrit.cloudera.org:8080/13722
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,521 insertions(+), 972 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 25
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#19).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces a FORMAT clause for the CAST() operator
that applies to casts between string types and timestamp types.
Instead of accepting SimpleDateFormat patterns, the FORMAT clause
supports datetime patterns following the ISO:SQL:2016 standard.
Note that the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp() and
from_timestamp() remain unchanged and use SimpleDateFormat. Unlike
these functions, the FORMAT clause must specify a string literal and
cannot be used with any other kind of string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99-+99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm treats these tokens case-insensitively.
- The separators are interchangeable. For example, a '-' separator
  in the format matches a '.' character in the input.
- Separator sequences are matched flexibly by length: for instance,
  a single separator character in the format matches a
  multi-separator sequence in the input.
- In a string-to-timestamp conversion the timezone offset tokens are
  parsed and must match the input, but they don't adjust the result
  because the input is already expected to be in UTC.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,461 insertions(+), 865 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/19

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 19
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13722/4/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/4/be/src/runtime/datetime-parser-common.cc@160
PS4, Line 160: GetMonthAndDayFromDayOfYear
Technically this should be called 'GetMonthAndDayFromDaysSinceJan1'


http://gerrit.cloudera.org:8080/#/c/13722/4/be/src/runtime/datetime-parser-common.cc@161
PS4, Line 161:   // Calculate month using month ranges and the average month length.
DCHECK(days_since_jan1 >= 0 && days_since_jan1 < 366);
DCHECK(month != nullptr);
DCHECK(day != nullptr);
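
To illustrate the naming point: the DDD token is one-based (1-366), while the argument checked above is zero-based (0-365). A minimal Python sketch of the conversion follows. It walks the month lengths rather than using the "average month length" approach mentioned in the patch comment, and both the function name and the explicit year parameter are assumptions made for this example.

```python
import calendar


def month_and_day_from_days_since_jan1(year, days_since_jan1):
    """Convert a zero-based day offset into a (month, day) pair.

    Hypothetical helper for illustration; 'year' is needed only to
    decide whether February has 28 or 29 days.
    """
    days_in_year = 366 if calendar.isleap(year) else 365
    assert 0 <= days_since_jan1 < days_in_year
    remaining = days_since_jan1
    for month in range(1, 13):
        # monthrange() returns (weekday of the 1st, days in month).
        days_in_month = calendar.monthrange(year, month)[1]
        if remaining < days_in_month:
            return month, remaining + 1
        remaining -= days_in_month
    raise AssertionError("unreachable: offset was range-checked above")
```

With this convention a DDD value of 1 corresponds to an offset of 0, so the caller subtracts one before the lookup.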


http://gerrit.cloudera.org:8080/#/c/13722/5/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/5/be/src/runtime/datetime-simple-date-format-parser.h@73
PS5, Line 73: DateTimeSimpleDateFormatTokenizer
The name is a bit redundant. 'SimpleDateFormatTokenizer' would be better as the name of the enclosing namespace is also prefixed with 'datetime'.


http://gerrit.cloudera.org:8080/#/c/13722/5/be/src/runtime/datetime-simple-date-format-parser.h@156
PS5, Line 156: DateTimeSimpleDateFormatParser
Same as above.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jul 2019 09:06:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 14: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Aug 2019 14:12:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 9:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-expr.h
File be/src/exprs/cast-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-expr.h@28
PS9, Line 28: // Note, that it should be verified at the callsite that TExprNode.cast_expr is set.
nit: Move this comment in front of L31.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@171
PS9, Line 171:   if (val.is_null) return StringVal::null();
Add DCHECK(ctx != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@178
PS9, Line 178: lexical_cast<string>(tv)
It would be better to use ToString() directly rather than involving the indirection of lexical_cast and operator<<. (just like it's done in L199).


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@191
PS9, Line 191:   if (val.is_null) return StringVal::null();
Add DCHECK(ctx != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@347
PS9, Line 347:   if (val.is_null) return DateVal::null();
Add DCHECK(ctx != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@28
PS9, Line 28: //
nit: in most other header files comments are prefixed with ///


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@46
PS9, Line 46: index
position


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@47
PS9, Line 47: token group
It hasn't been properly defined what "token group" means. What's the difference between a token and a token group?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@48
PS9, Line 48: GetNextTokenGroupFromInput
Naming is a bit confusing as it is used to find the end of the current token (pointed to by input_str).

Again, instead of "TokenGroup", you might want to just use "Token" in the wording.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@116
PS3, Line 116: // Note the addition 
> My assumption was that it is initialized to zero when the 'result' is creat
Maybe add some DCHECKs at the beginning of the function then?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.cc@47
PS9, Line 47:      if (!IsoSqlFormatTokenizer::IsSeparator(*current_pos)) return false;
            :       // Advance to the end of the separator sequence.
            :       ++current_pos;
            :       while (current_pos < end_pos && IsoSqlFormatTokenizer::IsSeparator(*current_pos)) {
            :         ++current_pos;
            :       }
            :       // Advance to the end of the separator sequence in the expected tokens list.
            :       ++i;
            :       while (i < dt_ctx.toks.size() && dt_ctx.toks[i].type == SEPARATOR) ++i;
            : 
            :       // If we reached the end of input or the end of token sequence, we can return.
            :       if (current_pos >= end_pos || i >= dt_ctx.toks.size()) {
            :         return (current_pos >= end_pos && i >= dt_ctx.toks.size());
            :       }
            : 
            :       // Next token, following the separator sequence.
            :       tok = &dt_ctx.toks[i];
            : 
            :       // The last '-' of a separator sequence might be taken as a sign for timezone hour.
            :       if (*(current_pos - 1) == '-' && tok->type == TIMEZONE_HOUR) {
            :         --current_pos;
            :       }
Please consider moving this block to a separate function.
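
For reference, the quoted block could be factored out roughly as follows. This is only a Python sketch of the control flow, not the actual C++: the function name skip_separator_sequence, the token-type strings and the (ok, pos, i) return tuple are assumptions made here, and the separator set is the one listed in the commit message.

```python
SEPARATORS = set("-./,';: ")


def is_separator(ch):
    return ch in SEPARATORS


def skip_separator_sequence(text, pos, tokens, i):
    """Advance past a separator run in both the input and the token list.

    'tokens' is a list of token-type strings and tokens[i] is expected
    to be "SEPARATOR". Returns (ok, new_pos, new_i); ok is False on a
    mismatch, including when only one of the two sides is exhausted.
    """
    if pos >= len(text) or not is_separator(text[pos]):
        return False, pos, i
    # Advance to the end of the separator sequence in the input.
    pos += 1
    while pos < len(text) and is_separator(text[pos]):
        pos += 1
    # Advance to the end of the separator sequence in the token list.
    i += 1
    while i < len(tokens) and tokens[i] == "SEPARATOR":
        i += 1
    # If either side ran out, both must have run out together.
    if pos >= len(text) or i >= len(tokens):
        return pos >= len(text) and i >= len(tokens), pos, i
    # The last '-' of a separator run may be the sign of a timezone hour.
    if text[pos - 1] == "-" and tokens[i] == "TIMEZONE_HOUR":
        pos -= 1
    return True, pos, i
```

In this sketch a trailing '-' is handed back to the input only when the next token is TIMEZONE_HOUR. Before any other token, TIMEZONE_MIN included, it is consumed with the rest of the separator run.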


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@73
PS9, Line 73: judge
nit: decide


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@75
PS9, Line 75: token group
Again, the wording here and below is a bit confusing. Shouldn't it just be "token"?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91
PS9, Line 91: observed
nit: found


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS9, Line 110: unsigned
I think it would be safer to just use 'int' here.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS9, Line 110: (long)MAX_TOKEN_SIZE
Is explicit cast to long necessary here?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.h@204
PS9, Line 204: bool ParseAndValidate(const char* token_group, int token_len, int min, int max,
             :     int* result);
             : 
             : bool ParseFractionTokenGroup(const char* token_group, int token_len,
             :     DateTimeParseResult* result);
Add WARN_UNUSED_RESULT to these declarations.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.cc@64
PS9, Line 64: const StringValue& fmt = StringValue::FromStringVal(format);
nit: move this after L112.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@48
PS9, Line 48: static bool ParseSimpleDateFormat(const char* str, int len, boost::gregorian::date* d,
            :       boost::posix_time::time_duration* t);
Add WARN_UNUSED_RESULT here and L60, L69.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@65
PS9, Line 65: run when a user provided
nit: is used when a user specifies a datetime format string


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@106
PS9, Line 106: static int AdjustWithTimezone(boost::posix_time::time_duration* t,
             :       const boost::posix_time::time_duration& tz_offset);
Please add a comment explaining the return value.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc@174
PS9, Line 174: return false;
Any reason the call to IndicateTimestampParseFailure() was removed?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc@263
PS9, Line 263: str_val = indicator.c_str();
This is potentially dangerous: the 'indicator' object is destroyed when the L256-L265 block is exited, which would leave 'str_val' pointing at freed memory.


http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/cup/sql-parser.cup@2938
PS9, Line 2938: STRING_LITERA
My understanding is that to_timestamp() and from_timestamp() accept format strings specified as STRING/CHAR/VARCHAR columns of a table.

Does the FORMAT clause accept string literals only? Is this limitation intentional or an oversight? If it is intentional, please comment on it in the commit message.


http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@227
PS9, Line 227: his is not an intermediate
             :     // step between a char and a datetime.
this is a 'datetime to string' or 'string to datetime' cast.


http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@229
PS9, Line 229: (type_.getPrimitiveType() != PrimitiveType.CHAR ||
             :             children_.get(0).getType().getPrimitiveType() != PrimitiveType.STRING)
Consider rewriting this condition to test directly if we have datetime->string or string->datetime cast. That might make the code easier to understand.


http://gerrit.cloudera.org:8080/#/c/13722/9/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
File testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test:

http://gerrit.cloudera.org:8080/#/c/13722/9/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test@109
PS9, Line 109: ====
Would it be possible to specify the format string as a string/char/varchar column of a table?

Please add some tests for these scenarios.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 11:07:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#4).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces a FORMAT clause for the CAST() operator
that applies to casts between string types and timestamp types.
Instead of accepting SimpleDateFormat patterns, the FORMAT clause
supports datetime patterns following the ISO:SQL:2016 standard.
Note that the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp() and
from_timestamp() remain unchanged and use SimpleDateFormat.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm treats these tokens case-insensitively.
- The separators are interchangeable. For example, a '-' separator
  in the format matches a '.' character in the input.
- Separator sequences are matched flexibly by length: for instance,
  a single separator character in the format matches a
  multi-separator sequence in the input.
- In a string-to-timestamp conversion the timezone offset tokens are
  parsed and must match the input, but they don't adjust the result
  because the input is already expected to be in UTC.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,340 insertions(+), 853 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 2:

(24 comments)

Another bunch of comments.

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@184
PS2, Line 184:     DCHECK(result->year >= 1400 && result->year <= 9999);
             :     int month = 1;
             :     for (int i = 0; i < 12; ++i) {
             :       int month_length = MONTH_LENGTHS[i];
             :       if (i == 1 && boost::gregorian::gregorian_calendar::is_leap_year(result->year)) {
             :         month_length = 29;
             :       }
             :       if (month_length >= day_in_year) {
             :         result->month = month;
             :         result->day = day_in_year;
             :         break;
             :       }
             :       ++month;
             :       day_in_year -= month_length;
             :     }
             :     DCHECK(result->month <= 12);
             :     DCHECK(result->day <= 31);
> Move this to a separate function.
This can be done more efficiently if you use hard-coded month ranges, like MONTH_RANGES and LEAP_YEAR_MONTH_RANGES in be/src/runtime/date-value.cc.

See be/src/runtime/date-value.cc, L193-L208 for an example of how day_in_year -> (month, day) conversion is done.
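
A minimal sketch of the table-based conversion suggested above. The table names, their layout and the function name are illustrative assumptions; the actual MONTH_RANGES and LEAP_YEAR_MONTH_RANGES in date-value.cc may be organized differently, and assert() stands in for DCHECK.

```cpp
#include <cassert>

// Cumulative day counts at the end of each month; index 0 is the start of
// the year and index 12 is the year length. Hypothetical tables.
constexpr int kMonthEnds[13] = {
    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365};
constexpr int kLeapYearMonthEnds[13] = {
    0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366};

constexpr bool IsLeapYear(int year) {
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Converts a 1-based day of year into a 1-based month and day of month.
// Returns false if 'day_in_year' is out of range for 'year'.
bool DayInYearToMonthAndDay(int year, int day_in_year, int* month, int* day) {
  assert(month != nullptr && day != nullptr);
  const int* ends = IsLeapYear(year) ? kLeapYearMonthEnds : kMonthEnds;
  if (day_in_year < 1 || day_in_year > ends[12]) return false;
  int m = 0;
  // The range check above guarantees the loop stops at m <= 11.
  while (day_in_year > ends[m + 1]) ++m;
  *month = m + 1;
  *day = day_in_year - ends[m];
  return true;
}
```

With these tables the leap-year decision is made once per call instead of once per loop iteration, and out-of-range input is reported to the caller rather than only caught by debug checks.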


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h@42
PS2, Line 42: int
bool?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h@45
PS2, Line 45:     accept_time_toks_ = time_toks;
Shouldn't 'used_tokens_' be cleared when the tokenizer is reset?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@87
PS2, Line 87: used_tokens_.clear();
Any reason 'used_tokens_' is cleared here and not in Reset()?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@93
PS2, Line 93: current_pos
In some functions this is called 'current' while in others it is called 'current_pos'. Please use the same name everywhere.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@94
PS2, Line 94: current_pos != str_end
current_pos < str_end


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@96
PS2, Line 96: return SUCCESS
This should be break, otherwise the check in L101 won't be executed.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@104
PS2, Line 104: void DateTimeIsoSqlFormatTokenizer::ProcessSeparators(const char** current) {
DCHECK(*current != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@107
PS2, Line 107: IsSeparator(*current) && *current <= str_end
I think the condition should be:

(*current < str_end && IsSeparator(*current))

to avoid buffer over-read.
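
To illustrate why the order of the two operands matters, here is a small self-contained sketch. IsSeparator() and SkipSeparators() below are hypothetical stand-ins, not the code from the patch; the separator set is the one listed in the commit message.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the tokenizer's separator check.
inline bool IsSeparator(char c) {
  return c != '\0' && std::strchr("-./,';: ", c) != nullptr;
}

// Advances *current past a run of separators without reading at or beyond
// 'str_end'.
void SkipSeparators(const char** current, const char* str_end) {
  assert(current != nullptr && *current != nullptr);
  // The bounds check comes first, so short-circuit evaluation guarantees
  // that **current is only read while *current < str_end.
  while (*current < str_end && IsSeparator(**current)) ++(*current);
}
```

With the operands in the original order, the character at 'str_end' would be inspected before the bound is checked, and '<=' would additionally allow one more iteration past the end.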


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@115
PS2, Line 115:     const char** current_pos) {
DCHECK(*current_pos != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@120
PS2, Line 120:   string curr_token(*current_pos, curr_token_size);
             :     boost::to_upper(curr_token);
This should be done before the loop starts. No need to convert every prefix-string to uppercase.
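
A sketch of what hoisting the conversion could look like, assuming the loop tries prefixes of the remaining format string against a set of known tokens. The function name, the token set parameter and the longest-first search order are assumptions made for illustration, not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Returns the length of the longest known token that is a prefix of
// [pos, end), or 0 if none matches. Matching is case-insensitive.
std::size_t LongestTokenLength(const char* pos, const char* end,
    const std::unordered_set<std::string>& known_tokens,
    std::size_t max_token_len) {
  assert(pos != nullptr && pos <= end);
  const std::size_t n =
      std::min(max_token_len, static_cast<std::size_t>(end - pos));
  // Uppercase the longest candidate once, before the loop; every shorter
  // candidate is a prefix of it and needs no further conversion.
  std::string upper(pos, n);
  std::transform(upper.begin(), upper.end(), upper.begin(),
      [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  for (std::size_t len = n; len > 0; --len) {
    if (known_tokens.count(upper.substr(0, len)) > 0) return len;
  }
  return 0;
}
```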


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@204
PS2, Line 204: 
I think you should also check that no more than one fractional token was specified.

It would be better to do this check more generally: loop through dt_ctx_->toks and check that there are no duplicate DateTimeFormatTokenType values in the TokenItem items. You can also do this check in ProcessNextToken() similarly to L124.
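
The general duplicate check could be sketched roughly as follows. TokenType and TokenItem here are simplified, hypothetical versions of DateTimeFormatTokenType and TokenItem; a real check would presumably have to skip separator tokens, which may legitimately repeat.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// Illustrative subset of token types; not the real enum.
enum TokenType { YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, FRACTION, NUM_TOKEN_TYPES };

struct TokenItem {
  TokenType type;
};

// Returns true if any token type occurs more than once in 'toks'.
bool HasDuplicateTokenType(const std::vector<TokenItem>& toks) {
  std::bitset<NUM_TOKEN_TYPES> seen;
  for (const TokenItem& tok : toks) {
    assert(tok.type < NUM_TOKEN_TYPES);
    if (seen.test(tok.type)) return true;
    seen.set(tok.type);
  }
  return false;
}
```

A single pass with a bitset keeps the check linear in the number of tokens and covers the fractional-second case without special handling.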


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@218
PS2, Line 218: const char* c
We can pass the char directly. No need for a pointer.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.h@128
PS2, Line 128: results longer output length than the format string.
nit: produces output that is longer than the format string.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.h@129
PS2, Line 129: then
nit: from


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.h@157
PS2, Line 157:   
nit: 1 space


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@46
PS2, Line 46: }
Add DCHECK(!century_break_time.is_special())


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@61
PS2, Line 61:   std::stringstream ss;
DCHECK(context != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@120
PS2, Line 120:   StringParser::ParseResult status;
Here and in the functions below add:
DCHECK(token_group != nullptr);
DCHECK(token_len > 0);
DCHECK(result != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@120
PS2, Line 120:   StringParser::ParseResult status;
             :   result->year = StringParser::StringToInt<int>(token_group, token_len, &status);
             :   if (UNLIKELY(StringParser::PARSE_SUCCESS != status)) return false;
             :   if (UNLIKELY(result->year < 0 || result->year > 9999)) return false;
             :   return true;
All the Construct.. functions are very similar, only the valid range differs. 
You could create a helper function to avoid code duplication:

bool ParseAndValidate(const char* s, int len, int min, int max, int* res) {..}
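
A possible shape for such a helper, as a self-contained sketch. It accepts only decimal digits and is a simplified stand-in for StringParser::StringToInt<int>(), whose exact behavior may differ; ParseYear() is a hypothetical caller that uses the 0..9999 range from the quoted snippet.

```cpp
#include <cassert>

// Parses the first 'len' characters of 's' as a non-negative decimal
// integer and checks that the value lies in [min, max]. On success stores
// the value in '*res' and returns true; otherwise leaves '*res' untouched.
bool ParseAndValidate(const char* s, int len, int min, int max, int* res) {
  assert(s != nullptr && res != nullptr);
  // At most 9 digits, so the value always fits in a 32-bit int.
  if (len <= 0 || len > 9) return false;
  int value = 0;
  for (int i = 0; i < len; ++i) {
    if (s[i] < '0' || s[i] > '9') return false;
    value = value * 10 + (s[i] - '0');
  }
  if (value < min || value > max) return false;
  *res = value;
  return true;
}

// Each per-token function then reduces to a single call with its own range.
bool ParseYear(const char* s, int len, int* year) {
  return ParseAndValidate(s, len, 0, 9999, year);
}
```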


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@181
PS2, Line 181: for (int i = token_len; i < 9; ++i) result->fraction *= 10;
Can you use std::pow() to simplify this?
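
For reference, a sketch of the std::pow() variant. The function name is hypothetical, and the target of 9 digits (nanoseconds) is inferred from the loop bound in the quoted line. Because std::pow() works on doubles, the result is rounded with std::llround() as a guard against implementations that do not return exact powers of ten.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scales a fractional-second value that was parsed from 'token_len' digits
// up to 9 digits, e.g. "123456" (6 digits) becomes 123456000.
int64_t ScaleFractionToNineDigits(int64_t fraction, int token_len) {
  assert(token_len >= 1 && token_len <= 9);
  // 10^k for k in [0, 8] is exactly representable as a double.
  const int64_t factor = std::llround(std::pow(10.0, 9 - token_len));
  return fraction * factor;
}
```

Whether this is simpler than the original loop is a judgment call; a small constant table of powers of ten would avoid floating point entirely.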


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@185
PS2, Line 185: int GetDayInYear(int year, int month, int day_in_month) {
             :   int day_in_year = day_in_month;
             :   for (int i = 0; i < month - 1; ++i) {
             :     int month_length = MONTH_LENGTHS[i];
             :     if (boost::gregorian::gregorian_calendar::is_leap_year(year) && i == 1) {
             :       month_length = 29;
             :     }
             :     day_in_year += month_length;
             :   }
             :   return day_in_year;
             : }
This function can be simplified if you use hard-coded month ranges. (See MONTH_RANGES and LEAP_YEAR_MONTH_RANGES in be/src/runtime/date-value.cc)
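
With hard-coded ranges the function needs no loop at all. The sketch below uses hypothetical tables of days preceding each month; the real MONTH_RANGES and LEAP_YEAR_MONTH_RANGES in date-value.cc may be laid out differently, and IsLeap() replaces the Boost call only to keep the example self-contained.

```cpp
#include <cassert>

// Number of days preceding the first day of each month. Hypothetical tables.
constexpr int kDaysBeforeMonth[12] = {
    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};
constexpr int kLeapYearDaysBeforeMonth[12] = {
    0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335};

constexpr bool IsLeap(int year) {
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Returns the 1-based day of year for a 1-based month and day of month.
int GetDayInYear(int year, int month, int day_in_month) {
  assert(month >= 1 && month <= 12);
  const int* before = IsLeap(year) ? kLeapYearDaysBeforeMonth : kDaysBeforeMonth;
  return before[month - 1] + day_in_month;
}
```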


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.cc@372
PS2, Line 372: GetDefaultFormatContext
Rename to GetDefaultSimpleDateFormatContext() ?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.cc@488
PS2, Line 488:      case MONTH_IN_YEAR_SLT: {
             :         char raw_buff[tok.len];
             :         std::transform(tok_val, tok_val + tok.len, raw_buff, ::tolower);
             :         StringValue buff(raw_buff, tok.len);
             :         boost::unordered_map<StringValue, int>::const_iterator iter =
             :             REV_MONTH_INDEX.find(buff);
             :         if (UNLIKELY(iter == REV_MONTH_INDEX.end())) return false;
             :         dt_result->month = iter->second;
             :         break;
This should go to a separate function as well.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Jul 2019 17:28:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13722/7/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/13722/7/tests/query_test/test_cast_with_format.py@120
PS7, Line 120: d
flake8: E303 too many blank lines (2)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 12:14:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 9:

(3 comments)

I have just noticed you had 2 additional comments yesterday. Addressed them as well.

http://gerrit.cloudera.org:8080/#/c/13722/6/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/6/be/src/runtime/datetime-iso-sql-format-parser.cc@195
PS6, Line 195:   if (input_len == 0) return nullptr;
> if (input_len == 0) return nullptr;
Indeed. Done.


http://gerrit.cloudera.org:8080/#/c/13722/6/be/src/runtime/datetime-iso-sql-format-parser.cc@211
PS6, Line 211: 
             :   const char* end_pos = start_of_token;
             :   while (end_pos < start_of_token + max_tok_len &&
             :       !IsoSqlFormatTokenizer::IsSeparator(*end_pos)) {
             :     ++end_pos;
             :   }
> 'len' is not really necessary:
Done


http://gerrit.cloudera.org:8080/#/c/13722/8/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/13722/8/tests/query_test/test_cast_with_format.py@120
PS8, Line 120:  
> flake8: E303 too many blank lines (2)
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 07:51:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 7:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/13722/4/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/4/be/src/runtime/datetime-parser-common.cc@160
PS4, Line 160: GetMonthAndDayFromDaysSince
> Technically this should be called 'GetMonthAndDayFromDaysSinceJan1'
Done


http://gerrit.cloudera.org:8080/#/c/13722/4/be/src/runtime/datetime-parser-common.cc@161
PS4, Line 161:     int* day) {
> DCHECK(days_since_jan1 >= 0 && days_since_jan1 < 366);
Done


http://gerrit.cloudera.org:8080/#/c/13722/5/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/5/be/src/runtime/datetime-simple-date-format-parser.h@73
PS5, Line 73: SimpleDateFormatTokenizer {
> The name is a bit redundant. 'SimpleDateFormatTokenizer' would be better as
Yeah, it felt a bit overwhelming to me as well. I'm happy to shorten it a little. Done


http://gerrit.cloudera.org:8080/#/c/13722/5/be/src/runtime/datetime-simple-date-format-parser.h@156
PS5, Line 156: SimpleDateFormatParser {
> Same as above.
Done


http://gerrit.cloudera.org:8080/#/c/13722/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/13722/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@201
PS5, Line 201: targetTypeDef_ and cas
> 'targetTypeDef_' and 'castFormat_' are null.
Done


http://gerrit.cloudera.org:8080/#/c/13722/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@267
PS5, Line 267: twoStepCastNeeded =
> I find the naming here is a bit confusing.
Indeed, that's a slightly better name. Done


http://gerrit.cloudera.org:8080/#/c/13722/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@278
PS5, Line 278:       children_.set(0, tostring);
> Probably 'castFormat_' should be set to null after this. In the STRING -> C
I tried this, but it breaks the tests for casts between datetimes and char. The reason is that analyze() is, for some reason, called multiple times:
 - On the first run everything is as expected: castFormat_ is null for the string vs char conversion and holds the desired value for the timestamp vs string step.
 - From the second run onwards castFormat_ is null for both steps. The second analyze() is called on the parent node of the cast, which is the string vs char step where castFormat_ is null. It then creates the intermediate timestamp vs string step and passes its own castFormat_, which is null, so the intermediate step ends up with null as well.

I remembered there was a reason why I implemented it this way, without resetting it for the string vs char node, but I couldn't recall it immediately :)


http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
File testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test:

http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test@109
PS5, Line 109: ====
> Please make sure that the following scenarios are also tested:
Done


http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test@109
PS5, Line 109: ====
> also, casting NULL timestamp to string should be tested.
Done


http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/date.test
File testdata/workloads/functional-query/queries/QueryTest/date.test:

http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/date.test@857
PS5, Line 857: select cast("0000-01-01" as date FORMAT "YYYY-MM-DD");
> also, casting NULL date to string should be tested.
Done


http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/date.test@857
PS5, Line 857: select cast("0000-01-01" as date FORMAT "YYYY-MM-DD");
> Please make sure that the following scenarios are also tested:
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 12:14:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4380/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Aug 2019 10:41:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3856/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jul 2019 10:13:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc@144
PS15, Line 144: 0, 99
> Ok, I checked out your change and run some tests. This works:
Good catch, this is actually a bug! In fact, I think something is off with this TH handling, as I observed another input that is handled incorrectly. Let me give this a second thought and come back with a fix.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Sep 2019 08:21:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.h@39
PS2, Line 39: ParseDateTime
I could be mistaken but I couldn't find any BE-tests for this function.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h@51
PS2, Line 51: Tokenize
I could be mistaken but I couldn't find any BE-tests for this and the other functions in this class.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.h@68
PS2, Line 68: /// Constants to hold default format lengths.
Maybe the contents of this file could be moved to separate classes, like SimpleDateFormatParser and SimpleDateFormatTokenizer? I think that naming would make the distinction between code dealing with the simple-date format and code dealing with the iso-sql format clearer.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jul 2019 13:04:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 24:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4964/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 24
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Sep 2019 14:36:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#2).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm matches these tokens in a case-insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of separator sequences is handled flexibly: for
  instance, a single separator character in the format matches a
  multi-separator sequence in the input.
- In a string to timestamp conversion the timezone offset tokens are
  parsed and must match the input, but they don't adjust the result
  because the input is expected to be in UTC already.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
53 files changed, 3,096 insertions(+), 697 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#22).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces a FORMAT clause for the CAST() operator
that is applicable to casts between string types and timestamp types.
Instead of accepting SimpleDateFormat patterns, the FORMAT clause
supports datetime patterns following the ISO:SQL:2016 standard.
Note that the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Unlike these
functions, the FORMAT clause must specify a string literal and
cannot be used with any other kind of string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99-+99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm matches these tokens in a case-insensitive
  manner.
- The separators are interchangeable with each other. For example, a
  '-' separator in the format will match a '.' character in the
  input.
- The length of separator sequences is handled flexibly, meaning
  that, for instance, a single separator character in the format
  matches a multi-separator sequence in the input.
- In a string-to-timestamp conversion the timezone offset tokens
  are parsed and expected to match the input, but they don't adjust
  the result because the input is already expected to be in UTC.
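
As a rough illustration of the separator rules above, the following
C++ sketch matches one separator run in the format against one in the
input. It is not taken from the patch: IsSeparator, SkipSeparators and
MatchSeparatorRun are hypothetical names, and the sketch assumes that
run lengths are ignored in both directions and that each side must
contain at least one separator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true for the separator characters listed in the
// commit message (- . / , ' ; : and space).
bool IsSeparator(char c) {
  return std::string("-./,';: ").find(c) != std::string::npos;
}

// Hypothetical helper: advances *pos past a run of separators in 's'
// and returns how many characters were skipped.
std::size_t SkipSeparators(const std::string& s, std::size_t* pos) {
  assert(pos != nullptr);
  const std::size_t start = *pos;
  while (*pos < s.size() && IsSeparator(s[*pos])) ++(*pos);
  return *pos - start;
}

// Hypothetical helper: consumes the separator run at fmt[*fmt_pos] and
// the one at input[*in_pos]. Any separator matches any other and the
// run lengths may differ (assumption: in both directions); the match
// succeeds if both sides contain at least one separator.
bool MatchSeparatorRun(const std::string& fmt, std::size_t* fmt_pos,
                       const std::string& input, std::size_t* in_pos) {
  const std::size_t fmt_len = SkipSeparators(fmt, fmt_pos);
  const std::size_t in_len = SkipSeparators(input, in_pos);
  return fmt_len > 0 && in_len > 0;
}
```

With the format "YYYY-MM" and the input "2014/-; ,11", both positions
starting at index 4, the call returns true and leaves the input
position on the first digit of the month.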

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,519 insertions(+), 972 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/22

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 22
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 22:

(2 comments)

Thanks for making the changes!

http://gerrit.cloudera.org:8080/#/c/13722/22/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/22/be/src/runtime/date-parse-util.cc@141
PS22, Line 141:   string result;
nit/efficiency: if you call string::append() in a loop, it is useful to call string::reserve() before the loop to avoid resizing the result string too many times:

result.reserve(dt_ctx->fmt_out_len);
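
A minimal sketch of the suggestion, assuming the pieces to append are
already available as strings. JoinTokens is a hypothetical helper, not
code from the patch; the suggestion above uses dt_ctx->fmt_out_len as
the expected length, whereas this sketch sums the piece sizes itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a formatting loop: concatenates already
// rendered token strings into a single result.
std::string JoinTokens(const std::vector<std::string>& tokens) {
  // Work out the final length before appending anything.
  std::size_t total = 0;
  for (const std::string& t : tokens) total += t.size();

  std::string result;
  // Reserve once so the appends below do not trigger repeated
  // reallocation as the string grows.
  result.reserve(total);
  for (const std::string& t : tokens) result.append(t);

  assert(result.size() == total);
  return result;
}
```

Calling JoinTokens({"2019", "-", "09", "-", "18"}) yields "2019-09-18".
reserve() only affects capacity, so the resulting string is the same
with or without it.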


http://gerrit.cloudera.org:8080/#/c/13722/22/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/22/be/src/runtime/timestamp-parse-util.cc@223
PS22, Line 223:   string result;
same here




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 22
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Sep 2019 14:25:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 23:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13722/22/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/22/be/src/runtime/date-parse-util.cc@141
PS22, Line 141:   string result;
> nit/efficiency: if you call string::append() in a loop, it is useful to cal
This was new to me, but absolutely makes sense. Thanks!


http://gerrit.cloudera.org:8080/#/c/13722/22/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/22/be/src/runtime/timestamp-parse-util.cc@223
PS22, Line 223:   string result;
> same here
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 23
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Sep 2019 08:54:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#9).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces a FORMAT clause for the CAST() operator
that is applicable to casts between string types and timestamp types.
Instead of accepting SimpleDateFormat patterns, the FORMAT clause
supports datetime patterns following the ISO:SQL:2016 standard.
Note that the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm matches these tokens in a case-insensitive
  manner.
- The separators are interchangeable with each other. For example, a
  '-' separator in the format will match a '.' character in the
  input.
- The length of separator sequences is handled flexibly, meaning
  that, for instance, a single separator character in the format
  matches a multi-separator sequence in the input.
- In a string-to-timestamp conversion the timezone offset tokens
  are parsed and expected to match the input, but they don't adjust
  the result because the input is already expected to be in UTC.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,394 insertions(+), 858 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3862/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Jul 2019 14:27:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#18).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces a FORMAT clause for the CAST() operator
that is applicable to casts between string types and timestamp types.
Instead of accepting SimpleDateFormat patterns, the FORMAT clause
supports datetime patterns following the ISO:SQL:2016 standard.
Note that the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Unlike these
functions, the FORMAT clause must specify a string literal and
cannot be used with any other kind of string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99-+99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm matches these tokens in a case-insensitive
  manner.
- The separators are interchangeable with each other. For example, a
  '-' separator in the format will match a '.' character in the
  input.
- The length of separator sequences is handled flexibly, meaning
  that, for instance, a single separator character in the format
  matches a multi-separator sequence in the input.
- In a string-to-timestamp conversion the timezone offset tokens
  are parsed and expected to match the input, but they don't adjust
  the result because the input is already expected to be in UTC.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,461 insertions(+), 865 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/18

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 18
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 7:

Thanks again for taking the time on this, Attila! I think I have addressed all of your comments, and I now feel comfortable with the performance measurements.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 12:16:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 24: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 24
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Sep 2019 14:36:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
File testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test:

http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test@109
PS5, Line 109: ====
Please make sure that the following scenarios are also tested:
- casting NULL string values to timestamp
- casting with invalid/NULL format string


http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/date.test
File testdata/workloads/functional-query/queries/QueryTest/date.test:

http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/date.test@857
PS5, Line 857: ====
Please make sure that the following scenarios are also tested:
- casting NULL string values to date
- casting with invalid/NULL format string




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jul 2019 09:24:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 21:

PS21 is a rebase with master



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 21
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Sep 2019 11:54:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3736/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Jun 2019 15:08:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@51
PS2, Line 51: private final String castFormat_;
Add a comment to describe the new member.


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@208
PS2, Line 208: castFormat_
What if 'castFormat_' includes an apostrophe?


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@226
PS2, Line 226:     boolean isIntermediateStep =
             :         type_.getPrimitiveType() == PrimitiveType.CHAR &&
             :         children_.get(0).getType().getPrimitiveType() == PrimitiveType.STRING;
The naming of this variable is a bit confusing. 'isIntermediateStep' is set to true even if we have a simple CAST(string_col AS char(20)) expression.

I would just put the right-hand side directly into the if condition.


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@280
PS2, Line 280:     if (null != castFormat_ && !isIntermediateStepNeeded) {
             :       if (!(type_.isDateOrTimeType() && getChild(0).getType().isStringType()) &&
             :           !(type_.isStringType() && getChild(0).getType().isDateOrTimeType())) {
             :         // FORMAT clause works only for casting between date types and string types
             :         throw new AnalysisException("FORMAT clause is not applicable from " +
             :             getChild(0).getType() + " to " + type_);
             :       }
             :       if (castFormat_.isEmpty()) {
             :         throw new AnalysisException("FORMAT clause can't be empty");
             :       }
             :     }
This block can be used as the else branch of the previous 'if' statement. Then you don't need to introduce the 'isIntermediateStepNeeded' variable.


http://gerrit.cloudera.org:8080/#/c/13722/2/testdata/workloads/functional-query/queries/QueryTest/date.test
File testdata/workloads/functional-query/queries/QueryTest/date.test:

http://gerrit.cloudera.org:8080/#/c/13722/2/testdata/workloads/functional-query/queries/QueryTest/date.test@629
PS2, Line 629: #====
             : #---- QUERY
             : #select cast("2014/-; ,11'/:05" as date format "YYYY-MM-DD");
             : #---- RESULTS
             : #2014-11-05
             : #---- TYPES
             : #DATE
This should work, right? Is there any reason it is commented out?


http://gerrit.cloudera.org:8080/#/c/13722/2/testdata/workloads/functional-query/queries/QueryTest/date.test@841
PS2, Line 841:  
Extra space


http://gerrit.cloudera.org:8080/#/c/13722/2/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/13722/2/tests/query_test/test_cast_with_format.py@22
PS2, Line 22: TestCastWithFormat
Could the tests in this script be moved to cast_format.test? Or to another '.test' file?



-- 
To view, visit http://gerrit.cloudera.org:8080/13722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jul 2019 12:50:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 14:

Did a git rebase.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Aug 2019 10:03:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#11).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Contrary to how these
functions work the FORMAT clause must specify a string literal and
cannot be used with any other kind of a string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,434 insertions(+), 861 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/11

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4884/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 19
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Sep 2019 21:39:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#16).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Contrary to how these
functions work the FORMAT clause must specify a string literal and
cannot be used with any other kind of a string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99-+99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,461 insertions(+), 865 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/16

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4884/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 19
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Sep 2019 17:32:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#20).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Contrary to how these
functions work the FORMAT clause must specify a string literal and
cannot be used with any other kind of a string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99-+99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,530 insertions(+), 964 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/20

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 20
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 3:

(94 comments)

Thanks for taking a look, Attila! I managed to cover your comments except the ones referring to an existing implementation of the "day of year" calculation, as I don't have it locally. I'll do a rebase and tackle those as well.

Note, I did some performance measurements and apparently the 'new' pattern handling is slower than the 'old' handling. This shows up both when running the benchmarks and when executing some casts through impala-shell. I'm investigating where the difference comes from.

http://gerrit.cloudera.org:8080/#/c/13722/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13722/2//COMMIT_MSG@49
PS2, Line 49: In a string type to timestamp conversion the timezone offset tokens
            :   are parsed, expected to match with the input but they don't adjust
            :   the result as the input is already expected to be in UTC format.
> Is this behavior consistent with how other SQL systems work?
Not really, since e.g. Oracle has different types for timestamp with timezone and timestamp without timezone, while Impala tries to use a single type for both purposes. I consulted with Zoltan Ivanfi and according to him the best solution here is to simply ignore the timezone part of the input.
Note, CAST() without FORMAT would also ignore the timezone part if such an input is provided.
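
A minimal sketch of that behaviour, in Python for brevity rather than Impala's C++. `parse_ignoring_offset` is a hypothetical helper, the fixed 'YYYY-MM-DD HH24:MI:SS TZH:TZM' pattern is an assumption made for the example, and the offset ranges are the ones listed for TZH and TZM in the commit message:

```python
import re
from datetime import datetime

# Offset such as "+01:30": TZH is a signed hour, TZM an unsigned minute.
_OFFSET = re.compile(r"^([+-]?\d{1,2}):(\d{1,2})$")


def parse_ignoring_offset(text):
    """Parse 'YYYY-MM-DD HH24:MI:SS TZH:TZM', validate the offset, then drop it.

    Hypothetical helper: returns a naive datetime or None on failure.
    """
    parts = text.split(" ")
    if len(parts) != 3:
        return None
    date_part, time_part, offset_part = parts
    match = _OFFSET.match(offset_part)
    if match is None:
        return None
    tz_hour, tz_minute = int(match.group(1)), int(match.group(2))
    # Ranges from the token list: TZH -99..+99, TZM 0..99.
    if not (-99 <= tz_hour <= 99 and 0 <= tz_minute <= 99):
        return None
    try:
        # The offset has been checked but is never applied to the result.
        return datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
```

In this sketch an input ending in +01:30 and one ending in -08:00 produce the same value, which is the point of the behaviour described above: the offset has to be well formed, but it does not shift the timestamp.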


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/benchmarks/parse-timestamp-benchmark.cc
File be/src/benchmarks/parse-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/benchmarks/parse-timestamp-benchmark.cc@47
PS2, Line 47: //                                                                         (relative) (relative) (relative)
            : //---------------------------------------------------------------------------------------------------------
            : //                    BoostStringDate              0.667    0.691    0.731         1X         1X         1X
            : //                          BoostDate              0.642    0.679    0.704     0.962X     0.982X     0.963X
            : //                             Impala               6.67     6.92     7.25        10X        10X      9.93X
            : //
            : //ParseTimestamp:            Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
            : //                                                                         (relative) (relative) (relative)
            : //---------------------------------------------------------------------------------------------------------
            : //                          BoostTime               0.48      0.5     0.52         1X         1X         1X
            : //                             Impala               5.73        6     6.21      11.9X        12X      11.9X
            : //
            : //ParseTimestampWithFormat:  Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
            : //                                                                         (relative) (relative) (relative)
            : //---------------------------------------------------------------------------------------------------------
            : //                      BoostDateTime              0.241    0.255     0.26         1X         1X         1X
            : //    ImpalaSimpleDateFormatTimeStamp                 16     16.5     17.1      66.6X      64.7X      65.6X
            : //  ImpalaSimpleDateFormatTZTimeStamp               16.2     16.6     17.
> Maybe it would make sense to add the new parsing functions to these benchmar
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/common/init.cc
File be/src/common/init.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/common/init.cc@312
PS2, Line 312: DateTimeSimp
> I think this should be renamed to InitSimpleDateParseCtx() to make it clear
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-expr.h
File be/src/exprs/cast-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-expr.h@29
PS2, Line 29: CastForm
> If I understand this correctly, this class is used only for the new cast op
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@172
PS2, Line 172: const DateTimeFormatCo
> const DateTimeFormatContext*
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@180
PS2, Line 180: int ret_val = tv.Format(*format_ctx, 
> Check the return value, like you do in L199-200.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@191
PS2, Line 191: if (UNLIKELY(!dv.IsVa
> This should be const too.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@308
PS2, Line 308:  Times
> Rename to 'format_ctx' for consistency.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@308
PS2, Line 308: if (val.is_null) retu
> This should be const too.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@312
PS2, Line 312: 
> const char*
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@315
PS2, Line 315: 
> const char*
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@342
PS2, Line 342: teVal CastFunctions::C
> Should be const.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@342
PS2, Line 342: stToDa
> Rename to 'format_ctx'
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@346
PS2, Line 346: 
> const char*
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/cast-functions-ir.cc@348
PS2, Line 348: erpre
> const char*
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/expr-test.cc@76
PS2, Line 76: DECLARE_bool(abort_on_config_error);
> Why the extra new line?
Some leftover. Removed.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/expr-test.cc@3214
PS2, Line 3214: mpVa
> Any reason this was changed to 2002?
Leftover change from debugging. Reverted.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/exprs/expr-test.cc@3322
PS2, Line 3322: 20
> Any reason for these timestamp changes in L3322-3337?
The following tests had the same expected output, and when any of them failed it was hard to spot exactly which one was the problematic one. So I changed them a bit so that I could find the failing test by searching for the expected value in the output.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h
File be/src/runtime/date-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h@36
PS2, Line 36: Parse
> Probably this should be renamed to ParseSimpleDateFormat(), right?
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h@47
PS2, Line 47: Parse
> Same here.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h@84
PS2, Line 84: output parameter
> "set output parameter to an invalid DateValue"
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.h@86
PS2, Line 86: tatic bool IndicateDateParseFailure(DateValue* date);
> I think this would be better placed in the .cc file only. 
I prefer not having functions hanging around without belonging to a class and I'm not a big fan of macros. As this function is used only in this class I'd leave it here as a private function.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.cc@40
PS2, Line 40: Parse
> rename to ParseSimpleDateFormat()?
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.cc@90
PS2, Line 90: te) {
> Rename to ParseSimpleDateFormat()?
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-parse-util.cc@190
PS2, Line 190: bool DateParser::IndicateDateParseFailure(DateValue* date) {
             :   *date = DateValue();
             :   return false;
             : }
> A macro instead would be easier to use
See my comment in the header. (Thanks for the suggestion, though)


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-value.h
File be/src/runtime/date-value.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/date-value.h@126
PS2, Line 126: Parse
> Rename this and Parse() functions above to ParseSimpleDateFormat()
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.h@24
PS2, Line 24: namespace impala 
> Probably not necessary.
Dropped.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.h@39
PS2, Line 39: SED_RESULT;
> I could be mistaken but I couldn't find any BE-tests for this function.
I covered this with E2E tests in https://gerrit.cloudera.org/#/c/13722/2/testdata/workloads/functional-query/queries/QueryTest/date.test
I didn't feel the need to test this function directly in BE tests to cover the same use cases.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@54
PS2, Line 54: 
            :       while (i < dt_ctx.toks.size() && dt_ctx
> current_pos should be validated first before calling IsSeparator() on it.
Nice improvement indeed. Done.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@44
PS2, Line 44:     const DateTimeFormatToken* tok = &dt_ctx.toks[i];
            :     if (tok->type == SEPARATOR) {
            :       if (!DateTimeIsoSqlFormatTokenizer::IsSeparator(*current_pos)) return false;
            :       // Advance to the end of the separator sequence.
            :       ++current_pos;
            :       while (current_pos - input_str < input_len
            :           && DateTimeIsoSqlFormatTokenizer::IsSeparator(*current_pos)) {
            :         ++current_pos;
            :       }
            :       // Advance to the end of the separator sequence in the expected tokens list.
            :       ++i;
            :       while (i < dt_ctx.toks.size() && dt_ctx.toks[i].type == SEPARATOR) ++i;
            : 
            :       // If we reached the end of input or the end of token sequence, we can return.
            :       if (current_pos >= end_pos || i >= dt_ctx.toks.size()) {
            :         return (current_pos >= end_pos && i >= dt_ctx.toks.size());
            :       }
            : 
            :       // Next token, following the separator sequence.
            :       tok = &dt_ctx.toks[i];
            : 
            :       // The last '-' of a separator sequence might be taken as a sign for timezone hour.
            :       if (*(current_pos - 1) == '-' && tok->type == TIMEZONE_HOUR) {
            :         --current_pos;
            :       }
            :     }
            : 
            :     const char* group_end_pos = GetNextTokenGroupFromInput(current_pos,
            :         (int)(end_pos - current_pos), *tok);
            :     i
> I think this could be refactored a bit:
Looks way cleaner than the mess I had. Thanks!


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@76
PS2, Line 76: switch(tok->type) {
            :       case YEAR: {
            :         if (!ParseAndValidate(current_pos, group_le
> I think it would be better if GetNextTokenGroupFromInput() returned a bool 
I chose to return end_pos (or nullptr). Done.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@87
PS2, Line 87: }
> I'm not sure if this is correct.
Good catch. This parses only timestamps properly. The timestamp creation fails anyway later on for years smaller than 1400 so I can safely remove this check. Wrote some boundary tests for year to cover this.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@96
PS2, Line 96: break;
> Same as L87
Same as above. Done.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@111
PS2, Line 111: 
> day_count < 1 ?
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@141
PS2, Line 141:         if (!ParseFractionTokenGroup(current_pos, group_len, result)) return false;
             :         break;
             :       }
             :       case MERIDIEM_INDICATOR: {
             :         string indicator(current_pos, group_len);
             :         boost::to_upper(indicator);
             :         if (indicator == "PM" || indicator == "P.M.") result->hour += 12;
             :         break;
             :       }
> Block can be moved to a separate function, here and elsewhere in the switch 
The idea was to only move logic to functions that are used both by SimpleDateFormat and IsoSqlFormat parsing.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@184
PS2, Line 184: n_year -= month_leng
> (result->year >= 0) ?
Indeed, thx. Done.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@185
PS2, Line 185: 
> Introducing a new variable 'month' is not really necessary.
Renamed 'i' to 'month_id' for better readability and dropped 'month'.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@206
PS2, Line 206: 
             :   const char* end_pos = input_str;
> Return bool to signal success/failure. Or return end_pos instead of passing
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@208
PS2, Line 208: int len = 0;
> We could just set *end_pos in the GetNextTokenGroupFromInput() function, in
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@219
PS2, Line 219:   const char* input_str, int input_len) {
> I think, this should be:
No need to check if 'input_len' > 2, as the length of a TIMEZONE_HOUR is 3 by default. This code reduces it to 2 when the input doesn't start with a sign.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@225
PS2, Line 225: if (DateTimeIsoSqlFormatTokenizer::GetTokenType(token_str, &token
> Isn't it a parsing error if (len < tok.len) but *end_pos is a separator?
It's not an error. E.g. we have to parse inputs like this:
select cast("2019-1-1" as timestamp format "YYYY-MM-DD")
Here for the month and day token groups 'len'=1 while 'tok.len'=2 but still valid.
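
A minimal sketch of why a short token group like the "1" in "2019-1-1" is still valid: reading stops at the first non-digit even if fewer than 'tok.len' characters were consumed. DigitGroupLength() is a hypothetical helper for illustration, not a function from this change.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: returns how many leading digits of 'input' belong to a
// numeric token whose format length is 'tok_len'. Reading stops early at the
// first non-digit character (for example a separator) or at the end of input.
int DigitGroupLength(const char* input, int input_len, int tok_len) {
  assert(input != nullptr && input_len >= 0 && tok_len > 0);
  int max_len = input_len < tok_len ? input_len : tok_len;
  int len = 0;
  while (len < max_len && std::isdigit(static_cast<unsigned char>(input[len]))) {
    ++len;
  }
  return len;
}
```

For the month group of "2019-1-1" the remaining input is "1-1" and the MM token has length 2, so the helper returns 1 and parsing continues at the separator.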


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@229
PS2, Line 229:   }
             :   return nullptr;
> Same as L206. Consider returning a bool or end_pos.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@231
PS2, Line 231: 
> ParseMeridiemIndicatorFromInput() function should set end_pos instead of ex
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@241
PS2, Line 241: }
             : 
             : void DateTimeIsoSqlFormatParser::GetRoundYear(const TimestampValue* now,
             :     DateTimeParseResult* result) {
             :   DCHECK(now != nullptr);
             :   DCHECK(result != nullptr);
             :   DCHECK(result->year >= 0 && result->year < 100);
             :   i
> Duplicate code.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@253
PS2, Line 253: 
> actual_token_len > 0 && actual_token-len < 4
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@261
PS2, Line 261: 
> Isn't this called realigned year?
Oracle and PostgreSQL refer to this pattern as round year. Realigned year might be some internal naming in Impala for this pattern in SimpleDateFormat.
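
For reference, a sketch of the round year rule as Oracle documents it for the RR token. RoundYear() is a hypothetical helper; whether the GetRoundYear() in this change matches it in every detail is an assumption.

```cpp
#include <cassert>

// Hypothetical helper, not the GetRoundYear() from this change.
// Expands a two-digit year using the RR rule documented by Oracle: the century
// is chosen by comparing which half of a century (00-49 or 50-99) the input
// year and the current year fall into.
int RoundYear(int two_digit_year, int current_year) {
  assert(two_digit_year >= 0 && two_digit_year < 100);
  int century = current_year / 100 * 100;
  int current_yy = current_year % 100;
  // Input in the first half, current year in the second half: next century.
  if (two_digit_year < 50 && current_yy >= 50) century += 100;
  // Input in the second half, current year in the first half: previous century.
  if (two_digit_year >= 50 && current_yy < 50) century -= 100;
  return century + two_digit_year;
}
```

With a current year of 2019, an input of 19 maps to 2019 while an input of 99 maps to 1999.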


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h@42
PS2, Line 42: boo
> bool?
No idea why I made it int :) Done.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h@45
PS2, Line 45:     accept_time_toks_ = time_toks;
> Shouldn't 'used_tokens_' be cleared when tokenizer is reset?
It's cleared at the end of Tokenize() after TokenizeImpl() was finished.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h@51
PS2, Line 51: Tokenize
> I could be mistaken but I couldn't find any BE-tests for this and the other
I didn't write BE tests for this specific function as this is basically tested with each E2E test where cast has a format clause. Didn't see the point to cover the same use cases with a different level of tests.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@87
PS2, Line 87: used_tokens_.clear();
> Any reason 'used_tokens_' is cleared here and not in Reset()?
Nothing serious: doing it here lets the user run Tokenize() multiple times on the same input without calling Reset(). Not sure if it's a valid scenario, though.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@93
PS2, Line 93: current_pos
> In some functions this is called 'current' while in others it is called 'cu
Used current_pos. Done.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@94
PS2, Line 94: current_pos < str_end)
> current_pos < str_end
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@96
PS2, Line 96: break;
> This should be break, otherwise the check in L101 won't be executed.
Good finding. Fixed and also wrote a test to cover this.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@104
PS2, Line 104: void DateTimeIsoSqlFormatTokenizer::ProcessSeparators(const char** current_pos) {
> DCHECK(*current != nullptr);
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@107
PS2, Line 107: har* str_end = dt_ctx_->fmt + dt_ctx_->fmt_l
> I think the condition should be:
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@114
PS2, Line 114: 
> Mention in a comment that this function is doing greedy-matching and finds 
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@115
PS2, Line 115: FormatTokenizationResult DateTimeIsoSqlFormatTokenizer::ProcessNextToken(
> DCHECK(*current_pos != nullptr);
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@120
PS2, Line 120: unsigned curr_token_size = min((long)MAX_TOKEN_SIZE, str_end - *current_pos);
             :   DCHECK(curr_token_size > 0);
> This should be done before the loop starts. No need to convert every prefix
Done, though won't have much impact on query performance as tokenization runs once per query and a format string won't have more than 10-15 tokens.
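
A rough sketch of the greedy matching discussed here, with the upper-casing done once before the loop. The token table, kMaxTokenSize and NextToken() are hypothetical stand-ins and do not mirror the real tokenizer's data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical, reduced token table; the real tokenizer knows more tokens.
static const std::unordered_set<std::string> kTokens = {
    "YYYY", "YYY", "YY", "Y", "MM", "DD", "DDD",
    "HH", "HH12", "HH24", "MI", "SS", "SSSSS"};
static const std::size_t kMaxTokenSize = 5;

// Returns the longest token that is a prefix of 'fmt' starting at 'pos',
// or an empty string if no token matches there.
std::string NextToken(const std::string& fmt, std::size_t pos) {
  assert(pos < fmt.size());
  std::size_t max_len = std::min(kMaxTokenSize, fmt.size() - pos);
  // Upper-case the longest candidate once, before the loop starts.
  std::string candidate = fmt.substr(pos, max_len);
  std::transform(candidate.begin(), candidate.end(), candidate.begin(),
      [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  // Greedy matching: try the longest prefix first, then shrink it.
  for (std::size_t len = max_len; len > 0; --len) {
    candidate.resize(len);
    if (kTokens.count(candidate) > 0) return candidate;
  }
  return "";
}
```

Trying the longest prefix first is what makes "ddd" resolve to DDD rather than to DD followed by an unmatched "d".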


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@122
PS2, Line 122: ring
> const auto
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@140
PS2, Line 140: retu
> const auto
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@204
PS2, Line 204:     return SECOND_IN_DAY_CONFLICT;
> I think, you should also check that no more than one fractional tokens were
I've put this check into CheckIncompatibilities() and also added a test to cover it.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@218
PS2, Line 218: (const string
> We can pass the char directly. No need for a pointer.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.h@128
PS2, Line 128: length of the input format string. However, there ar
> nit: produces output that is longer than the format string.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.h@129
PS2, Line 129: 
> nit: from
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.h@157
PS2, Line 157: 
> nit: 1 space
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@46
PS2, Line 46:   DCHECK(!century_break_ptime.is_special());
> Add DCHECK(!century_break_time.is_special())
Done, however, this is basically copy-pasting this code to a common place.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@61
PS2, Line 61:     const StringVal& format, bool is_error) {
> DCHECK(context != nullptr);
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@118
PS2, Line 118: lse {
> Maybe "Parse" instead of "Construct" would be better in function names here
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@120
PS2, Line 120:   }
> Here and in the functions below add:
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@120
PS2, Line 120:   }
             : }
             : 
             : bool ParseAndValidate(const char* token_group, int token_len, int min, int max,
             :     int* resul
> All the Construct.. functions are very similar, only the valid range differ
Done
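
To illustrate the range-parameterized helper: the sketch below reuses the signature quoted above, but the body is an assumption and not the code from the patch.

```cpp
#include <cassert>
#include <cctype>

// Sketch only: the signature follows the quoted ParseAndValidate(), the body
// is a guess at what a shared "parse digits, then range-check" helper does.
bool ParseAndValidate(const char* token_group, int token_len, int min, int max,
    int* result) {
  assert(token_group != nullptr && result != nullptr);
  // At most 9 digits, so the accumulated value always fits into an int.
  if (token_len <= 0 || token_len > 9) return false;
  int value = 0;
  for (int i = 0; i < token_len; ++i) {
    if (!std::isdigit(static_cast<unsigned char>(token_group[i]))) return false;
    value = value * 10 + (token_group[i] - '0');
  }
  if (value < min || value > max) return false;
  *result = value;
  return true;
}
```

Each token type then only has to supply its own range, for example (1, 12) for a month or (0, 86399) for second of day.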


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-parser-common.cc@181
PS2, Line 181: 
> Can you use std::pow() to simplify this?
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.h@90
PS2, Line 90:  Parse the d
> rename to InitSimpleDateParseCtx().
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.cc@372
PS2, Line 372: DateTimeSimpleDateForma
> Rename to GetDefaultSimpleDateFormatContext() ?
For ISO SQL parsing there is no such thing as a default context, as there is always a format given by the user. So calling this GetDefaultFormatContext() wouldn't lead to confusion between the two parsers.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.cc@488
PS2, Line 488:      }
             :       case MONTH_IN_YEAR_SLT: {
             :         char raw_buff[tok.len];
             :         std::transform(tok_val, tok_val + tok.len, raw_buff, ::tolower);
             :         StringValue buff(raw_buff, tok.len);
             :         boost::unordered_map<StringValue, int>::const_iterator iter =
             :             REV_MONTH_INDEX.find(buff);
             :         if (UNLIKELY(iter == REV_MONTH_INDEX.end())) return false;
             :         dt_res
> This should go to a separate function as well.
Same reply as for the ISO version: I wanted to move only the commonly used functionality into separate functions. I prefer to keep the parts that are used by only one parser (ISO SQL or SimpleDateFormat) untouched.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.h@67
PS2, Line 67: Otherwise it sets
> nit: Otherwise
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@43
PS2, Line 43: ou
> nit:double space.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@43
PS2, Line 43:  t
> nit: Double space.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@130
PS2, Line 130: 
> No need for a fully qualified name, 'time_duration' is available in this na
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@150
PS2, Line 150:   day_offset = AdjustWithTimezone(t, dt_result.tz_offset);
> Before this change, timezone-adjustment was done only when dt_ctx.has_time_
If has_time_toks=false then there won't be any time tokens to be adjusted with the TZ offset. But it's true that we can ignore this function call in this case so I'll move it back into the if.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@231
PS2, Line 231: val = d.year
> tok.len < 4
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@232
PS2, Line 232:        if (tok.len < 4) {
             :           int adjust_factor = std::
> Isn't this a breaking change. "% 100" vs "% adjust_factor" ?
I would call it a bugfix as previously round year didn't work well for all input year lengths.
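
A small sketch of the difference between "% 100" and "% adjust_factor" when a year is written with a token shorter than four characters. TruncateYear() is a hypothetical helper; only the 'tok.len < 4' condition and the 'adjust_factor' name come from the quoted diff.

```cpp
#include <cassert>

// Hypothetical helper: keeps the last 'tok_len' digits of 'year'.
// 'tok_len' is the length of the year token in the format (1 to 4).
int TruncateYear(int year, int tok_len) {
  assert(year >= 0 && tok_len >= 1 && tok_len <= 4);
  if (tok_len >= 4) return year;
  // adjust_factor is 10^tok_len; a fixed "% 100" is only right for tok_len == 2.
  int adjust_factor = 1;
  for (int i = 0; i < tok_len; ++i) adjust_factor *= 10;
  return year % adjust_factor;
}
```

For the year 2019, a one-character token keeps 9 and a two-character token keeps 19, whereas a fixed "% 100" would yield 19 in both cases.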


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/service/impala-server.cc@1119
PS2, Line 1119: // For testing purposes
> Is this query option used for testing only? Please add a comment to explain
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/13722/2/common/thrift/Exprs.thrift@144
PS2, Line 144: TCastExpr
> Consider renaming to TCastFormatExpr or something similar to emphasize that
I would leave it as it is to be consistent with the other elements of TExprNode.


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/cup/sql-parser.cup@2937
PS2, Line 2937: cast_format_val 
> Consider renaming this to 'cast_format_val'
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@51
PS2, Line 51: // Stores the value of the FORMAT
> Add a comment to describe the new member.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@208
PS2, Line 208: tFormat_.is
> What if 'castFormat_' includes an apostrophe?
Will put the value within quotation marks as that is not allowed within a format.


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@226
PS2, Line 226:     msg.node_type = TExprNodeType.FUNCTION_CALL;
             :     // Sets cast_expr in case FORMAT clause was provided and this is not an intermediate
             :     // step between a char and a datetime.
> The naming of this variable is a bit confusing. 'isIntermediateStep' is set
It's a bit difficult to understand the intention when I simply add the condition to the if. Did so and wrote a short comment to clarify.


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@280
PS2, Line 280: 
             :     if (null != castFormat_ && !isIntermediateStepNeeded) {
             :       if (!(type_.isDateOrTimeType() && getChild(0).getType().isStringType()) &&
             :           !(type_.isStringType() && getChild(0).getType().isDateOrTimeType())) {
             :         // FORMAT clause works only for casting between date types and string types
             :         throw new AnalysisException("FORMAT clause is not applicable from " +
             :             getChild(0).getType() + " to " + type_);
             :       }
             :       if (castFormat_.isEmpty()) {
             :         throw new AnalysisException("FORMAT clause can't be empty");
             :      
> This block can be used as the else branch of the previous 'if' statement. An
The purpose of that variable is to enhance code readability.

It wouldn't simply be an else branch, as I also have to check whether castFormat_ is set. I found it more readable this way.


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@3088
PS2, Line 3088:  public void TestCastFormatClauseFromDatetime() throws AnalysisException {
              :     RunCastFormatTestOnType("TIMESTAMP");
              :     RunCastFormatTestOnType("DATE");
              :   }
              : 
              :   private void RunCastFormatTestOnType(String type) {
              :     String to_timestamp_cast = "cast('05-01-2017' as " + type + ")";
              :     AnalysisError(
              :         "select cast(" + to_timestamp_cast + " as DATETIME FORMAT 'MM-dd-yyyy')",
              :         "Unsupported data type: DATETIME");
              :     if (!type.equals("TIMESTAMP")) {
              :       AnalysisError(
              :           "select cast(" + to_timestamp_cast + " as TIMESTAMP FORMAT 'MM-dd-yyyy')",
              :           "FORMAT clause is not applicable from " + type + " to TIMESTAMP");
              :     }
              :     if (!type.equals("DATE")) {
              :       AnalysisError("select cast(" + to_timestamp_cast + " as DATE FORMAT 'MM-dd-yyyy')",
              :    
> Would it make sense to add similar tests for DATE instead of TIMESTAMP?
It absolutely would. Done.


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@3109
PS2, Line 3109:     AnalysisError("select cast(" + to_timestamp_cast + " AS BOOLEAN FORMAT 'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from " + type + " to BOOLEAN");
              :     AnalysisError("select cast(" + to_timestamp_cast + " AS DOUBLE FORMAT 'MM-
> Please add similar test for DATE instead of TIMESTAMP.
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@1458
PS2, Line 1458:     ParsesOk("select cast('05-01-2017' as timestamp format 'MM-dd-yyyy')");
              :     ParsesOk("select cast('2017.01.02' as date format 'YYYY-MM-DD')");
              :     ParserError("select cast(a + 5.0 as badtype) from t");
              :     ParserError("select cast(a + 5.0, string) from t");
              :     ParserError("select cast('05-01-2017' as timestamp format 1234
> Please add similar tests for DATE.
From a parsing perspective it doesn't really make a difference to do the same for dates as well. I rewrote one of these to use date instead of TS.


http://gerrit.cloudera.org:8080/#/c/13722/2/testdata/workloads/functional-query/queries/QueryTest/date.test
File testdata/workloads/functional-query/queries/QueryTest/date.test:

http://gerrit.cloudera.org:8080/#/c/13722/2/testdata/workloads/functional-query/queries/QueryTest/date.test@629
PS2, Line 629: ====
             : ---- QUERY
             : select cast("2014/- ,11'/:-05" as date format "YYYY-MM-DD");
             : ---- RESULTS
             : 2014-11-05
             : ---- TYPES
             : DATE
> This should work right? Any reason it is commented out?
Yes, removed the comments. Note, the testing framework for some reason fails on ';' char so I removed it from the input.


http://gerrit.cloudera.org:8080/#/c/13722/2/testdata/workloads/functional-query/queries/QueryTest/date.test@841
PS2, Line 841: 
> Extra space
Done


http://gerrit.cloudera.org:8080/#/c/13722/2/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/13722/2/tests/query_test/test_cast_with_format.py@22
PS2, Line 22: TestCastWithFormat
> Could the tests in this script be moved to cast_format.test? Or to another 
There was one reason I couldn't do that: The tests in cast_format.test are run for both beeswax and HS2 and apparently those differ in how many digits they pad the fractional second part with.

select cast("2019-10-10 12:13:14.12345" as timestamp)
Beeswax: 2019-10-10 12:13:14.123450000
HS2:     2019-10-10 12:13:14.123450

As most of the tests contain fractional seconds I couldn't simply move them to that file. I could remove either Beeswax or HS2 from the test matrix but in that case we would lose some valuable coverage.

On a side note: it's a good question why these interfaces pad fractional seconds differently, and whether it's only related to the tests or to the interfaces themselves. I haven't investigated this further, but this behavior is also present without this change.



-- 
To view, visit http://gerrit.cloudera.org:8080/13722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Jul 2019 09:57:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 17:

(7 comments)

Couple more comments. I can +2 it once these are done.

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.h
File be/src/exprs/cast-format-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.h@35
PS17, Line 35: virtual
nit: 'virtual' keyword is not necessary.

'override' keyword already indicates that this is an
override of a virtual function in the base class.
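
A minimal illustration of the point, using hypothetical Base and Derived types rather than the classes in cast-format-expr.h:

```cpp
#include <string>

struct Base {
  virtual ~Base() = default;
  virtual std::string Name() const { return "Base"; }
};

struct Derived : Base {
  // 'override' already implies that the function is virtual, and it makes the
  // compiler reject the declaration if nothing in Base matches it.
  std::string Name() const override { return "Derived"; }
};

// Calls through a Base reference still dispatch to Derived::Name().
std::string NameOf(const Base& b) { return b.Name(); }
```

Writing 'virtual std::string Name() const override' would compile to the same thing; the extra keyword adds no information.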


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.h@37
PS17, Line 37: virtual
nit: same


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.h@46
PS17, Line 46: FunctionContext
const


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.cc
File be/src/exprs/cast-format-expr.cc:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.cc@64
PS17, Line 64: FunctionContext
const FunctionContext*


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc@181
PS17, Line 181:     int buf_len = format_ctx->fmt_out_len + 1;
DCHECK(buf_len <= IsoSqlFormatTokenizer::MAX_FORMAT_LENGTH);


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc@203
PS17, Line 203:     int buf_len = format_ctx->fmt_out_len + 1;
DCHECK(buf_len <= IsoSqlFormatTokenizer::MAX_FORMAT_LENGTH);


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS17, Line 110:  DCHECK(*current_pos <= str_end);
Please revisit this DCHECK and the one in L113.

If you use '<=' here, you should use '>=' in L113.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Sep 2019 08:37:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13722/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13722/13//COMMIT_MSG@24
PS13, Line 24: List of tokens implemented by this change:
             : - YYYY, YYY, YY, Y: Year tokens
             : - RRRR, RR: Round year tokens
             : - MM: Month (1-12)
             : - DD: Day (1-31)
             : - DDD: Day of year (1-366)
             : - HH, HH12: Hour of day (1-12)
             : - HH24: Hour of day (0-23)
             : - MI: Minute (0-59)
             : - SS: Second (0-59)
             : - SSSSS: Second of day (0-86399)
             : - FF, FF1, ..., FF9: Fractional second
             : - AM, PM, A.M., P.M.: Meridiem indicators
             : - TZH: Timezone hour (-99-+99)
             : - TZM: Timezone minute (0-99)
             : - Separators: - . / , ' ; : space
             : - ISO8601 date indicators (T, Z)
> Please specify the allowed range of values for every token.
Done, except for the year and round year tokens, as their boundaries depend on the type we cast to/from.
I also omitted the range for fractional seconds, as it depends on which precision we are talking about.


http://gerrit.cloudera.org:8080/#/c/13722/13/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/13/be/src/runtime/datetime-iso-sql-format-parser.cc@142
PS13, Line 142:  // Deliberately ignore the timezone offsets.
              :         int du
> Does this mean that we accept any token as a timezone hour/min? Shouldn't w
Indeed. Added checks for this.


http://gerrit.cloudera.org:8080/#/c/13722/13/be/src/runtime/datetime-iso-sql-format-parser.cc@196
PS13, Line 196:   if (*current_pos >= end_pos || *current_tok_idx >= dt_ctx.toks.size()) {
              :     return (*current_pos >= end_pos && *current_tok_idx >= dt_ctx.toks.size());
              :   }
              : 
> How about TIMEZONE_MIN? Is it possible to specify a negative TIMEZONE_MIN w
If you want to specify a negative TZM, you have to provide a zero TZH with a sign, like "-00:30". This is explicitly mentioned in the design doc.
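
A sketch of why the sign has to be taken from the TZH text rather than from its numeric value. OffsetMinutes() is a hypothetical helper and is not part of this change (the patch parses the timezone tokens but does not adjust the result with them).

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: combines a TZH group such as "-00" or "+01" with a TZM
// value into a signed offset in minutes. The sign is read from the text
// because the integer value of "-00" is 0, which would lose the sign.
int OffsetMinutes(const char* tzh, int tzm) {
  assert(tzh != nullptr && tzm >= 0);
  bool negative = tzh[0] == '-';
  int hours = std::abs(std::atoi(tzh));
  int total = hours * 60 + tzm;
  return negative ? -total : total;
}
```

With this approach "-00:30" yields -30 minutes, which is how a negative TZM can be expressed even though the TZM group itself is always positive.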


http://gerrit.cloudera.org:8080/#/c/13722/13/be/src/runtime/datetime-iso-sql-format-parser.cc@219
PS13, Line 219:    return ParseMeridiemIndicatorFromInput(input_str, input_len);
              :   }
              : 
              :   int max_tok_len = min(input_len, tok.len);
              :   c
> Should we do the same for TIMEZONE_MIN as well?
See above: TZM can only be positive.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Aug 2019 13:06:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#10).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Contrary to how these
functions work the FORMAT clause is a string literal provided by the
user and its value can't come from a column.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.
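
The separator rules above (interchangeable characters, flexible run length) can be illustrated with a minimal sketch. It is not the code from this change: IsSeparatorChar and SkipSeparatorRun are hypothetical names, and the separator set is simply copied from the token list in this commit message.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true for any character in the separator list quoted
// in the commit message ( - . / , ' ; : space ).
bool IsSeparatorChar(char c) {
  // strchr() would also match the terminating '\0', so exclude it explicitly.
  return c != '\0' && std::strchr("-./,';: ", c) != nullptr;
}

// Advances *pos past a run of separator characters and reports whether at
// least one separator was consumed. Never reads at or beyond 'end'.
// A single separator in the format can be matched against the whole run,
// which is what makes '-' in the format accept "..//" in the input.
bool SkipSeparatorRun(const char** pos, const char* end) {
  assert(pos != nullptr && *pos != nullptr && end != nullptr);
  const char* start = *pos;
  while (*pos < end && IsSeparatorChar(**pos)) ++(*pos);
  return *pos > start;
}
```

With this shape, the kind of separator never matters to the caller; only whether some separator run was present at the current input position.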

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,434 insertions(+), 861 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3735/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Jun 2019 15:01:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 6:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/benchmarks/parse-timestamp-benchmark.cc
File be/src/benchmarks/parse-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/benchmarks/parse-timestamp-benchmark.cc@46
PS3, Line 46: //
> nit: put a space after // in this line and below.
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/benchmarks/parse-timestamp-benchmark.cc@64
PS3, Line 64: //   ImpalaSimpleDateFormatTZTimeStamp               16.1     16.7     17.1      67.2X      66.7X      65.8X
            : //         ImpalaIsoSqlFormatTimeStamp
> Maybe add 'ImpalaIsoSqlFormatTZTimestamp' to this micro-benchmark?
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/benchmarks/parse-timestamp-benchmark.cc@134
PS3, Line 134: 
> I think this should be called 'TestImpalaSimpleDateFormat' for consistency 
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/exprs/cast-functions-ir.cc@172
PS3, Line 172: 
> Add 'const' specifier to the type. Here and below in L192, L309, L344.
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/date-test.cc
File be/src/runtime/date-test.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/date-test.cc@36
PS3, Line 36: using namespace datetime_parse_util;
> I think, this test should cover the new iso-sql date parser as well.
Referring to the conversation we had under another review comment, I have covered iso sql pattern matching functionality with E2E tests. I could write the same here as well but considering the effort I've put into writing them I'll leave them as they are.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@219
PS2, Line 219: 
> What happens if the input string is invalid?
Hmm, indeed. The intention here was to reduce the token length from 3 to 2 if there is no sign for the TIMEZONE_HOUR.

Tried this, should have failed but for some reason still succeeded:
cast('2010-10-10 04:20:10.123456789 3' as timestamp format "yyyy-mm-dd hh24:mi:ss.ff9 tzh")
But I assume this is some undefined behavior and I got lucky.

Fixed but couldn't cover with additional tests.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@43
PS3, Line 43: i < dt_ctx.toks.size()
> (current_pos < end_pos) should be checked here too. 
I guess this works, but only by accident. When the input string ends but the format still has an extra separator, the check in L46 runs on the string termination char, which is not a separator, so it returns false.

Let me add a check to the beginning of the for loop to have this under control.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@49
PS3, Line 49: nt_pos;
> current_pos < end_pos
Somehow I missed this. Thx!
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@72
PS3, Line 72: t char* group_end_pos = GetN
> The result of end_pos - current_pos is already int, isn't it?  No need to e
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@103
PS3, Line 103: se DAY_IN_MON
> Why not use 'day_in_year' instead directly? No need to introduce a new inte
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@116
PS3, Line 116: if (hour == 12) hour 
> I think, you should clear 'result' at the beginning of the ParseDateTime() 
My assumption was that it is initialized to zero when the 'result' is created at the callsite and only HOUR_IN_HALF_DAY and MERIDIEM_INDICATOR cases modify it. In case someone ruins this invariant then tests would start to fail so this won't rot unnoticed.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@168
PS3, Line 168: }
> current_pos < end_pos || i < dt_ctx.toks.size()
The second part of the condition is always true as we are after the for loop that leaves i = dt_ctx.toks.size(). No need to extend the current condition.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@176
PS3, Line 176: e format string is over but there are tokens left in the input.
> Please double check that this will work and not throw an exception for year
Works fine. Added test coverage.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@186
PS3, Line 186: 
> result->month >= 1 && result->month <= 12
This code was changed to use GetMonthAndDayFromDayOfYear(). That function covers these checks.
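
For readers following along, here is a minimal sketch of how a day-of-year conversion can subsume the month and day range checks. It is an assumption-laden stand-in, not the GetMonthAndDayFromDayOfYear() from the patch: the name MonthAndDayFromDayOfYear, its signature, and the proleptic Gregorian leap-year rule are all chosen for illustration.

```cpp
#include <cassert>

// Gregorian leap-year rule (assumed here for illustration).
bool IsLeapYear(int year) {
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Hypothetical stand-in: converts a 1-based day of year to month and day.
// Returns false if 'day_of_year' is out of range for 'year', so a successful
// call implies 1 <= *month <= 12 and a valid *day for that month.
bool MonthAndDayFromDayOfYear(int year, int day_of_year, int* month, int* day) {
  assert(month != nullptr && day != nullptr);
  static const int kDaysInMonth[12] = {31, 28, 31, 30, 31, 30,
                                       31, 31, 30, 31, 30, 31};
  if (day_of_year < 1) return false;
  int remaining = day_of_year;
  for (int m = 0; m < 12; ++m) {
    // February gets one extra day in leap years.
    int len = kDaysInMonth[m] + ((m == 1 && IsLeapYear(year)) ? 1 : 0);
    if (remaining <= len) {
      *month = m + 1;
      *day = remaining;
      return true;
    }
    remaining -= len;
  }
  return false;  // day_of_year exceeds the number of days in 'year'
}
```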


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@187
PS3, Line 187: 
> result->day >= 1 && result->day <= 31
Same as above.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@196
PS3, Line 196:   // Handle separately the meridiem indicators for two reasons.
> Some additional checks can't hurt:
This already returns nullptr if input_len == 0. We won't enter the while loop, end_pos stays equal to input_str, and we return nullptr right after the loop.
A DCHECK won't hurt, though.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@205
PS3, Line 205:   const char* start_of_token = input_str;
             :   if (tok.type == TIMEZONE_HOUR) {
             :     if (max_tok_len > 2) max_tok_len = 2;
             :     if (*start_of_token == '-' || *start_of_token == '+') ++start_of_token;
             :   }
             : 
             :   int len = 1;
             :   const char* end_pos = start_of_token;
             :   while (len <= max_tok_len && !DateTimeIsoSqlFormatTokenizer::IsSeparator(*end_pos)) {
             :     ++len;
             :     ++end_pos;
> I find this code a bit confusing. I think the loop has an indexing problem 
Re-wrote this part. Not exactly as how you suggested, though. What do you think?
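
One way to avoid the off-by-one risk in a length-counting loop is to compute a hard end pointer first and only compare pointers afterwards. The sketch below is not the rewritten code from the later patch set; FindEndOfToken, its parameters, and the convention that an optional sign is not counted against max_len are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: separator set taken from the commit message.
bool IsSeparatorChar(char c) {
  return c != '\0' && std::strchr("-./,';: ", c) != nullptr;
}

// Returns a pointer one past the last character of the token that starts at
// 'input', or nullptr if no token character is found.
// 'max_len' limits the number of non-sign characters consumed.
// 'allow_sign' models TIMEZONE_HOUR, where a leading '+' or '-' is optional.
const char* FindEndOfToken(const char* input, int input_len, int max_len,
                           bool allow_sign) {
  assert(input != nullptr && input_len >= 0 && max_len > 0);
  const char* end = input + input_len;
  const char* pos = input;
  if (allow_sign && pos < end && (*pos == '-' || *pos == '+')) ++pos;
  // Never scan past the input, and never past max_len characters.
  const char* limit = (end - pos > max_len) ? pos + max_len : end;
  const char* body_start = pos;
  while (pos < limit && !IsSeparatorChar(*pos)) ++pos;
  return pos == body_start ? nullptr : pos;
}
```

Because every dereference is guarded by pos < limit, an empty input or an input consisting only of a sign yields nullptr without reading out of bounds.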


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@222
PS3, Line 222:     const char* input_str, int input_len) {
             :   DCHECK(input_str != nullptr);
             :   for (int expected_tok_len : {2, 4}) {
             :     if (input_len < expected_tok_len) return nullptr;
             :     string token_str(input_str, expected_tok_len);
             :     boost::to_upper(token_str);
             :     D
> You still have to check if (input_len < expected_tok_len) and return nullpt
Thx for spotting this! Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h@42
PS3, Line 42:  void Reset(DateTimeFormatContext* dt_ctx, CastDirection cast_mode, bool time_toks) {
            :     dt_ctx_ = dt_ctx;
            :     cast_mode_ = cast_mode;
            :     accept_time_toks_ = time_toks;
            :    
> Is this function called from anywhere? If not, please remove it.
I used to call it heavily from a function that I have since dropped. :D I now call it once, in parse-timestamp-benchmark.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h@51
PS2, Line 51: Tokenize
> I see. Generally speaking I think it is better to cover as much functionali
Thanks! I'll keep this in mind. For now, as I've put some effort into these E2E tests, I'd keep them as they are.

On a side note, I find our BE tests not that easy to debug, to be honest. If one of them fails, it's quite hard to spot the rotten one among the thousands of asserts that we have within one test. The output text doesn't help either.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@87
PS2, Line 87:   FormatTokenizationR
> I think in this scenario L83 would fail on the second call, wouldn't it?
Indeed. I'll move this to Reset() then. Additionally, I'll merge Tokenize() and TokenizeImpl() as the separation is no longer needed.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@105
PS3, Line 105: enizationResult DateTim
> nit: Should be (current_pos != nullptr && *current_pos != nullptr) to be re
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@125
PS3, Line 125:    *current_pos += curr_token_size;
> No need to create a new temp string for each check. Just use 'longest_possi
Done


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h@91
PS3, Line 91:  t
> nit: double space.
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jul 2019 14:15:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 2:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@114
PS2, Line 114: ProcessNextToken
Mention in a comment that this function is doing greedy-matching and finds the longest matching token.
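
To make the greedy-matching remark concrete, here is a minimal sketch of longest-match token lookup. It is only an illustration: LongestTokenLength is a hypothetical name, and the token table is a small subset of the list in the commit message, not the tokenizer's real table.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the length of the longest known token that matches 'format' at
// offset 'pos' (case-insensitively), or 0 if no token matches there.
int LongestTokenLength(const std::string& format, std::size_t pos) {
  assert(pos <= format.size());
  // Illustrative subset of the tokens listed in the commit message.
  static const char* const kTokens[] = {"YYYY", "YYY", "YY", "Y", "HH24",
                                        "HH12", "HH", "MM", "MI", "DDD",
                                        "DD", "SSSSS", "SS"};
  // The longest token in the subset has 5 characters.
  std::string upper = format.substr(pos, 5);
  for (char& c : upper) {
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }
  int best = 0;
  for (const char* tok : kTokens) {
    const std::string t(tok);
    // compare() only succeeds if 'upper' starts with the whole token.
    if (upper.compare(0, t.size(), t) == 0 &&
        static_cast<int>(t.size()) > best) {
      best = static_cast<int>(t.size());
    }
  }
  return best;
}
```

Greedy matching matters because shorter tokens are prefixes of longer ones: without it, "HH24" would be read as "HH" followed by unparseable "24".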


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@122
PS2, Line 122: auto
const auto


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@140
PS2, Line 140: auto
const auto


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.h@67
PS2, Line 67: If it failed then
nit: Otherwise


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@43
PS2, Line 43:   
nit: Double space.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@43
PS2, Line 43:   
nit: Double space.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@130
PS2, Line 130: boost::posix_time::time_duration
No need for a fully qualified name, 'time_duration' is available in this namespace.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@150
PS2, Line 150: int day_offset = AdjustWithTimezone(t, dt_result.tz_offset);
Before this change, timezone-adjustment was done only when dt_ctx.has_time_toks was set. Any reason you changed this behavior? Isn't this a breaking change?


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@231
PS2, Line 231: tok.len != 4
tok.len < 4


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@232
PS2, Line 232:          int adjust_factor = std::pow(10, tok.len);
             :           num_val %= adjust_factor;
Isn't this a breaking change? "% 100" vs. "% adjust_factor"?
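
The arithmetic behind the question can be checked with a small sketch. It assumes the modulo is meant to keep the last tok.len digits of a value; PowerOfTen and KeepLastDigits are hypothetical helpers, written with integer math rather than std::pow to avoid a floating-point round trip.

```cpp
#include <cassert>

// Integer power of ten for small exponents (hypothetical helper).
int PowerOfTen(int exp) {
  assert(exp >= 0 && exp <= 9);
  int result = 1;
  for (int i = 0; i < exp; ++i) result *= 10;
  return result;
}

// Keeps the last 'digits' decimal digits of a non-negative value.
// With digits == 2 this is the same as "% 100"; with any other token
// length the result differs, which is the behavior change in question.
int KeepLastDigits(int value, int digits) {
  assert(value >= 0);
  return value % PowerOfTen(digits);
}
```

For example, 2019 reduces to 19 with two digits but to 9 with one digit, so any caller that relied on the fixed "% 100" would see different output for token lengths other than 2.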


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/service/impala-server.cc@1119
PS2, Line 1119: query_ctx->__set_now_string(query_ctx->client_request.query_options.now_string);
Is this query option used for testing only? Please add a comment to explain.


http://gerrit.cloudera.org:8080/#/c/13722/2/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/13722/2/common/thrift/Exprs.thrift@144
PS2, Line 144: TCastExpr
Consider renaming to TCastFormatExpr or something similar to emphasize that this is only used for cast expressions with a format clause.


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/main/cup/sql-parser.cup@2937
PS2, Line 2937: cast_format_expr
Consider renaming this to 'cast_format_val'


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@3088
PS2, Line 3088:  public void TestCastFormatClauseFromTimestamp() throws AnalysisException {
              :     String to_timestamp_cast = "cast('05-01-2017' as timestamp)";
              :     AnalysisError("select cast("+to_timestamp_cast+" as DATETIME FORMAT 'MM-dd-yyyy')",
              :         "Unsupported data type: DATETIME");
              :     AnalysisError("select cast("+to_timestamp_cast+" as DATE FORMAT 'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from TIMESTAMP to DATE");
              :     AnalysisError("select cast("+to_timestamp_cast+" AS INT FORMAT 'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from TIMESTAMP to INT");
              :     AnalysisError("select cast("+to_timestamp_cast+" AS BOOLEAN FORMAT 'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from TIMESTAMP to BOOLEAN");
              :     AnalysisError("select cast("+to_timestamp_cast+" AS DOUBLE FORMAT 'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from TIMESTAMP to DOUBLE");
              :     AnalysisError("select cast("+to_timestamp_cast+" AS STRING FORMAT '')",
              :         "FORMAT clause can't be empty");
              :     AnalyzesOk("select cast("+to_timestamp_cast+" AS STRING FORMAT 'MM-dd-yyyy')");
              :     AnalyzesOk("select cast("+to_timestamp_cast+" AS VARCHAR FORMAT 'MM-dd-yyyy')");
              :     AnalyzesOk("select cast("+to_timestamp_cast+" AS CHAR(10) FORMAT 'MM-dd-yyyy')");
              :   }
Would it make sense to add similar tests for DATE instead of TIMESTAMP?


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@3109
PS2, Line 3109:     String cast_str = "CAST('05-01-2017' AS TIMESTAMP FORMAT 'MM-dd-yyyy')";
              :     SelectStmt select = (SelectStmt) AnalyzesOk("select " + cast_str);
              :     Assert.assertEquals(cast_str, select.getResultExprs().get(0).toSqlImpl());
Please add similar test for DATE instead of TIMESTAMP.


http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/13722/2/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@1458
PS2, Line 1458:     ParsesOk("select cast('05-01-2017' as timestamp format 'MM-dd-yyyy')");
              :     ParserError("select cast(a + 5.0 as badtype) from t");
              :     ParserError("select cast(a + 5.0, string) from t");
              :     ParserError("select cast('05-01-2017' as timestamp format 12345)");
              :     ParserError("select cast('05-01-2017' as timestamp format )");
Please add similar tests for DATE.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Jul 2019 15:29:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4853/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Aug 2019 10:09:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4123/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Aug 2019 11:44:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc@181
PS17, Line 181:     int buf_len = format_ctx->fmt_out_len + 1;
> There are tokens where the input they accept is actually longer than the le
Is there a reasonable compile time upper limit to buf_len? If yes, please add a DCHECK; if no, allocate buf on the heap instead.
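
A minimal sketch of the two options mentioned above, combined: use a fixed-size stack buffer when the required length fits under a compile-time limit, and fall back to the heap otherwise. Everything here is hypothetical (kStackBufLen, FormatDate, the fixed "YYYY-MM-DD" layout); it only illustrates the allocation pattern, not the cast function in the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical compile-time limit for the stack buffer.
constexpr int kStackBufLen = 64;

// Formats a date as "YYYY-MM-DD" into a buffer of fmt_out_len + 1 bytes.
// Returns an empty string if the output does not fit into that length.
std::string FormatDate(int year, int month, int day, int fmt_out_len) {
  assert(fmt_out_len >= 0);
  const int buf_len = fmt_out_len + 1;  // room for the terminating '\0'
  char stack_buf[kStackBufLen];
  std::vector<char> heap_buf;
  char* buf = stack_buf;
  if (buf_len > kStackBufLen) {
    // Too large for the stack buffer: allocate on the heap instead.
    heap_buf.resize(buf_len);
    buf = heap_buf.data();
  }
  const int written =
      std::snprintf(buf, buf_len, "%04d-%02d-%02d", year, month, day);
  if (written < 0 || written >= buf_len) return std::string();
  return std::string(buf, written);
}
```

If a tight upper bound on buf_len is known at compile time, the heap branch can be replaced by an assertion on that bound, which is the DCHECK variant of the same idea.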




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 19
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Sep 2019 07:49:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/13722/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/13722/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@201
PS5, Line 201: targetTypeDef_ is null
'targetTypeDef_' and 'castFormat_' are null.


http://gerrit.cloudera.org:8080/#/c/13722/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@267
PS5, Line 267: isIntermediateStepNeeded
I find the naming here is a bit confusing.

E.g.: TIMESTAMP -> CHAR casting is broken down to two steps:
1. TIMESTAMP -> STRING
2. STRING -> CHAR

Which one is the intermediate step?

Maybe it should be called 'twoStepCastNeeded', or something similar.


http://gerrit.cloudera.org:8080/#/c/13722/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@278
PS5, Line 278:       children_.set(0, tostring);
Probably 'castFormat_' should be set to null after this. In the STRING -> CHAR cast we don't need the format string anymore.

This would simplify the condition in L201, if (null != castFormat_) would suffice.

Also in L229, if (null != castFormat_) would be enough.


http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
File testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test:

http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test@109
PS5, Line 109: ====
> Please make sure that the following scenarios are also tested:
also, casting NULL timestamp to string should be tested.


http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/date.test
File testdata/workloads/functional-query/queries/QueryTest/date.test:

http://gerrit.cloudera.org:8080/#/c/13722/5/testdata/workloads/functional-query/queries/QueryTest/date.test@857
PS5, Line 857: ====
> Please make sure that the following scenarios are also tested:
also, casting NULL date to string should be tested.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jul 2019 10:24:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 10:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.h
File be/src/exprs/cast-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.h@1
PS10, Line 1: // Licensed to the Apache Software Foundation (ASF) under one
Source file should be renamed to cast-format-expr.h


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/exprs/cast-expr.h@44
PS10, Line 44: mutable std::unique_ptr<datetime_parse_util::DateTimeFormatContext> dt_ctx_;
Does 'dt_ctx_' have to be a pointer?

I think it would be simpler if CastFormatExpr directly contained an instance of DateTimeFormatContext.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@30
PS10, Line 30: using std::string;
In other .cc files, we just include common/names.h which pulls in all the usual std stuff. No need to specify them one by one.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@134
PS10, Line 134:        string indicator(current_pos, token_len);
              :         boost::to_upper(indicator);
              :         if (indicator == "PM" || indicator == "P.M.") result->hour += 12;
              :         break;
Add a comment that the token has already been validated in GetNextTokenFromInput/ParseMeridiemIndicatorFromInput.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@145
PS10, Line 145:       case ISO8601_ZULU_INDICATOR: {
Add DCHECK(token_len == 1)


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@146
PS10, Line 146: std::
std qualifier is not necessary, here and elsewhere in the .cc file.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@205
PS10, Line 205: GetNextTokenFromInput
This should be called 'FindEndOfToken' or something similar.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-parser.cc@239
PS10, Line 239:   for (int expected_tok_len : {2, 4}) {
              :     if (input_len < expected_tok_len) return nullptr;
              :     string token_str(input_str, expected_tok_len);
              :     boost::to_upper(token_str);
              :     DateTimeFormatTokenType token_type;
              :     if (IsoSqlFormatTokenizer::GetTokenType(token_str, &token_type) &&
              :         token_type == MERIDIEM_INDICATOR) {
              :       return input_str + expected_tok_len;
              :     }
              :   }
This looks a bit complicated for what the function does. 
Maybe something like this would be simpler and faster:

if (input_len >= 4
    && (strncasecmp(input_str, AM_LONG.first, 4) == 0
        || strncasecmp(input_str, PM_LONG.first, 4) == 0)) {
  return input_str + 4;
} else if (input_len >= 2
    && (strncasecmp(input_str, AM.first, 2) == 0
        || strncasecmp(input_str, PM.first, 2) == 0)) {
  return input_str + 2;
} else {
  return nullptr;
}


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@53
PS10, Line 53:   /// Searches for 'token' in VALID_TOKENS and sets the corresponding token type in
             :   /// 'token_type' parameter if found. Returns true if 'token' is a valid key in
             :   /// VALID_TOKENS.
             :   static bool GetTokenType(const std::string& token, DateTimeFormatTokenType* token_type)
             :       WARN_UNUSED_RESULT;
This function is only called in IsoSqlFormatParser::ParseMeridiemIndicatorFromInput().

After simplifying ParseMeridiemIndicatorFromInput() probably this can be removed too.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@118
PS10, Line 118: IsUsedToken(token_to_probe) && cast_mode_ == PARSE
nit: Reverse the order of && operands to prevent looking up the token if tokenizer is in FORMAT mode.
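
A minimal sketch of why the operand order matters, assuming a hypothetical Tokenizer struct (the names below are illustrative and are not the patch's actual class). Because && short-circuits, testing the mode first means the lookup never runs in FORMAT mode:

```cpp
#include <set>
#include <string>

enum CastMode { PARSE, FORMAT };

// Hypothetical stand-in for the tokenizer, for illustration only.
struct Tokenizer {
  CastMode cast_mode = PARSE;
  std::set<std::string> used_tokens;
  int lookups = 0;  // counts how many times the set was actually probed

  bool IsUsedToken(const std::string& token) {
    ++lookups;
    return used_tokens.count(token) > 0;
  }

  // The cheap mode check comes first, so IsUsedToken() is skipped
  // entirely whenever the tokenizer is not in PARSE mode.
  bool IsDuplicateInParseMode(const std::string& token) {
    return cast_mode == PARSE && IsUsedToken(token);
  }
};
```

With the original order the set would be probed on every call, even though the result is discarded in FORMAT mode.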


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124
PS10, Line 124: token_to_probe
Maybe it would be simpler to keep track of the DateTimeFormatTokenType of the tokens we have encountered (token->second.type) instead of their string representation.

That would greatly simplify CheckIncompatibilities() too.
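
A rough sketch of what tracking token types could look like. The enum values and the conflict rules below are assumptions made up for illustration (only MERIDIEM_INDICATOR is a name taken from the patch); the real list of incompatibilities lives in CheckIncompatibilities():

```cpp
#include <set>

// Hypothetical subset of token types, for illustration only.
enum TokenType {
  YEAR, ROUND_YEAR, MONTH, DAY_OF_MONTH, DAY_OF_YEAR,
  HOUR12, HOUR24, MERIDIEM_INDICATOR
};

class UsedTokenTypes {
 public:
  // Records 'type'. Returns false if this type was already recorded.
  bool Add(TokenType type) { return seen_.insert(type).second; }

  bool Has(TokenType type) const { return seen_.count(type) > 0; }

  // Example rules only (assumed, not copied from the patch): each check is
  // a plain membership test on types, with no string comparisons.
  bool HasConflict() const {
    if (Has(YEAR) && Has(ROUND_YEAR)) return true;
    if (Has(DAY_OF_YEAR) && (Has(MONTH) || Has(DAY_OF_MONTH))) return true;
    if (Has(HOUR24) && (Has(HOUR12) || Has(MERIDIEM_INDICATOR))) return true;
    return false;
  }

 private:
  std::set<TokenType> seen_;
};
```

Since several spellings (e.g. YYYY, YYY, YY, Y) would map to one type, a duplicate check on types also covers mixed spellings without listing every string pair.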


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124
PS10, Line 124: used_tokens_.insert(token_to_probe);
This is probably not necessary if the tokenizer is in FORMAT mode.


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@223
PS10, Line 223: from
nit: since


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@224
PS10, Line 224: if any of the return values are
"if any of the in parameters are"


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/datetime-parser-common.h@225
PS10, Line 225: 'day'=366.
"days_since_jan1 set to 365" ?


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/timestamp-parse-util.cc@45
PS10, Line 45: static bool IndicateTimestampParseFailure(date* d, time_duration* t) {
DCHECK(d != nullptr);
DCHECK(t != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/10/be/src/runtime/timestamp-parse-util.cc@188
PS10, Line 188: NULL
nullptr



-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 Jul 2019 14:15:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 21: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 21
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Sep 2019 17:03:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4583/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 22
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Sep 2019 09:25:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13722/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13722/15//COMMIT_MSG@25
PS15, Line 25: - YYYY, YYY, YY, Y: Year tokens
             : - RRRR, RR: Round year tokens
             : - MM: Month (1-12)
             : - DD: Day (1-31)
             : - DDD: Day of year (1-366)
             : - HH, HH12: Hour of day (1-12)
             : - HH24: Hour of day (0-23)
             : - MI: Minute (0-59)
             : - SS: Second (0-59)
             : - SSSSS: Second of day (0-86399)
             : - FF, FF1, ..., FF9: Fractional second
             : - AM, PM, A.M., P.M.: Meridiem indicators
             : - TZH: Timezone hour (-99-+99)
             : - TZM: Timezone minute (0-99)
Thanks for adding these. Are there any tests for these lower/upper limits?


http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc@144
PS15, Line 144: 0, 99
For TIMEZONE_MIN the valid range is 0...99, but for TIMEZONE_HOUR it should be -99...+99
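
A sketch of a per-token range check along these lines. IsInRange is a hypothetical helper name, and the enum here is a two-value stand-in for DateTimeFormatTokenType; the bounds are the ones listed in the commit message:

```cpp
// Two-value stand-in for the real token type enum, for illustration only.
enum TokenType { TIMEZONE_HOUR, TIMEZONE_MIN };

// Returns true if 'value' is within the documented range for 'type':
// TZH accepts -99..+99 (signed), TZM accepts 0..99 (unsigned).
bool IsInRange(TokenType type, int value) {
  switch (type) {
    case TIMEZONE_HOUR:
      return value >= -99 && value <= 99;
    case TIMEZONE_MIN:
      return value >= 0 && value <= 99;
  }
  return false;
}
```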


http://gerrit.cloudera.org:8080/#/c/13722/15/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/13722/15/tests/query_test/test_cast_with_format.py@502
PS15, Line 502: 01:-59
Is this parsed because the ":-" characters in the input string are matched to the ':' separator in the format string?



-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Aug 2019 15:02:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4771/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Aug 2019 08:47:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 13: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Aug 2019 13:38:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3893/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Jul 2019 08:54:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4922/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 21
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Sep 2019 12:56:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13722/1/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/13722/1/be/src/runtime/timestamp-test.cc@658
PS1, Line 658:         TimestampValue::ParseSimpleDateFormat(test_case.str, strlen(test_case.str), dt_ctx);
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13722/1/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/13722/1/tests/query_test/test_cast_with_format.py@18
PS1, Line 18: import pytest
flake8: F401 'pytest' imported but unused


http://gerrit.cloudera.org:8080/#/c/13722/1/tests/query_test/test_cast_with_format.py@410
PS1, Line 410: +
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/13722/1/tests/query_test/test_cast_with_format.py@418
PS1, Line 418: +
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/13722/1/tests/query_test/test_cast_with_format.py@548
PS1, Line 548: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/13722/1/tests/query_test/test_cast_with_format.py@553
PS1, Line 553: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/13722/1/tests/query_test/test_cast_with_format.py@648
PS1, Line 648: "
flake8: E225 missing whitespace around operator



-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Jun 2019 14:44:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 17:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.h
File be/src/exprs/cast-format-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.h@35
PS17, Line 35: virtual
> nit: 'virtual' keyword is not necessary.
Done


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.h@37
PS17, Line 37: virtual
> nit: same
Done


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.h@46
PS17, Line 46: FunctionContext
> const
Done


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.cc
File be/src/exprs/cast-format-expr.cc:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-format-expr.cc@64
PS17, Line 64: FunctionContext
> const FunctionContext*
Done


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc@181
PS17, Line 181:     int buf_len = format_ctx->fmt_out_len + 1;
> DCHECK(buf_len <= IsoSqlFormatTokenizer::MAX_FORMAT_LENGTH);
There are tokens whose accepted input is longer than the token itself, e.g. MONTH, DAY, DY, FF[4-9], AM, PM. So limiting the input string length to the max length of the format string wouldn't be correct.


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/exprs/cast-functions-ir.cc@203
PS17, Line 203:     int buf_len = format_ctx->fmt_out_len + 1;
> DCHECK(buf_len <= IsoSqlFormatTokenizer::MAX_FORMAT_LENGTH);
Same as above.


http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/17/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS17, Line 110:  DCHECK(*current_pos <= str_end);
> Please revisit this DCHECK and the one in L113.
On the callsite I prevent calling this function when current_pos == str_end so I modified the DCHECK in this line.



-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Sep 2019 15:37:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#21).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Contrary to how these
functions work the FORMAT clause must specify a string literal and
cannot be used with any other kind of a string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99-+99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.
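
The separator rules above can be sketched roughly as follows. This is an illustration under assumptions, not the patch's parser: IsSeparator and SkipSeparators are hypothetical names, and requiring at least one separator character in the input is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstring>

// True for any character from the separator list: - . / , ' ; : space.
// All separators are treated as one interchangeable class.
bool IsSeparator(char c) {
  return c != '\0' && std::strchr("-./,';: ", c) != nullptr;
}

// Consumes a whole run of separators starting at 'pos', so one separator in
// the format can match several in the input. Returns the position after the
// run, or nullptr if the input does not start with a separator (assumed rule).
const char* SkipSeparators(const char* pos, const char* end) {
  assert(pos != nullptr && pos <= end);
  const char* p = pos;
  while (p < end && IsSeparator(*p)) ++p;
  return p == pos ? nullptr : p;
}
```

Under this scheme a '-' in the format accepts "." or ":-" in the input, because only membership in the separator class is checked.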

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,530 insertions(+), 964 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/21
-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 21
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#5).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.
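
The case-insensitive token handling noted above could look roughly like this. It is a sketch under assumptions: ToUpperAscii and IsKnownToken are hypothetical helpers, and the token set is a small sample rather than the full list.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Returns an upper-cased copy of 's' (ASCII only).
std::string ToUpperAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
      [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  return s;
}

// Looks up a token after normalizing its case, so "yyyy", "Yyyy" and "YYYY"
// are all treated as the same token. Sample token set, for illustration only.
bool IsKnownToken(const std::string& token) {
  static const std::set<std::string> kTokens = {
      "YYYY", "MM", "DD", "HH24", "MI", "SS", "AM", "PM"};
  return kTokens.count(ToUpperAscii(token)) > 0;
}
```

Normalizing once before the lookup keeps the token table itself in a single canonical case.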

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,333 insertions(+), 853 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 23: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13722

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 23
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Sep 2019 13:58:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/15/be/src/runtime/datetime-iso-sql-format-parser.cc@144
PS15, Line 144: 
> Good catch this is actually a bug! In fact I think something is off with th
Fixed in PS16



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Sep 2019 09:04:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13722/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13722/13//COMMIT_MSG@24
PS13, Line 24: List of tokens implemented by this change:
             : - YYYY, YYY, YY, Y: Year tokens
             : - RRRR, RR: Round year tokens
             : - MM: Month
             : - DD: Day
             : - DDD: Day of year
             : - HH, HH12: Hour of day (1-12)
             : - HH24: Hour of day (0-23)
             : - MI: Minute
             : - SS: Second
             : - SSSSS: Second of day
             : - FF, FF1, ..., FF9: Fractional second
             : - AM, PM, A.M., P.M.: Meridiem indicators
             : - TZH: Timezone hour
             : - TZM: Timezone minute
             : - Separators: - . / , ' ; : space
             : - ISO8601 date indicators (T, Z)
Please specify the allowed range of values for every token.


http://gerrit.cloudera.org:8080/#/c/13722/13/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/13/be/src/runtime/datetime-iso-sql-format-parser.cc@142
PS13, Line 142:  // Deliberately ignore the timezone offsets.
              :         break;
Does this mean that we accept any token as a timezone hour/min? Shouldn't we check at least that the token is a number?


http://gerrit.cloudera.org:8080/#/c/13722/13/be/src/runtime/datetime-iso-sql-format-parser.cc@196
PS13, Line 196:   // The last '-' of a separator sequence might be taken as a sign for timezone hour.
              :   if (*(*current_pos - 1) == '-' && dt_ctx.toks[*current_tok_idx].type == TIMEZONE_HOUR) {
              :     --(*current_pos);
              :   }
How about TIMEZONE_MIN? Is it possible to specify a negative TIMEZONE_MIN without TIMEZONE_HOUR?


http://gerrit.cloudera.org:8080/#/c/13722/13/be/src/runtime/datetime-iso-sql-format-parser.cc@219
PS13, Line 219:  const char* start_of_token = input_str;
              :   if (tok.type == TIMEZONE_HOUR) {
              :     if (max_tok_len > 2) max_tok_len = 2;
              :     if (*start_of_token == '-' || *start_of_token == '+') ++start_of_token;
              :   }
Should we do the same for TIMEZONE_MIN as well?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Aug 2019 13:10:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 11:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4112/ : Initial code review checks failed. See linked job for details on the failure.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Aug 2019 14:43:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#12).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Contrary to how these
functions work the FORMAT clause must specify a string literal and
cannot be used with any other kind of a string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month
- DD: Day
- DDD: Day of year
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute
- SS: Second
- SSSSS: Second of day
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour
- TZM: Timezone minute
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.
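
The separator rules above can be sketched as follows. This is only an
illustration, not the code from the patch: SkipSeparators() and
ConsumeSeparators() are hypothetical helpers, and requiring at least one
separator in the input is an assumption made for the example.

```cpp
// Separator characters listed in the commit message: - . / , ' ; : space
inline bool IsSeparator(char c) {
  switch (c) {
    case '-': case '.': case '/': case ',':
    case '\'': case ';': case ':': case ' ':
      return true;
    default:
      return false;
  }
}

// Hypothetical helper: advances 'pos' past a run of separator characters
// without reading at or beyond 'end'.
inline const char* SkipSeparators(const char* pos, const char* end) {
  while (pos < end && IsSeparator(*pos)) ++pos;
  return pos;
}

// Hypothetical helper: consumes one separator run from the format and one
// from the input. The runs may differ in length and in the characters used.
// Returns true if the input contained at least one separator (an assumption
// of this sketch, not a statement about the patch).
inline bool ConsumeSeparators(const char** format_pos, const char* format_end,
                              const char** input_pos, const char* input_end) {
  const char* input_start = *input_pos;
  *format_pos = SkipSeparators(*format_pos, format_end);
  *input_pos = SkipSeparators(*input_pos, input_end);
  return *input_pos != input_start;
}
```

With a format of "-MM" and an input of "./10", both positions end up on
the first non-separator character, so the single '-' in the format
matches the two-character "./" sequence in the input.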

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,425 insertions(+), 865 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/12

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4414/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Aug 2019 13:47:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 4:

(31 comments)

> Did some benchmarking on the IsoSql parsing and there was one thing
 > that caused a decent performance drop compared to the SimpleDate
 > format version:
 > - For SimpleDateFormat everything is a separator that is not a
 > digit, so checking the end of a pattern section is done via calling
 > isdigit(). On the other hand my implementation had an unordered_set
 > to contain the separator characters and for each character in the
 > input I made a lookup in this set. Apparently isdigit() outperforms
 > the set implementation so I made some massaging on the
 > IsSeparator() function.
 > - The improvement was to get rid of the unordered_set for
 > separators and simply do comparisons on hard-coded characters
 > within IsSeparator(). This gave a significant performance
 > improvement. (Still doesn't reach the efficiency of isdigit())
 > 
 > Still the SimpleDateFormat implementation has some performance
 > advantage over the IsoSql implementation but this is due to the
 > fact that the latter offers more flexibility:
 > - The length of the separator sequences is flexible (matching is
 > not strict char by char).
 > - There is a defined set of characters that can serve as a
 > separator (not taking everything non-digit as separator).
 > I feel that taking these extra functionalities into account the
 > performance difference is reasonable and acceptable.

Thanks for the improvements and the analysis!
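
For readers following along, the two variants described in the quoted
analysis could look roughly like the sketch below. The function names
are hypothetical and the bodies are not taken from the patch; only the
separator list (- . / , ' ; : space) comes from the commit message.

```cpp
#include <unordered_set>

// Variant 1 (hypothetical name): look each character up in an
// unordered_set, as in the earlier version described above.
inline bool IsSeparatorViaSet(char c) {
  static const std::unordered_set<char> kSeparators = {
      '-', '.', '/', ',', '\'', ';', ':', ' '};
  return kSeparators.count(c) > 0;
}

// Variant 2 (hypothetical name): compare against hard-coded characters,
// avoiding the hash lookup for every input character.
inline bool IsSeparatorHardCoded(char c) {
  switch (c) {
    case '-': case '.': case '/': case ',':
    case '\'': case ';': case ':': case ' ':
      return true;
    default:
      return false;
  }
}
```

Both variants accept exactly the same characters; they differ only in
how the membership test is carried out.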

http://gerrit.cloudera.org:8080/#/c/13722/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13722/2//COMMIT_MSG@49
PS2, Line 49: In a string type to timestamp conversion the timezone offset tokens
            :   are parsed, expected to match with the input but they don't adjust
            :   the result as the input is already expected to be in UTC format.
> Not really since e.g. Oracle have different types for timestamp with timezo
ok, thanks for the explanation.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/benchmarks/parse-timestamp-benchmark.cc
File be/src/benchmarks/parse-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/benchmarks/parse-timestamp-benchmark.cc@46
PS3, Line 46: //
nit: put a space after // in this line and below.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/benchmarks/parse-timestamp-benchmark.cc@64
PS3, Line 64: //  ImpalaSimpleDateFormatTZTimeStamp               16.2     16.6     17.2      67.2X      65.3X      66.3X
            : //        ImpalaIsoSqlFormatTimeStamp 
Maybe add 'ImpalaIsoSqlFormatTZTimestamp' to this micro-benchmark?


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/benchmarks/parse-timestamp-benchmark.cc@134
PS3, Line 134: TestImpalaDate
I think this should be called 'TestImpalaSimpleDateFormat' for consistency (or something similar).


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/exprs/cast-functions-ir.cc@172
PS3, Line 172: 
Add 'const' specifier to the type. Here and below in L192, L309, L344.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/date-test.cc
File be/src/runtime/date-test.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/date-test.cc@36
PS3, Line 36: using namespace datetime_parse_util;
I think, this test should cover the new iso-sql date parser as well.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@141
PS2, Line 141:         if (!ParseAndValidate(current_pos, group_len, 0, 86399, &second_in_day)) {
             :           return false;
             :         }
             :         result->second = second_in_day % 60;
             :         int minutes_in_day = second_in_day / 60;
             :         result->minute = minutes_in_day % 60;
             :         result->hour = minutes_in_day / 60;
             :         break;
             :       }
> The idea was to only move logic to functions that are used both by SimpleDa
Ok, I get it.
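
As a standalone illustration of the SSSSS arithmetic quoted above, here
is a minimal sketch. TimeOfDay and SplitSecondOfDay() are hypothetical
names used only for this example; the 0-86399 range is the one passed to
ParseAndValidate() in the quoted code.

```cpp
// Hypothetical result type for this sketch.
struct TimeOfDay {
  int hour;
  int minute;
  int second;
};

// Splits a second-of-day value into hour, minute and second.
// Returns false and leaves 'out' untouched if the value is outside 0-86399.
inline bool SplitSecondOfDay(int second_in_day, TimeOfDay* out) {
  if (second_in_day < 0 || second_in_day > 86399) return false;
  out->second = second_in_day % 60;
  const int minutes_in_day = second_in_day / 60;
  out->minute = minutes_in_day % 60;
  out->hour = minutes_in_day / 60;
  return true;
}
```

For example, 48640 splits into 13:30:40 because 13 * 3600 + 30 * 60 + 40
is 48640, and 86399 splits into 23:59:59.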


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-parser.cc@219
PS2, Line 219: DCHECK(input_str != nullptr);
> No need to check if 'input_len' > 2 as the length of a  TIMEZONE_HOUR is by
What happens if the input string is invalid?

e.g.: At the end of the input string where the TIMEZONE_HOUR token should be, the input string has just one character (a digit). In this case GetNextTokenGroupFromInput() is called with input_len = 1, tok.type = TIMEZONE_HOUR and tok.len = 3. In L219 we set input_len to 2. Then in L225 we may over-read the buffer.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@43
PS3, Line 43: i < dt_ctx.toks.size()
(current_pos < end_pos) should be checked here too. 

If the input string is invalid and contains fewer tokens than dt_ctx.toks, then we might end up with a current_pos == end_pos situation, which will cause a buffer over-read in L46.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@49
PS3, Line 49: current_pos - input_str < input_len
current_pos < end_pos


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@72
PS3, Line 72: (int)(end_pos - current_pos)
The result of end_pos - current_pos is already int, isn't it?  No need to explicitly cast to int.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@103
PS3, Line 103: if (!ParseAnd
Why not use 'day_in_year' instead directly? No need to introduce a new interim variable.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@116
PS3, Line 116: if (!ParseAndValidate
I think, you should clear 'result' at the beginning of the ParseDateTime() function to make sure that it starts with result->hour set to 0.

Otherwise the calculated hour will be wrong here.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@168
PS3, Line 168:   break;
current_pos < end_pos || i < dt_ctx.toks.size()


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@176
PS3, Line 176: 
Please double check that this will work and not throw an exception for years that are not supported by boost (year 0000..1399).


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@186
PS3, Line 186: 
result->month >= 1 && result->month <= 12


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@187
PS3, Line 187: 
result->day >= 1 && result->day <= 31


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@196
PS3, Line 196:   // Handle separately the meridiem indicators for two reasons.
Some additional checks can't hurt:

DCHECK(input_len >= 0);
if (input_len == 0) return nullptr;


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@205
PS3, Line 205: 
             :   const char* end_pos = input_str;
             :   int len = 0;
             :   while (len < input_len && len < tok.len) {
             :     ++len;
             :     ++end_pos;
             :     if (DateTimeIsoSqlFormatTokenizer::IsSeparator(*end_pos)) break;
             :   }
             :   if (end_pos == input_str) return nullptr;
             :   return end_pos;
             : }
I find this code a bit confusing. I think the loop has an indexing problem which may cause a buffer over-read in L212. E.g.: If input_len and tok.len are both 1, then it should check *input_str character only, but it checks *(input_str + 1).

Maybe this would be a bit cleaner:

const char* next_pos = input_str;
int max_tok_len = min(input_len, tok.len);

if (tok.type == TIMEZONE_HOUR) {
  if (*next_pos == '-' || *next_pos == '+') {
    ++next_pos;
  } else if (max_tok_len > 2) {
    max_tok_len = 2;
  }
}
const char* end_pos = input_str + max_tok_len;
while (next_pos < end_pos && !IsSeparator(*next_pos)) {
  ++next_pos;
}

if (next_pos == input_str) return nullptr;
return next_pos;
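
A self-contained version of the suggestion above, for anyone who wants
to try it in isolation. The Token struct, the reduced TokenType enum and
the name FindEndOfToken() are hypothetical stand-ins for this sketch,
and the empty-input guard at the top is the extra check proposed in the
comment on L196.

```cpp
#include <algorithm>

// Hypothetical, reduced stand-ins for the real token types.
enum TokenType { GENERIC_TOKEN, TIMEZONE_HOUR };
struct Token {
  TokenType type;
  int len;
};

inline bool IsSeparator(char c) {
  switch (c) {
    case '-': case '.': case '/': case ',':
    case '\'': case ';': case ':': case ' ':
      return true;
    default:
      return false;
  }
}

// Returns a pointer one past the end of the token that starts at
// 'input_str', or nullptr if no token can be read. Never dereferences a
// position at or beyond input_str + input_len.
inline const char* FindEndOfToken(
    const char* input_str, int input_len, const Token& tok) {
  if (input_str == nullptr || input_len <= 0) return nullptr;
  int max_tok_len = std::min(input_len, tok.len);
  const char* next_pos = input_str;
  if (tok.type == TIMEZONE_HOUR) {
    if (*next_pos == '-' || *next_pos == '+') {
      ++next_pos;  // The sign is part of the timezone hour token.
    } else if (max_tok_len > 2) {
      max_tok_len = 2;  // Without a sign there are at most two digits.
    }
  }
  const char* end_pos = input_str + max_tok_len;
  while (next_pos < end_pos && !IsSeparator(*next_pos)) ++next_pos;
  if (next_pos == input_str) return nullptr;
  return next_pos;
}
```

Note that a lone sign character is still reported as a token by this
sketch, so the caller is expected to validate the digits afterwards.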


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@222
PS3, Line 222:     boost::to_upper(token_str);
             :     DateTimeFormatTokenType token_type;
             :     if (DateTimeIsoSqlFormatTokenizer::GetTokenType(token_str, &token_type) &&
             :         token_type == MERIDIEM_INDICATOR) {
             :       return input_str + expected_tok_len;
             :     }
             :   }
You still have to check if (input_len < expected_tok_len) and return nullptr if it is.
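
To illustrate the length check, here is a rough sketch of a meridiem
indicator match that compares input_len with the expected token length
before reading any character. MatchMeridiemIndicator() is a hypothetical
name and the matching strategy is an assumption of this example; it does
not go through GetTokenType() as the patch does.

```cpp
#include <cctype>
#include <cstring>

// Returns a pointer one past the meridiem indicator at the start of
// 'input_str', or nullptr if the input is too short or does not match.
// Matching is case insensitive. Longer indicators are tried first.
inline const char* MatchMeridiemIndicator(const char* input_str, int input_len) {
  if (input_str == nullptr || input_len <= 0) return nullptr;
  static const char* const kIndicators[] = {"A.M.", "P.M.", "AM", "PM"};
  for (const char* indicator : kIndicators) {
    const int expected_tok_len = static_cast<int>(std::strlen(indicator));
    // Skip indicators that do not fit, so the loop below never reads
    // past the end of the input.
    if (input_len < expected_tok_len) continue;
    bool matches = true;
    for (int i = 0; i < expected_tok_len; ++i) {
      const unsigned char c = static_cast<unsigned char>(input_str[i]);
      if (std::toupper(c) != indicator[i]) {
        matches = false;
        break;
      }
    }
    if (matches) return input_str + expected_tok_len;
  }
  return nullptr;
}
```

With an input of "A.M" and input_len 3, the four-character candidates
are skipped by the length check and the two-character ones fail on the
'.', so the function returns nullptr without reading a fourth character.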


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h@42
PS3, Line 42:  void Reset(DateTimeFormatContext* dt_ctx, CastDirection cast_mode, bool time_toks) {
            :     dt_ctx_ = dt_ctx;
            :     cast_mode_ = cast_mode;
            :     accept_time_toks_ = time_toks;
            :   }
Is this function called from anywhere? If not, please remove it.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.h@51
PS2, Line 51: Tokenize
> I didn't write BE tests for this specific function as this is basically tes
I see. Generally speaking I think it is better to cover as much functionality as possible in the BE tests and write only a few common-sense E2E tests. BE tests are faster to execute and easier to debug if something fails.

Anyway, this is just a suggestion. If you don't want to do it you don't have to.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@87
PS2, Line 87: used_tokens_.clear();
> Nothing serious: Doing it here the user can run Tokenize() multiple times o
I think in this scenario L83 would fail on the second call, wouldn't it?


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@105
PS3, Line 105: *current_pos != nullptr
nit: Should be (current_pos != nullptr && *current_pos != nullptr) to be really precise :)


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@125
PS3, Line 125:  string curr_token(longest_possible_token, 0, curr_token_size);
No need to create a new temp string for each check. Just use 'longest_possible_token' instead and do longest_possible_token.resize(curr_token_size) after L138.

Also rename 'longest_possible_token' to something like 'curr_token'.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h
File be/src/runtime/datetime-simple-date-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.h@91
PS3, Line 91:   
nit: double space.


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.cc@372
PS2, Line 372: DateTimeSimpleDateForma
> For ISO SQL parsing there is nothing like default context as there is alway
Got it, thx!


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/datetime-simple-date-format-parser.cc@488
PS2, Line 488:      }
             :       case MONTH_IN_YEAR_SLT: {
             :         char raw_buff[tok.len];
             :         std::transform(tok_val, tok_val + tok.len, raw_buff, ::tolower);
             :         StringValue buff(raw_buff, tok.len);
             :         boost::unordered_map<StringValue, int>::const_iterator iter =
             :             REV_MONTH_INDEX.find(buff);
             :         if (UNLIKELY(iter == REV_MONTH_INDEX.end())) return false;
             :         dt_res
> I could reply the same here as I had for the same in the ISO version: I wan
Got it, thx!


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-simple-date-format-parser.cc@361
PS3, Line 361:  if (num_digits == 0) return false;
Good fix, thx!


http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/2/be/src/runtime/timestamp-parse-util.cc@232
PS2, Line 232:        if (tok.len < 4) {
             :           int adjust_factor = std::
> I would call it a bugfix as previously round year didn't work well for all 
Ok, thx!


http://gerrit.cloudera.org:8080/#/c/13722/2/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/13722/2/tests/query_test/test_cast_with_format.py@22
PS2, Line 22: TestCastWithFormat
> There was one reason I couldn't do that: The tests in cast_format.test are 
Got it, that's a weird problem. I guess, you could work around it with some regex magic in the .test file, but that is probably too much trouble.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Jul 2019 18:34:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 24: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 24
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Sep 2019 18:46:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3878/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 12:54:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13722/6/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/6/be/src/runtime/datetime-iso-sql-format-parser.cc@195
PS6, Line 195: 
if (input_len == 0) return nullptr;

The check at the beginning of the function is still needed, otherwise we will have an over-read problem in L208. If input_len is 0, we shouldn't access *input_str.


http://gerrit.cloudera.org:8080/#/c/13722/6/be/src/runtime/datetime-iso-sql-format-parser.cc@211
PS6, Line 211:   int len = 1;
             :   const char* end_pos = start_of_token;
             :   while (len <= max_tok_len && !DateTimeIsoSqlFormatTokenizer::IsSeparator(*end_pos)) {
             :     ++len;
             :     ++end_pos;
             :   }
'len' is not really necessary:

while (end_pos < start_of_token + max_tok_len && ..) {
  ++end_pos;
}



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 12:00:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13722

to look at the new patch set (#15).

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................

IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

This enhancement introduces a FORMAT clause for the CAST() operator
that applies to casts between string types and timestamp types.
Instead of accepting SimpleDateFormat patterns, the FORMAT clause
supports datetime patterns following the ISO:SQL:2016 standard.
Note that the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp(),
etc. remain unchanged and use SimpleDateFormat. Unlike these
functions, the FORMAT clause must specify a string literal and
cannot be used with any other kind of string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99 to +99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)
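
As an illustration of how two of the token ranges above relate to each other, here is a small sketch of the arithmetic behind HH12 plus a meridiem indicator and behind SSSSS. The names To24Hour(), FromSecondOfDay() and TimeOfDay are hypothetical and are not taken from the patch.

```cpp
#include <cassert>

// Hypothetical holder for a decomposed time of day.
struct TimeOfDay {
  int hour;    // 0-23
  int minute;  // 0-59
  int second;  // 0-59
};

// Converts an HH12 value (1-12) plus a meridiem flag to an HH24 value (0-23).
// 12 AM maps to 0 and 12 PM maps to 12.
inline int To24Hour(int hour12, bool is_pm) {
  assert(hour12 >= 1 && hour12 <= 12);
  return (hour12 % 12) + (is_pm ? 12 : 0);
}

// Splits an SSSSS value (0-86399) into hour, minute and second.
inline TimeOfDay FromSecondOfDay(int sssss) {
  assert(sssss >= 0 && sssss <= 86399);
  return TimeOfDay{sssss / 3600, (sssss % 3600) / 60, sssss % 60};
}
```

For example, 48640 seconds of day corresponds to 13:30:40, which is the same time as HH12 = 1 with a PM indicator, MI = 30 and SS = 40.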

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of separator sequences is handled flexibly: for instance,
  a single separator character in the format matches a
  multi-separator sequence in the input.
- In a string to timestamp conversion the timezone offset tokens
  are parsed and must match the input, but they don't adjust
  the result because the input is already expected to be in UTC.
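
The separator rules above can be sketched as follows. This is a hedged illustration, not the parser from this change: IsSeparatorChar(), SkipSeparators() and MatchSeparatorRun() are hypothetical names, and requiring at least one separator on both sides is an assumption made for the sketch.

```cpp
#include <cstddef>
#include <string>

// Separator characters listed above: - . / , ' ; : space
inline bool IsSeparatorChar(char c) {
  static const std::string kSeparators = "-./,';: ";
  return kSeparators.find(c) != std::string::npos;
}

// Advances '*pos' past a run of separator characters in 's' and returns the
// length of that run.
inline std::size_t SkipSeparators(const std::string& s, std::size_t* pos) {
  const std::size_t start = *pos;
  while (*pos < s.size() && IsSeparatorChar(s[*pos])) ++(*pos);
  return *pos - start;
}

// Consumes one separator run from the format and one from the input.
// Any separator matches any other separator, and the two runs may differ in
// length. Assumption: both runs must contain at least one separator.
inline bool MatchSeparatorRun(const std::string& format, std::size_t* fmt_pos,
                              const std::string& input, std::size_t* in_pos) {
  const std::size_t fmt_run = SkipSeparators(format, fmt_pos);
  const std::size_t in_run = SkipSeparators(input, in_pos);
  return fmt_run > 0 && in_run > 0;
}
```

With the format 'YYYY-MM' and both positions at offset 4, the inputs '2019.10' and '2019 / 10' both match under this sketch, while '201910' does not.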

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column AS STRING
    FORMAT 'YYYY MM HH12 YY') FROM some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-format-expr.cc
A be/src/exprs/cast-format-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/date-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/date-parse-util.cc
M be/src/runtime/date-parse-util.h
M be/src/runtime/date-test.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
A be/src/runtime/datetime-iso-sql-format-parser.cc
A be/src/runtime/datetime-iso-sql-format-parser.h
A be/src/runtime/datetime-iso-sql-format-tokenizer.cc
A be/src/runtime/datetime-iso-sql-format-tokenizer.h
D be/src/runtime/datetime-parse-util.h
A be/src/runtime/datetime-parser-common.cc
A be/src/runtime/datetime-parser-common.h
R be/src/runtime/datetime-simple-date-format-parser.cc
A be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/testutil/random-vector-generators.h
M be/src/util/dict-test.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/string-parser.h
M common/thrift/Exprs.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
M testdata/workloads/functional-query/queries/QueryTest/date.test
A tests/query_test/test_cast_with_format.py
54 files changed, 3,446 insertions(+), 865 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13722/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )

Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4497/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 20
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Sep 2019 15:11:31 +0000
Gerrit-HasComments: No