You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Anonymous Coward (Code Review)" <ge...@cloudera.org> on 2020/01/29 18:59:08 UTC

[Impala-ASF-CR] IMPALA-9280: Time string with signed TZH token preceded by dash cannot be cast to timestamp

riza.suminto@cloudera.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15130


Change subject: IMPALA-9280: Time string with signed TZH token preceded by dash cannot be cast to timestamp
......................................................................

IMPALA-9280: Time string with signed TZH token preceded by dash cannot
be cast to timestamp

In ISO SQL datetime format, "-" is both separator character and sign
in TZH token. This change make sure to NOT include the last "-" char
as part of TZH token if the negative sign is the only possible
separator character in the separator substring.

Testing:
* Add test TimestampTest.IsoSqlWithTZ

Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/timestamp-test.cc
2 files changed, 38 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/15130/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 6: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Feb 2020 15:15:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15130/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15130/4//COMMIT_MSG@11
PS4, Line 11: if the negative sign is the only possible
            : separator character in the separator substring.
I have the same comment here as in the parser: This doesn't cover the whole picture as we also don't include it if the TZH itself has a '+' char as a sign.


http://gerrit.cloudera.org:8080/#/c/15130/4/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/15130/4/be/src/runtime/datetime-iso-sql-format-parser.cc@296
PS4, Line 296: (*current_pos)
nit: no need for the brackets.


http://gerrit.cloudera.org:8080/#/c/15130/4/be/src/runtime/datetime-iso-sql-format-parser.cc@323
PS4, Line 323: // The last '-' of a separator sequence might be taken as a sign for timezone hour.
             :   // UNLESS the last '-' is the only possible separator character in the separator
             :   // substring (in which case it is not counted as the timezone hour's negative sign).
I'd rephrase this comment a bit as it misses to mention that if the TZH starts with a '+' then we again don't take the last '-' of the separator sequence as the sign.

Let's say something like this: "If the TZH token itself doesn't start with a sign and the separator sequence contains more than one separators  then the last '-' of the separator sequence is taken as the sign of TZH".

What do you think?



-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 Jan 2020 07:19:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................

IMPALA-9280: Fix parsing of timestamp with dash before TZH

In ISO SQL datetime format, '-' is both separator character and sign
in TZH token. If the TZH token itself doesn't start with a '+' sign
and the separator sequence contains more than one separators then the
last '-' of the separator sequence is taken as the sign of TZH.

Testing:
* Add test to query_test/test_cast_with_format.py

Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Reviewed-on: http://gerrit.cloudera.org:8080/15130
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M tests/query_test/test_cast_with_format.py
2 files changed, 10 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified
  Tim Armstrong: Looks good to me, but someone else must approve

-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 2:

(3 comments)

Thank you for the initial review. Patch set 2 submitted.

http://gerrit.cloudera.org:8080/#/c/15130/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15130/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-9280: Fix parsing of timestamp with dash before TZH
> nit: try to make this fit on a single line. Usually the title also describe
Done


http://gerrit.cloudera.org:8080/#/c/15130/1/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/15130/1/be/src/runtime/timestamp-test.cc@1074
PS1, Line 1074:   // Test handling of '-' separator right before TZH.
> Maybe add an additional comment like "All of these time strings should succ
Done


http://gerrit.cloudera.org:8080/#/c/15130/1/be/src/runtime/timestamp-test.cc@1094
PS1, Line 1094:     TimestampValue v = TimestampValue::ParseIsoSqlFormat(
> Can we also check expected values for the parsed timestamps, to guard again
I change the test to reuse TimestampTC struct like written in TimestampTest.Basic. And then check the hour, and minute part to verify.



-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jan 2020 22:50:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 2:

(4 comments)

Thank you Gabor for your explanation.

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc@297
PS2, Line 297:   int cur_pos_move_count = 0;
> I think the algorithm used here could maybe be simplified. I don't want to 
Ack


http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc@332
PS2, Line 332: && cur_pos_move_count > cur_tok_move_count && *(*current_pos) != '+'
> Counting the processed separator tokens doesn't seem the best approach for 
What I understand from your explanation, the number of separator chars in the time string does not need to be as much as separator chars in the pattern. What matter is there should be at least one separator char in the time string.

Therefore, given a pattern "HH:MI-TZH":
a) "08:00-01" is equal to "08:00 +01"
b) "08:00-+01" is equal to "08:00 +01"
c) "08:00--01" is equal to "08:00 -01"
d) "08:00---01" is equal to "08:00 -01"

And all above is also true for pattern "HH:MI--TZH", is that correct?


http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/timestamp-test.cc@1071
PS2, Line 1071: TEST(TimestampTest, IsoSqlWithTZ) {
> Almost all of these formatting tests are in tests/query_test/test_cast_with
Ack


http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/timestamp-test.cc@1078
PS2, Line 1078: HH:MI-TZH
> As far as I know we don't really support dateless timestamps even though Im
Will follow example from tests/query_test/test_cast_with_format.py



-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 17:52:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5287/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Feb 2020 10:23:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc@332
PS2, Line 332: && cur_pos_move_count > cur_tok_move_count && *(*current_pos) != '+'
> What I understand from your explanation, the number of separator chars in t
Basically separator matching works in a way that if there is at least one separator in a position somewhere in the format then there should be at least one separator in the input string in that section. (By position I don't mean the exact index of that character, rather the relative location to the surrounding tokens).

Those examples are correct.

For more explanation please check the design doc for this feature (feel free to ask if you still have questions):
https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/



-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 18:44:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 5: Code-Review+2

Thanks for the adjustments!


-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Feb 2020 10:22:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15130

to look at the new patch set (#2).

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................

IMPALA-9280: Fix parsing of timestamp with dash before TZH

In ISO SQL datetime format, "-" is both separator character and sign
in TZH token. This change make sure to NOT include the last "-" char
as part of TZH token if the negative sign is the only possible
separator character in the separator substring.

Testing:
* Add test TimestampTest.IsoSqlWithTZ

Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/timestamp-test.cc
2 files changed, 42 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/15130/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9280: Time string with signed TZH token preceded by dash cannot be cast to timestamp

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15130

to look at the new patch set (#3).

Change subject: IMPALA-9280: Time string with signed TZH token preceded by dash cannot be cast to timestamp
......................................................................

IMPALA-9280: Time string with signed TZH token preceded by dash cannot
be cast to timestamp

In ISO SQL datetime format, "-" is both separator character and sign
in TZH token. This change make sure to NOT include the last "-" char
as part of TZH token if the negative sign is the only possible
separator character in the separator substring.

Testing:
* Add test to query_test/test_cast_with_format.py

Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M tests/query_test/test_cast_with_format.py
2 files changed, 9 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/15130/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5543/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jan 2020 23:31:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9280: Time string with signed TZH token preceded by dash cannot be cast to timestamp

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Time string with signed TZH token preceded by dash cannot be cast to timestamp
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5539/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jan 2020 19:43:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9280: Time string with signed TZH token preceded by dash cannot be cast to timestamp

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Time string with signed TZH token preceded by dash cannot be cast to timestamp
......................................................................


Patch Set 1:

(3 comments)

I don't understand the ISO timestamp edge cases enough to know for sure whether this handles all edge cases. I left a few comments though. I think maybe Gabor should take a look at this too.

http://gerrit.cloudera.org:8080/#/c/15130/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15130/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-9280: Time string with signed TZH token preceded by dash cannot
nit: try to make this fit on a single line. Usually the title also describes the code change rather than the bug. E.g. something like

"fix parsing of timestamp with dash before TZH"


http://gerrit.cloudera.org:8080/#/c/15130/1/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/15130/1/be/src/runtime/timestamp-test.cc@1074
PS1, Line 1074:   // Test handling of '-' separator right before TZH
Maybe add an additional comment like "All of these time strings should successfully parse".


http://gerrit.cloudera.org:8080/#/c/15130/1/be/src/runtime/timestamp-test.cc@1094
PS1, Line 1094:     EXPECT_NE(v.time(), not_a_date_time) << "TC: " << i;
Can we also check expected values for the parsed timestamps, to guard against any other correctness bugs?



-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jan 2020 21:29:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 2:

(1 comment)

Thanks, I think I'll leave it to a timestamp expert to take a more detailed look at the implementation.

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc@297
PS2, Line 297:   int cur_pos_move_count = 0;
I think the algorithm used here could maybe be simplified. I don't want to make suggestions at this point since I don't think I understand the spec well enough.



-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jan 2020 23:37:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5553/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 20:46:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15130/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15130/4//COMMIT_MSG@11
PS4, Line 11: uence contains more than one separators then the
            : last '-' of the separator sequence is taken as 
> I have the same comment here as in the parser: This doesn't cover the whole
Commit message clarified in Patch set 5.


http://gerrit.cloudera.org:8080/#/c/15130/4/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/15130/4/be/src/runtime/datetime-iso-sql-format-parser.cc@296
PS4, Line 296: *current_pos;
> nit: no need for the brackets.
Done


http://gerrit.cloudera.org:8080/#/c/15130/4/be/src/runtime/datetime-iso-sql-format-parser.cc@323
PS4, Line 323: // The last '-' of a separator sequence might be taken as a sign for timezone hour.
             :   // If the TZH token itself doesn't start with a '+' sign and the separator sequence
             :   // contains more than one separators then the last '-' of the separator sequence is
> I'd rephrase this comment a bit as it misses to mention that if the TZH sta
Agree. Changed in Patch set 5.



-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 Jan 2020 18:10:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5572/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 Jan 2020 18:53:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Feb 2020 16:29:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15130

to look at the new patch set (#4).

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................

IMPALA-9280: Fix parsing of timestamp with dash before TZH

In ISO SQL datetime format, "-" is both separator character and sign
in TZH token. This change make sure to NOT include the last "-" char
as part of TZH token if the negative sign is the only possible
separator character in the separator substring.

Testing:
* Add test to query_test/test_cast_with_format.py

Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M tests/query_test/test_cast_with_format.py
2 files changed, 9 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/15130/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 4:

(1 comment)

Patch set 4 submitted.

I only add one test case that is mentioned in JIRA since other corner case that I can think of has been covered by existing tests.

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc@332
PS2, Line 332: 
> Basically separator matching works in a way that if there is at least one s
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 20:08:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 2:

(3 comments)

Thanks for taking care of this!

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/datetime-iso-sql-format-parser.cc@332
PS2, Line 332: && cur_pos_move_count > cur_tok_move_count && *(*current_pos) != '+'
Counting the processed separator tokens doesn't seem the best approach for me here. Also I don't see why it is necessary to have more separators in the input than in the format to use the last minus sign as a part of TZH instead of taking it as a separator.

Previously this if condition said "If a TZH token follows the separator sequence and the last separator in the sequence was a minus char then I'll use that minus char as the sign of the TZH".
I'd extend this: "If a TZH token follows the separator sequence and the last separator in the sequence was a minus char and the TZH token doesn't start with a plus char then ..."

Please make sure that in case the minus char is actually taken as sign of TZH instead of a separator then there is still at least one separator remaining in the input sequence to match the separator sequence in the token list.


http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/timestamp-test.cc@1071
PS2, Line 1071: TEST(TimestampTest, IsoSqlWithTZ) {
Almost all of these formatting tests are in tests/query_test/test_cast_with_format.py
I think we should keep them together. Please consider moving this coverage.


http://gerrit.cloudera.org:8080/#/c/15130/2/be/src/runtime/timestamp-test.cc@1078
PS2, Line 1078: HH:MI-TZH
As far as I know we don't really support dateless timestamps even though Impala accepts them. Additionally I think we should remove the support of them from Impala at one point (AFAIK JDBC/ODBC connections throw parse error for dateless timestamps currently). So might be safer not to use them even in tests as once we drop the support we have to rewrite less code.



-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 09:00:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15130

to look at the new patch set (#5).

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................

IMPALA-9280: Fix parsing of timestamp with dash before TZH

In ISO SQL datetime format, '-' is both separator character and sign
in TZH token. If the TZH token itself doesn't start with a '+' sign
and the separator sequence contains more than one separators then the
last '-' of the separator sequence is taken as the sign of TZH.

Testing:
* Add test to query_test/test_cast_with_format.py

Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M tests/query_test/test_cast_with_format.py
2 files changed, 10 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/15130/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9280: Fix parsing of timestamp with dash before TZH

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15130 )

Change subject: IMPALA-9280: Fix parsing of timestamp with dash before TZH
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24c636e75fd380f6ebd091bcb38bb60a274f0e00
Gerrit-Change-Number: 15130
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Feb 2020 10:23:04 +0000
Gerrit-HasComments: No