You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tianyi Wang (Code Review)" <ge...@cloudera.org> on 2017/08/31 01:11:51 UTC

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Tianyi Wang has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7910

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................

IMPALA-5867: Fix bugs parsing 2-digit year

This patch fixes several bugs parsing 1 or 2-digit year formats.
Existing code is broken in several ways:
1. With 1 or 2-digit year format and month/day missing, ParseDateTime()
   throws an uncaught exception.
2. If now() is 02/29 in a leap year but (now() - 80 years) isn't,
   DateTimeFormatContext::SetCenturyBreak() throws an uncaught
   exception.
3. If the year parsed is 02/29 in a leap year but it isn't a leap year
   100 years ago, TimestampParser::Parse() will consider the date as
   invalid though it isn't.
This patch fixes above bugs and adds a few test cases in
be/src/runtime/timestamp-test.cc
The behaviors after change is:
1. A date without month/day is considered invalid date. Note this is
   different from Hive.
2. Century break would be set to 03/01 80 years ago.
3. If century break is 19XX and parsed date is 00/02/29, always let it
   be 2000/02/29 instead of an invalid date.

Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
---
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-test.cc
2 files changed, 57 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/7910/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 1:

(3 comments)

Some explanation on Hive. Will address other comments.

http://gerrit.cloudera.org:8080/#/c/7910/1/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

Line 68:     century_break_ptime = boost::posix_time::ptime(
> Mention that we set it to March 1 for consistency with Hive?
This is not meant to be consistent with Hive but to not crash. See comment on L352.


Line 352:         if (dt_result.month == 2 && dt_result.day == 29 &&
> I don't fully understand this condition. Couldn't this advance 100 years ev
There isn't 1900-02-29 and boost::gregorian::date would throw an exception if you try to construct it. So without this if block any 00-02-29 will be invalid date. There are many ways to workaround it and I'm not sure which one looks best. By the way, this is not what java.util.Calendar does. In Java if now() is 1980-02-29 18:00:00 the break time will be set to 1900-02-28 18:00:00 and any 02-29 will be parsed as 2000-02-29. But if now() is 1980-03-01 18:00:00, 02-29 17:00:00 will be 2000-02-29 but 02-29 19:00:00 will be 1900-03-01. It seems 1900-02-29 exists in Java, is comparable to a time in 1900-03-01, and will be 2000-02-29 if you add 100 years to it, which is nothing like boost. So trying to be exactly same as Hive would really add some irregular logic here.


Line 370:       if (d->year() < 1400 || d->year() > 9999) {
> I'm not sure if this check is necessary now - there was an off-by-one bug i
Boost still won't throw an exception when you add -1 days to 1400-01-01. But if you call year() on the result it will throw an exception. So basically we don't really need this if block. We just need to call year(). I'm not sure which way is cleaner so please comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 3:

(1 comment)

The fact that we found 3 issues in this code (after IMPALA-3894) makes me nervous. I didn't look in close detail yet, but given 2.10 is looming and I'm worried about all the subtleties of the Timestamp code, I think we should consider reverting 5797f859692ced6f30a790568b29669c663fa65c for 2.10 and getting this fix into master afterwards. I think we need to think about ways to get more test coverage here to make sure we don't have other crashes or incorrect results in other corner cases.

http://gerrit.cloudera.org:8080/#/c/7910/3//COMMIT_MSG
Commit Message:

PS3, Line 22: 1. A date without month/day is considered invalid. Note this is
            :    different from Hive.
why don't we do the same thing as hive?


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1208/

-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7910

to look at the new patch set (#5).

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................

IMPALA-5867: Fix bugs parsing 2-digit year

This patch fixes several bugs parsing 1 or 2-digit year formats.
Existing code is broken in several ways:
1. With 1 or 2-digit year format and month/day missing, ParseDateTime()
   throws an uncaught exception.
2. If now() is 02/29 in a leap year but (now() - 80 years) isn't,
   DateTimeFormatContext::SetCenturyBreak() throws an uncaught
   exception.
3. If the year parsed is 02/29 in a leap year but it isn't a leap year
   100 years ago, TimestampParser::Parse() will consider the date as
   invalid though it isn't.
This patch fixes above bugs and adds a few test cases in
be/src/runtime/timestamp-test.cc
The behaviors after change is:
1. A date without month or day is considered invalid. This is a
   pre-existing difference from Hive, which defaults missing month/day
   to 01/01.
2. Century break would be set to 02/28 80 years ago.
3. If parsed date is 00/02/29 but 1900/02/29 does not exist, treat
   it as 03/01 when comparing to century break.

Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
---
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
3 files changed, 104 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/7910/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 1:

(6 comments)

Refactored and changed a few logic to make it a little bit closer to Hive.

http://gerrit.cloudera.org:8080/#/c/7910/1/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

Line 340
> Can you preserve this comment? I think you moved this logic down to l.370?
Done


Line 345:       if (dt_result.realign_year) {
> This function is getting pretty big. Can we factor out the year realignment
Done


Line 346:         // In SimpleDateFormat, the century for 2-digit-year breaks at current_time - 80 years.
> Long line.
Done


PS1, Line 353: boost::gregorian::gregorian_calendar
> This might be clearer with a "using boost::gregorian::gregorian_calendar" u
Done


Line 370:       if (d->year() < 1400 || d->year() > 9999) {
> Boost still won't throw an exception when you add -1 days to 1400-01-01. Bu
Changed to a single call to year(). It looks stupid but I don't know other ways to check if it's in range.


http://gerrit.cloudera.org:8080/#/c/7910/1/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

Line 484:     (TimestampTC("yy-MM-dd", "00-02-29", false, true, false, 2000, 2, 29))
> Mention this is testing the leap-year edge-case.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 2:

(2 comments)

Added some test cases testing 1400/9999 edge cases.

http://gerrit.cloudera.org:8080/#/c/7910/1/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

Line 370:     }
> Ah ok, I misunderstood the purpose of the old code The old code actually se
Changed back to a if block.


http://gerrit.cloudera.org:8080/#/c/7910/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

Line 320: date RealignYear(const DateTimeParseResult& dt_result,
> nit: can you declare this as a static method in the class for consistency.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7910/3//COMMIT_MSG
Commit Message:

PS3, Line 22: 1. A date without month/day is considered invalid. Note this is
            :    different from Hive.
> why don't we do the same thing as hive?
This is one of the many differences between Impala and Hive. I'm not sure how much we want to be the same with Hive and tried not to change the behaviour a lot in this bug fix. See JIRA page for the differences.


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7910

to look at the new patch set (#4).

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................

IMPALA-5867: Fix bugs parsing 2-digit year

This patch fixes several bugs parsing 1 or 2-digit year formats.
Existing code is broken in several ways:
1. With 1 or 2-digit year format and month/day missing, ParseDateTime()
   throws an uncaught exception.
2. If now() is 02/29 in a leap year but (now() - 80 years) isn't,
   DateTimeFormatContext::SetCenturyBreak() throws an uncaught
   exception.
3. If the year parsed is 02/29 in a leap year but it isn't a leap year
   100 years ago, TimestampParser::Parse() will consider the date as
   invalid though it isn't.
This patch fixes above bugs and adds a few test cases in
be/src/runtime/timestamp-test.cc
The behaviors after change is:
1. A date without month or day is considered invalid. This is a
   pre-existing difference from Hive.
2. Century break would be set to 02/28 80 years ago.
3. If parsed date is 00/02/29 but 1900/02/29 does not exist, treat
   it as 03/01 when comparing to century break.

Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
---
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
3 files changed, 104 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/7910/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 3:

(1 comment)

One reason that this is a lot safer than the old code is that all of the data manipulation is now enclosed in the try/catch.

http://gerrit.cloudera.org:8080/#/c/7910/3//COMMIT_MSG
Commit Message:

PS3, Line 22: 1. A date without month/day is considered invalid. Note this is
            :    different from Hive.
> This is one of the many differences between Impala and Hive. I'm not sure h
Yeah this is a pre-existing difference from Hive. If you omit month or day I believe Hive assumes you meant "1" but we don't assume that.

It doesn't make sense to change this as part of this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7910/1/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

Line 340
Can you preserve this comment? I think you moved this logic down to l.370?


Line 68:     century_break_ptime = boost::posix_time::ptime(
Mention that we set it to March 1 for consistency with Hive?


Line 345:       if (dt_result.realign_year) {
This function is getting pretty big. Can we factor out the year realignment logic?


Line 346:         // In SimpleDateFormat, the century for 2-digit-year breaks at current_time - 80 years.
Long line.


Line 352:         if (dt_result.month == 2 && dt_result.day == 29 &&
I don't fully understand this condition. Couldn't this advance 100 years even if the date is after the century break? E.g. if the century break is 1900-01-01 and the date is 00-02-29, then I think this would give 2000-01-01, but I think 1900-02-29 would be correct.

I think the only years where y is not a leap year but y + 100 is a leap year are 1900, 2300, 2700 (since years divisible by 100 are not leap years, except for years divisible by 400) so this may not be that important in practice but it would be good to ensure the logic is correct and have a comment explaining it.


PS1, Line 353: boost::gregorian::gregorian_calendar
This might be clearer with a "using boost::gregorian::gregorian_calendar" up the top of the file.


Line 370:       if (d->year() < 1400 || d->year() > 9999) {
I'm not sure if this check is necessary now - there was an off-by-one bug in boost where an exception wasn't thrown in some edge cases. May be worth seeing if the expr-tests pass without this.


http://gerrit.cloudera.org:8080/#/c/7910/1/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

Line 484:     (TimestampTC("yy-MM-dd", "00-02-29", false, true, false, 2000, 2, 29))
Mention this is testing the leap-year edge-case.


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7910/3//COMMIT_MSG
Commit Message:

PS3, Line 22: 1. A date without month or day is considered invalid. This is a
            :    pre-existing differe
> It'd be good to make it clear what is a pre-existing difference. This made 
Rephrased this line.


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


IMPALA-5867: Fix bugs parsing 2-digit year

This patch fixes several bugs parsing 1 or 2-digit year formats.
Existing code is broken in several ways:
1. With 1 or 2-digit year format and month/day missing, ParseDateTime()
   throws an uncaught exception.
2. If now() is 02/29 in a leap year but (now() - 80 years) isn't,
   DateTimeFormatContext::SetCenturyBreak() throws an uncaught
   exception.
3. If the year parsed is 02/29 in a leap year but it isn't a leap year
   100 years ago, TimestampParser::Parse() will consider the date as
   invalid though it isn't.
This patch fixes above bugs and adds a few test cases in
be/src/runtime/timestamp-test.cc
The behaviors after change is:
1. A date without month or day is considered invalid. This is a
   pre-existing difference from Hive, which defaults missing month/day
   to 01/01.
2. Century break would be set to 02/28 80 years ago.
3. If parsed date is 00/02/29 but 1900/02/29 does not exist, treat
   it as 03/01 when comparing to century break.

Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Reviewed-on: http://gerrit.cloudera.org:8080/7910
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
3 files changed, 104 insertions(+), 50 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 1:

(2 comments)

Thanks for the explanations. I think I follow the code now. Just had a couple of minor comments.

http://gerrit.cloudera.org:8080/#/c/7910/1/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

Line 370:       if (d->year() < 1400 || d->year() > 9999) {
> Changed to a single call to year(). It looks stupid but I don't know other 
Ah ok, I misunderstood the purpose of the old code The old code actually seems simpler to me, TBH, so I'm ok if you want to change back to that.


http://gerrit.cloudera.org:8080/#/c/7910/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

Line 320:   int day_offset = 0;
nit: can you declare this as a static method in the class for consistency.


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7910/3//COMMIT_MSG
Commit Message:

PS3, Line 22: 1. A date without month/day is considered invalid. Note this is
            :    different from Hive.
> Yeah this is a pre-existing difference from Hive. If you omit month or day 
It'd be good to make it clear what is a pre-existing difference. This made it sound like this may have changed.


-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#2).

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................

IMPALA-5867: Fix bugs parsing 2-digit year

This patch fixes several bugs parsing 1 or 2-digit year formats.
Existing code is broken in several ways:
1. With 1 or 2-digit year format and month/day missing, ParseDateTime()
   throws an uncaught exception.
2. If now() is 02/29 in a leap year but (now() - 80 years) isn't,
   DateTimeFormatContext::SetCenturyBreak() throws an uncaught
   exception.
3. If the year parsed is 02/29 in a leap year but it isn't a leap year
   100 years ago, TimestampParser::Parse() will consider the date as
   invalid though it isn't.
This patch fixes above bugs and adds a few test cases in
be/src/runtime/timestamp-test.cc
The behaviors after change is:
1. A date without month/day is considered invalid. Note this is
   different from Hive.
2. Century break would be set to 02/28 80 years ago.
3. If parsed date is 00/02/29 but 1900/02/29 does not exist, treat
   it as 03/01 when comparing to century break.

Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
---
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-test.cc
2 files changed, 82 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/7910/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5867: Fix bugs parsing 2-digit year

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#3).

Change subject: IMPALA-5867: Fix bugs parsing 2-digit year
......................................................................

IMPALA-5867: Fix bugs parsing 2-digit year

This patch fixes several bugs parsing 1 or 2-digit year formats.
Existing code is broken in several ways:
1. With 1 or 2-digit year format and month/day missing, ParseDateTime()
   throws an uncaught exception.
2. If now() is 02/29 in a leap year but (now() - 80 years) isn't,
   DateTimeFormatContext::SetCenturyBreak() throws an uncaught
   exception.
3. If the year parsed is 02/29 in a leap year but it isn't a leap year
   100 years ago, TimestampParser::Parse() will consider the date as
   invalid though it isn't.
This patch fixes above bugs and adds a few test cases in
be/src/runtime/timestamp-test.cc
The behaviors after change is:
1. A date without month/day is considered invalid. Note this is
   different from Hive.
2. Century break would be set to 02/28 80 years ago.
3. If parsed date is 00/02/29 but 1900/02/29 does not exist, treat
   it as 03/01 when comparing to century break.

Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
---
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
3 files changed, 104 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/7910/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4f430caea88b6c33f8050a1984ee0ee32ecb0a1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>