You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Matthew Jacobs (Code Review)" <ge...@cloudera.org> on 2017/08/02 00:15:13 UTC
[Impala-ASF-CR] IMPALA-3894: Change the behavior parsing date "YY"
Matthew Jacobs has posted comments on this change.
Change subject: IMPALA-3894: Change the behavior parsing date "YY"
......................................................................
Patch Set 9:
(9 comments)
A few code comments that I think you can address now, plus a few behavior questions for Greg - just to double check he's OK with this new behavior. Lets get his input on them before proceeding.
http://gerrit.cloudera.org:8080/#/c/7530/9//COMMIT_MSG
Commit Message:
PS9, Line 11: same as Hive's
How did you determine exactly what Hive's behavior is - did you find a reference in documentation, or code? Can you add a link?
PS9, Line 12: into the interval [current time - 80 years, current time + 20 years).
looks like postgresql's behavior is different. I think they just break at the unix epoch. Lets see if Greg is OK with us following Hive's behavior if that differs from Postgres. We may wanna look at a few more databases to see if there is consensus.
mj=# select to_date('05 Dec 69', 'DD Mon YY');
to_date
------------
2069-12-05
(1 row)
mj=# select to_date('05 Dec 70', 'DD Mon YY');
to_date
------------
1970-12-05
(1 row)
http://gerrit.cloudera.org:8080/#/c/7530/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:
PS9, Line 26: #
not needed here if it's also included in the header
PS9, Line 448: // Taking tok_len > 3 literally is Hive behavior.
this comment isn't clear to me
PS9, Line 547: unsigned short century_start_year = now_date.year() - 80;
: date century_start_date(century_start_year, now_date.month(), now_date.day());
you can use boost operator-(time_duration)
this can also be precomputed as part of the DateTimeContext to avoid some cycles on every row
PS9, Line 557: TimestampValue
just use ptime directly. we're trying to avoid instantiating TimestampValue except for real data (i.e. as part of a tuple)
PS9, Line 558: TimestampValue
same
PS9, Line 558: TimestampValue(century_start_date, dt_ctx.now.time())
it might be worth pre-computing this as part of the DateTimeFormatContext to cut down on the overhead when parsing a lot of rows
http://gerrit.cloudera.org:8080/#/c/7530/9/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:
PS9, Line 481: // Test 1-digit year format with time to show the exact boundary
: (TimestampTC("y-MM-dd HH:mm:ss", "37-07-28 16:14:23", false, true, false,
: 2037, 7, 28, 16, 14, 23))
: (TimestampTC("y-MM-dd HH:mm:ss", "37-07-28 16:14:24", false, true, false,
: 1937, 7, 28, 16, 14, 24))
hm, postgres does something different with 'y', let's see what Greg wants us to do. It always adds 2000:
mj=# select to_date('05 Dec 0', 'DD Mon y');
to_date
------------
2000-12-05
(1 row)
mj=# select to_date('05 Dec 9', 'DD Mon y');
to_date
------------
2009-12-05
(1 row)
mj=# select to_date('05 Dec 99', 'DD Mon y');
to_date
------------
2099-12-05
(1 row)
mj=# select to_date('05 Dec 999', 'DD Mon y');
to_date
------------
2999-12-05
(1 row)
mj=# select to_date('05 Dec 9999', 'DD Mon y');
to_date
-------------
11999-12-05
(1 row)
--
To view, visit http://gerrit.cloudera.org:8080/7530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes