You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Matthew Jacobs (Code Review)" <ge...@cloudera.org> on 2017/08/02 00:15:13 UTC

[Impala-ASF-CR] IMPALA-3894: Change the behavior parsing date "YY"

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3894: Change the behavior parsing date "YY"
......................................................................


Patch Set 9:

(9 comments)

A few code comments that I think you can address now, plus a few behavior questions for Greg - just to double check he's OK with this new behavior. Lets get his input on them before proceeding.

http://gerrit.cloudera.org:8080/#/c/7530/9//COMMIT_MSG
Commit Message:

PS9, Line 11: same as Hive's
How did you determine exactly what Hive's behavior is - did you find a reference in documentation, or code? Can you add a link?


PS9, Line 12: into the interval [current time - 80 years, current time + 20 years).
looks like postgresql's behavior is different. I think they just break at the unix epoch. Lets see if Greg is OK with us following Hive's behavior if that differs from Postgres. We may wanna look at a few more databases to see if there is consensus.

mj=# select to_date('05 Dec 69', 'DD Mon YY');
  to_date   
------------
 2069-12-05
(1 row)

mj=# select to_date('05 Dec 70', 'DD Mon YY');
  to_date   
------------
 1970-12-05
(1 row)


http://gerrit.cloudera.org:8080/#/c/7530/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

PS9, Line 26: #
not needed here if it's also included in the header


PS9, Line 448: // Taking tok_len > 3 literally is Hive behavior.
this comment isn't clear to me


PS9, Line 547: unsigned short century_start_year = now_date.year() - 80;
             :     date century_start_date(century_start_year, now_date.month(), now_date.day());
you can use boost operator-(time_duration)

this can also be precomputed as part of the DateTimeContext to avoid some cycles on every row


PS9, Line 557: TimestampValue
just use ptime directly. we're trying to avoid instantiating TimestampValue except for real data (i.e. as part of a tuple)


PS9, Line 558: TimestampValue
same


PS9, Line 558: TimestampValue(century_start_date, dt_ctx.now.time())
it might be worth pre-computing this as part of the DateTimeFormatContext to cut down on the overhead when parsing a lot of rows


http://gerrit.cloudera.org:8080/#/c/7530/9/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

PS9, Line 481:     // Test 1-digit year format with time to show the exact boundary
             :     (TimestampTC("y-MM-dd HH:mm:ss", "37-07-28 16:14:23", false, true, false,
             :                  2037, 7, 28, 16, 14, 23))
             :     (TimestampTC("y-MM-dd HH:mm:ss", "37-07-28 16:14:24", false, true, false,
             :                  1937, 7, 28, 16, 14, 24))
hm, postgres does something different with 'y', let's see what Greg wants us to do. It always adds 2000:

mj=# select to_date('05 Dec 0', 'DD Mon y');
  to_date   
------------
 2000-12-05
(1 row)

mj=# select to_date('05 Dec 9', 'DD Mon y');
  to_date   
------------
 2009-12-05
(1 row)

mj=# select to_date('05 Dec 99', 'DD Mon y');
  to_date   
------------
 2099-12-05
(1 row)

mj=# select to_date('05 Dec 999', 'DD Mon y');
  to_date   
------------
 2999-12-05
(1 row)

mj=# select to_date('05 Dec 9999', 'DD Mon y');
   to_date   
-------------
 11999-12-05
(1 row)


-- 
To view, visit http://gerrit.cloudera.org:8080/7530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes