You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2017/01/25 06:01:43 UTC

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5791

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................

IMPALA-4055: Speed up to_date() with custom implementation.

Simple implementation of to_date() that avoids calling
into boost for a speedup of 10x.

Perf:
I generated a synthetic Parquet table with 26437248 rows
and a single timestamp column. I tested the response time
of the following query before and after this change.

set mt_dop=1;
select count(*) from to_date_test
where to_date(ts) = '2017-10-23';

Before: 38.1s
After:   3.3s

Testing: I locally ran expr-test.cc and expr_test.py.

Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
3 files changed, 29 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/5791/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


IMPALA-4055: Speed up to_date() with custom implementation.

Simple implementation of to_date() that avoids calling
into boost for a speedup of 10x.

Perf:
I generated a synthetic Parquet table with 26437248 rows
and a single timestamp column. I tested the response time
of the following query before and after this change.

set mt_dop=1;
select count(*) from to_date_test
where to_date(ts) = '2017-10-23';

Before: 38.1s
After:   3.4s

Testing: I locally ran expr-test.cc and expr_test.py.

Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Reviewed-on: http://gerrit.cloudera.org:8080/5791
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
3 files changed, 25 insertions(+), 12 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Jim Apple, Michael Ho,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5791

to look at the new patch set (#5).

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................

IMPALA-4055: Speed up to_date() with custom implementation.

Simple implementation of to_date() that avoids calling
into boost for a speedup of 10x.

Perf:
I generated a synthetic Parquet table with 26437248 rows
and a single timestamp column. I tested the response time
of the following query before and after this change.

set mt_dop=1;
select count(*) from to_date_test
where to_date(ts) = '2017-10-23';

Before: 38.1s
After:   3.4s

Testing: I locally ran expr-test.cc and expr_test.py.

Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M tests/conftest.py
4 files changed, 25 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/5791/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 3:

(2 comments)

I also added an extra safety after discussion this with Tim. We believe that some of our functions may return malformed timestamps, so it's safer to be defensive for now.

http://gerrit.cloudera.org:8080/#/c/5791/3/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

Line 326: static inline void IntToChar(uint8_t* dst, int num, int len) {
> WFM.
Done


Line 338:   DCHECK(ts_value.IsValidDate());
> This ensures ts_value.date().year() is >= 0 and <= 9999, yes?
Yes. Between 1400 and 9999 to be precise.


-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5791/3//COMMIT_MSG
Commit Message:

Line 12: Perf:
Worth adding a BE benchmark or a targeted perf test?


http://gerrit.cloudera.org:8080/#/c/5791/3/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

Line 326: static inline void IntToChar(uint8_t* dst, int num, int len) {
snprintf(dst, dst_len, "%0`len`d", num)

`len` should be a constant, but, luckily, it is!


-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/216/

-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 6: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/213/

-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5791/1/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS1, Line 344:   // Fill in year.
             :   const int year = ts_value.date().year();
             :   result.ptr[3] = '0' + year % 10;
             :   result.ptr[2] = '0' + (year / 10) % 10;
             :   result.ptr[1] = '0' + (year / 100) % 10;
             :   result.ptr[0] = '0' + (year / 1000) % 10;
             : 
             :   // Fill in month, day, and dashes.
             :   result.ptr[4] = '-';
             :   TwoDigitZeroPaddedItoa(result.ptr + 5, ts_value.date().month());
             :   result.ptr[7] = '-';
             :   TwoDigitZeroPaddedItoa(result.ptr + 8, ts_value.date().day());
             :   return result;
> Not quite as simple as that because we need to deal with zero padding, so w
Didn't read carefully enough, sorry. Your function works as-is including padding.

Updated perf numbers based on this new version which is slightly slower (but better imo).


-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 3: Code-Review+2

Seems trivial enough for me to +2. Please feel free to have another person take a look before committing.

-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5791/3/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

Line 326: static inline void IntToChar(uint8_t* dst, int num, int len) {
> Why write \0 at the end? Seems confusing in many respects.
WFM.

    DCHECK_GE(num, 0)

plus add to comment that num >= 0.


Line 338:   DCHECK(ts_value.IsValidDate());
This ensures ts_value.date().year() is >= 0 and <= 9999, yes?


-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has uploaded a new patch set (#2).

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................

IMPALA-4055: Speed up to_date() with custom implementation.

Simple implementation of to_date() that avoids calling
into boost for a speedup of 10x.

Perf:
I generated a synthetic Parquet table with 26437248 rows
and a single timestamp column. I tested the response time
of the following query before and after this change.

set mt_dop=1;
select count(*) from to_date_test
where to_date(ts) = '2017-10-23';

Before: 38.1s
After:   3.3s

Testing: I locally ran expr-test.cc and expr_test.py.

Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
3 files changed, 29 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/5791/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Jim Apple, Michael Ho,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5791

to look at the new patch set (#7).

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................

IMPALA-4055: Speed up to_date() with custom implementation.

Simple implementation of to_date() that avoids calling
into boost for a speedup of 10x.

Perf:
I generated a synthetic Parquet table with 26437248 rows
and a single timestamp column. I tested the response time
of the following query before and after this change.

set mt_dop=1;
select count(*) from to_date_test
where to_date(ts) = '2017-10-23';

Before: 38.1s
After:   3.4s

Testing: I locally ran expr-test.cc and expr_test.py.

Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
3 files changed, 25 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/5791/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 1:

(1 comment)

Nice change. A suggestion below.

http://gerrit.cloudera.org:8080/#/c/5791/1/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS1, Line 344:   // Fill in year.
             :   const int year = ts_value.date().year();
             :   result.ptr[3] = '0' + year % 10;
             :   result.ptr[2] = '0' + (year / 10) % 10;
             :   result.ptr[1] = '0' + (year / 100) % 10;
             :   result.ptr[0] = '0' + (year / 1000) % 10;
             : 
             :   // Fill in month, day, and dashes.
             :   result.ptr[4] = '-';
             :   TwoDigitZeroPaddedItoa(result.ptr + 5, ts_value.date().month());
             :   result.ptr[7] = '-';
             :   TwoDigitZeroPaddedItoa(result.ptr + 8, ts_value.date().day());
             :   return result;
Appears that the int to string conversion can be factored into a function like the following:

static inline void IntToChar(char* dst, int num, int len) {
   DCHECK_GT(len, 0);
   for (int i = len-1; i >= 0; --i) {
       *(dst + i) = '0' + (num % 10);
       num /= 10;
   }
}

IntToChar(result.ptr, year, 4);
IntToChar(result.ptr + 5, month, 2);
IntToChar(result.ptr + 8, date, 2);


-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5791/3//COMMIT_MSG
Commit Message:

Line 12: Perf:
> Worth adding a BE benchmark or a targeted perf test?
to_date() is part of expr-benchmark.cc which unfortunately I cannot run


http://gerrit.cloudera.org:8080/#/c/5791/3/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

Line 326: static inline void IntToChar(uint8_t* dst, int num, int len) {
> snprintf(dst, dst_len, "%0`len`d", num)
Why write \0 at the end? Seems confusing in many respects.


-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5791/4/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS4, Line 327: T
> GE, to match comment above.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5791/1/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS1, Line 344:   // Fill in year.
             :   const int year = ts_value.date().year();
             :   result.ptr[3] = '0' + year % 10;
             :   result.ptr[2] = '0' + (year / 10) % 10;
             :   result.ptr[1] = '0' + (year / 100) % 10;
             :   result.ptr[0] = '0' + (year / 1000) % 10;
             : 
             :   // Fill in month, day, and dashes.
             :   result.ptr[4] = '-';
             :   TwoDigitZeroPaddedItoa(result.ptr + 5, ts_value.date().month());
             :   result.ptr[7] = '-';
             :   TwoDigitZeroPaddedItoa(result.ptr + 8, ts_value.date().day());
             :   return result;
> Appears that the int to string conversion can be factored into a function l
Not quite as simple as that because we need to deal with zero padding, so we need to know the number of decimal digits before that loop.

We will also end up with a few extra mod operations that are not necessary for this function.

I'll give it a try.


-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 7: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5791/4/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS4, Line 327: T
GE, to match comment above.


-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5791

to look at the new patch set (#4).

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................

IMPALA-4055: Speed up to_date() with custom implementation.

Simple implementation of to_date() that avoids calling
into boost for a speedup of 10x.

Perf:
I generated a synthetic Parquet table with 26437248 rows
and a single timestamp column. I tested the response time
of the following query before and after this change.

set mt_dop=1;
select count(*) from to_date_test
where to_date(ts) = '2017-10-23';

Before: 38.1s
After:   3.4s

Testing: I locally ran expr-test.cc and expr_test.py.

Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M tests/conftest.py
4 files changed, 25 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/5791/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 7: Code-Review+2

Revert accidental modification to an unrelated file.

-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................

IMPALA-4055: Speed up to_date() with custom implementation.

Simple implementation of to_date() that avoids calling
into boost for a speedup of 10x.

Perf:
I generated a synthetic Parquet table with 26437248 rows
and a single timestamp column. I tested the response time
of the following query before and after this change.

set mt_dop=1;
select count(*) from to_date_test
where to_date(ts) = '2017-10-23';

Before: 38.1s
After:   3.4s

Testing: I locally ran expr-test.cc and expr_test.py.

Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
3 files changed, 21 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/5791/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4055: Speed up to date() with custom implementation.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4055: Speed up to_date() with custom implementation.
......................................................................


Patch Set 6:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/213/

-- 
To view, visit http://gerrit.cloudera.org:8080/5791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5713b3e0c27b739aae597a6911cf3b2ddd01f822
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No